SentencePiece is an unsupervised text tokenizer and detokenizer mainly used in Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training.
It was developed at Google and is used in models such as T5 and ALBERT (the original BERT uses WordPiece instead).
It is also widely used in open-source models such as Llama 2, because it operates on raw text and does not depend on language-specific pre-tokenization.
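
To make the language-agnostic idea more concrete, here is a minimal toy sketch in plain Python. It is not the sentencepiece library's implementation: the names `toy_encode`, `toy_decode`, and `TOY_VOCAB` are hypothetical, the vocabulary is hand-picked, and greedy longest-match stands in for the trained subword model a real tokenizer would use. What it does illustrate is the whitespace handling: spaces are escaped with the meta symbol "▁" (U+2581), so the original text can be recovered from the pieces without any language-specific rules.

```python
# Toy illustration only; NOT how the sentencepiece library is implemented.
META = "\u2581"  # the meta symbol used to mark whitespace

# Hypothetical fixed vocabulary, hand-picked for this example.
TOY_VOCAB = frozenset({META + "Hello", META + "wor", "ld"})


def toy_encode(text, vocab=TOY_VOCAB):
    """Escape spaces with META, then split greedily into known pieces.

    Any character not covered by the vocabulary becomes its own piece,
    so every input string can be encoded.
    """
    escaped = META + text.replace(" ", META)
    pieces = []
    i = 0
    while i < len(escaped):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(escaped), i, -1):
            if escaped[i:j] in vocab or j == i + 1:
                pieces.append(escaped[i:j])
                i = j
                break
    return pieces


def toy_decode(pieces):
    """Join the pieces, turn META back into spaces, drop the leading space."""
    return "".join(pieces).replace(META, " ")[1:]


if __name__ == "__main__":
    pieces = toy_encode("Hello world")
    print(pieces)              # ['▁Hello', '▁wor', 'ld']
    print(toy_decode(pieces))  # Hello world
    # Text without a matching vocabulary entry falls back to characters.
    print(toy_decode(toy_encode("こんにちは 世界")))  # こんにちは 世界
```

The same two functions handle English and Japanese input identically, since nothing in them assumes that words are separated by spaces. One limitation of this sketch is that input already containing "▁" would not round-trip correctly.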