SentencePiece

What is SentencePiece?

SentencePiece is an unsupervised text tokenizer and detokenizer, used mainly in neural network-based text generation systems where the vocabulary size is fixed before the model is trained.

Where did the term "SentencePiece" come from?

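SentencePiece was developed at Google and introduced by Taku Kudo and John Richardson in 2018. It trains subword models directly from raw sentences rather than from pre-tokenized words, and it is used in models such as T5 and ALBERT. BERT, by contrast, uses WordPiece.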

How is "SentencePiece" used today?

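SentencePiece is widely used in open-source models (such as Llama 2) for its language-agnostic handling of text: it treats the input as a raw character sequence and encodes whitespace as an ordinary symbol, so it needs no language-specific pre-tokenization.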
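To make the whitespace handling concrete, here is a minimal toy sketch in Python. It is not the sentencepiece library or its training algorithm: the function names, the hand-written vocabularies, and the greedy longest-match segmentation are all assumptions made for illustration. It only shows the idea of marking whitespace with the "▁" (U+2581) symbol so that pieces can be joined back into the original text, with or without spaces between words.

```python
from typing import List, Set

# Toy illustration only: NOT the sentencepiece library's algorithm or API.
WS = "\u2581"  # "▁", the meta symbol SentencePiece uses to mark whitespace


def escape(text: str) -> str:
    """Prefix the text with the marker and turn every space into the marker."""
    return WS + text.replace(" ", WS)


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match segmentation (hypothetical stand-in for a trained model)."""
    s = escape(text)
    pieces = []
    i = 0
    while i < len(s):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(s), i, -1):
            if s[i:j] in vocab or j == i + 1:
                pieces.append(s[i:j])
                i = j
                break
    return pieces


def detokenize(pieces: List[str]) -> str:
    """Join the pieces, turn markers back into spaces, drop the leading one."""
    return "".join(pieces).replace(WS, " ")[1:]


# Hand-written toy vocabularies (assumptions, not learned from data).
english_vocab = {WS + "Hello", WS + "wor", "ld"}
japanese_vocab = {WS + "こんにちは", "世界"}

english_pieces = tokenize("Hello world", english_vocab)
japanese_pieces = tokenize("こんにちは世界", japanese_vocab)
```

Because the space is part of the pieces themselves, the same code path handles English (space-delimited) and Japanese (no spaces) without any language-specific word splitting, and `detokenize` recovers the input in both cases.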

Related Terms