CLS Token (Classification Token)

What is CLS Token (Classification Token)?

A special, learnable vector added to the beginning of an input sequence in Transformer models (written [CLS] in BERT). The token carries no content of its own, but through self-attention it can draw on every other token in the sequence. Its final hidden state is therefore trained to serve as the aggregate representation of the entire sequence (e.g., for sentence classification).
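The mechanism can be illustrated with a minimal, dependency-free sketch under simplifying assumptions. It uses a single head of scaled dot-product self-attention with no residual connections, layer norm, feed-forward block, or stacking of layers. The weights and the CLS vector are randomly initialized here rather than trained, and all names (`encode_with_cls`, `self_attention`, etc.) are hypothetical. The sketch shows the CLS vector being prepended at position 0 and the output at that position being read off as the sequence-level representation.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows (d_out x d_in); x is a vector of length d_in.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with no masking."""
    d_k = len(Wk)  # number of rows = key dimension
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    out = []
    for q in Q:
        # Every position (including CLS) attends to every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode_with_cls(token_embeddings, cls_vector, Wq, Wk, Wv):
    # Prepend the CLS vector at position 0 of the sequence.
    X = [list(cls_vector)] + [list(t) for t in token_embeddings]
    H = self_attention(X, Wq, Wk, Wv)
    # The output at position 0 is used as the whole-sequence representation.
    return H[0], H


rng = random.Random(0)
d_model = 4
# In a real model this vector is a learned parameter; here it is just initialized.
cls_vector = [rng.gauss(0.0, 0.02) for _ in range(d_model)]
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]
Wq, Wk, Wv = (random_matrix(d_model, d_model, rng) for _ in range(3))

cls_out, all_out = encode_with_cls(tokens, cls_vector, Wq, Wk, Wv)
print(len(all_out), len(cls_out))  # 4 4
```

Because the CLS output is an attention-weighted mix of all value vectors, changing any input token changes it. This is what lets a classifier read the whole sequence from a single position.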

Where did the term "CLS Token (Classification Token)" come from?

Introduced with BERT (2018) and later adopted by Vision Transformers (ViT).

How is "CLS Token (Classification Token)" used today?

A standard architectural pattern in encoder-only models for tasks like sentiment analysis or image classification.
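A typical downstream use puts a small classification head on top of the CLS output alone. The sketch below is a hedged illustration, not any specific model's head. It assumes a single linear layer followed by softmax, and the hidden states, weights, and the "positive"/"negative" labels are hypothetical values chosen by hand rather than taken from a trained model.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_from_cls(final_hidden_states, W, b, labels):
    """Linear classification head applied only to the CLS position (index 0)."""
    h_cls = final_hidden_states[0]
    # One logit per label: W[i] . h_cls + b[i]
    logits = [sum(w * h for w, h in zip(row, h_cls)) + bias for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical final hidden states for [CLS] + 3 tokens, with d_model = 3.
hidden = [
    [0.9, -0.2, 0.1],   # CLS position
    [0.1, 0.4, -0.3],
    [0.0, 0.2, 0.5],
    [-0.1, 0.3, 0.2],
]
W = [[1.0, 0.0, 0.0],    # weights for "positive" (hypothetical)
     [-1.0, 0.0, 0.0]]   # weights for "negative" (hypothetical)
b = [0.0, 0.0]

label, probs = classify_from_cls(hidden, W, b, ["positive", "negative"])
print(label)  # positive
```

The head ignores every row except index 0. In a trained model, that row has already absorbed information from the other tokens through attention, which is why this pattern works for sentiment analysis or, with image patches as tokens, for image classification.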

Related Terms