The Transformer is a deep learning architecture that relies on the self-attention mechanism to process all positions of an input sequence in parallel, rather than one step at a time. It revolutionized NLP by enabling the training of much larger and more powerful models than the recurrent networks (RNNs) that preceded it allowed.
It was introduced by researchers at Google in the 2017 paper 'Attention Is All You Need'.
It is the underlying architecture for virtually all modern state-of-the-art language models, including GPT, BERT, and Llama.
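To make the self-attention step concrete, below is a minimal sketch of scaled dot-product self-attention in NumPy. The function name `self_attention`, the single attention head, and the toy dimensions are illustrative assumptions, not taken from the paper; a full Transformer adds multiple heads, residual connections, layer normalization, and feed-forward sublayers on top of this core operation.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ W_q  # queries: what each position is looking for
    K = X @ W_k  # keys: what each position offers
    V = X @ W_v  # values: the content to be mixed
    d_k = K.shape[-1]
    # Every position scores against every other position in one matrix product;
    # this all-pairs computation is what lets the sequence be processed in parallel.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension turns scores into weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of value vectors: a contextualized token vector.
    return weights @ V

# Toy usage with assumed sizes: 4 tokens, model width 8.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

The contrast with an RNN is visible in the `scores` line: an RNN must consume tokens sequentially to build context, whereas here the full attention matrix is computed in a single pass, which is what makes training on modern parallel hardware efficient.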