Transformer Architecture

What is Transformer Architecture?

The Transformer is a deep learning architecture that relies on the self-attention mechanism to process all positions of an input sequence in parallel. It revolutionized NLP by making it practical to train much larger and more powerful models than earlier recurrent neural networks (RNNs), which must process tokens one at a time.
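
To make the core idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the operation at the heart of a Transformer layer. The function name, the toy dimensions, and the randomly initialized weight matrices are illustrative assumptions for this sketch, not code from the original paper.

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X.

    X: (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # Every position attends to every other position in one matrix
    # multiplication, which is what makes the computation parallel.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

# Toy example: 4 tokens, embedding size 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one context-aware vector per token

Because the attention scores for all token pairs are computed in a single matrix product, the whole sequence is processed at once rather than step by step as in an RNN.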

Where did the term "Transformer Architecture" come from?

The term was introduced by researchers at Google in the 2017 paper 'Attention Is All You Need' (Vaswani et al.), which proposed the architecture for machine translation.

How is "Transformer Architecture" used today?

It is the underlying architecture for virtually all modern state-of-the-art language models, including GPT, BERT, and Llama.