The attention mechanism allows a model to weigh the importance of different parts of the input sequence when generating each part of the output. It helps the model focus on relevant context regardless of its distance in the text.
Attention was the key innovation of the 'Attention Is All You Need' paper (Vaswani et al., 2017), which built an entire architecture around it.
By addressing the long-range dependency problem in NLP, it made coherent long-form text generation practical.
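To make the weighting concrete, here is a minimal NumPy sketch of the scaled dot-product attention defined in that paper, softmax(QK^T / sqrt(d_k)) V. The function names, shapes, and toy inputs are illustrative assumptions, not anything specified above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_len_q, d_k) queries
    K: (seq_len_k, d_k) keys
    V: (seq_len_k, d_v) values
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled so dot products
    # don't grow with the key dimension.
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row is a probability distribution over input positions: the
    # weights the model uses to focus on relevant context at any distance.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy example (hypothetical sizes): 4 positions, 8-dimensional vectors.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4); each row of w sums to 1
```

Each output position is a weighted average of all value vectors, so a query can draw on any input position with equal ease, which is why distance in the sequence stops being an obstacle.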