Positional encodings inject information about the absolute or relative position of tokens in a sequence, since the self-attention mechanism is otherwise permutation-invariant and has no notion of order.
They were a fundamental part of the original Transformer architecture, which added fixed sinusoidal encodings to the token embeddings before the first attention layer.
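As a concrete illustration, here is a minimal NumPy sketch of the fixed sinusoidal encoding from the original Transformer; the function name and array shapes are illustrative choices, not a reference implementation.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

# The resulting matrix is added element-wise to the token embeddings.
```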
The idea has since evolved into variants such as RoPE (Rotary Positional Embeddings), which rotates query and key vectors by a position-dependent angle, and ALiBi, which biases attention scores with a linear distance penalty; both allow for better length generalization.
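A hedged sketch of the RoPE rotation in the same NumPy setting, assuming the interleaved dimension-pair convention; `apply_rope` and its arguments are illustrative names.

```python
import numpy as np

def apply_rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Rotate consecutive dimension pairs of x (shape: seq_len, d)
    by a position-dependent angle, as in rotary embeddings."""
    seq_len, d = x.shape
    inv_freq = 1.0 / (10000 ** (np.arange(0, d, 2) / d))  # (d/2,)
    angles = positions[:, None] * inv_freq[None, :]       # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin  # 2D rotation of each pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Applied to queries and keys (not values) before the dot product,
# so attention scores depend on relative position differences.
```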