ALiBi (Attention with Linear Biases)

What is ALiBi (Attention with Linear Biases)?

An efficient method for encoding position in Transformer models without learned positional embeddings. ALiBi adds a static, non-learned bias to the attention scores that penalizes interactions between tokens in proportion to their distance: the farther apart two tokens are, the more their attention score is reduced. Each attention head applies this penalty with its own fixed slope. This simple inductive bias allows models trained on short sequences to extrapolate effectively to much longer sequences at inference time.
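The bias can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes the number of heads is a power of two (the simple slope scheme from the paper) and that a causal mask is applied separately.

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (the paper's scheme when num_heads is a power of two).
    return np.array([2 ** (-8 * (i + 1) / num_heads) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    # For query position i and key position j (j <= i under a causal mask),
    # the bias is -slope * (i - j): more negative as tokens get farther apart.
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]          # (seq_len, seq_len): j - i
    slopes = alibi_slopes(num_heads)                # (num_heads,)
    return slopes[:, None, None] * distance[None]   # (heads, seq_len, seq_len)

# Usage: add the bias to raw attention scores before the causal mask and softmax.
heads, T, d = 8, 5, 16
rng = np.random.default_rng(0)
q = rng.standard_normal((heads, T, d))
k = rng.standard_normal((heads, T, d))
scores = q @ k.transpose(0, 2, 1) / np.sqrt(d) + alibi_bias(heads, T)
```

Because the bias depends only on relative distance and contains no learned parameters, the same formula applies unchanged to sequence lengths never seen during training.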

Where did the term "ALiBi (Attention with Linear Biases)" come from?

Introduced in the paper 'Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation' (2021) by Press et al.

How is "ALiBi (Attention with Linear Biases)" used today?

Used in models such as MPT (MosaicML Pretrained Transformer) and BLOOM, where it replaces learned positional embeddings and allows inference on context windows longer than those seen during training.

Related Terms