ALiBi (Attention with Linear Biases)

What is ALiBi (Attention with Linear Biases)?

A position-encoding method that uses no positional embeddings at all; instead, it adds a fixed penalty to each attention score that grows linearly with the distance between the query and key positions. Because the bias is a simple linear function of distance rather than a learned embedding, models trained with ALiBi can extrapolate to sequences longer than those seen during training.
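The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: following the ALiBi paper's recipe for a power-of-two head count, each head h gets a slope of (2^(-8/num_heads))^(h+1), and the bias added to the pre-softmax score for query position i and key position j is -slope * (i - j). Function names here are illustrative.

```python
import math

def alibi_slopes(num_heads: int) -> list[float]:
    # For a power-of-two number of heads, slopes form a geometric
    # sequence starting at 2^(-8/num_heads), per the ALiBi paper.
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Bias matrix for one head, added to pre-softmax attention scores:
    # query position i penalizes key position j by slope * (i - j).
    # (In a causal model, entries with j > i are masked out anyway.)
    return [[-slope * (i - j) for j in range(seq_len)]
            for i in range(seq_len)]

# Example: with 8 heads the slopes are 1/2, 1/4, ..., 1/256,
# and for slope 0.5 a query two tokens past a key gets bias -1.0.
slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
```

Note that the bias is zero on the diagonal (a token attending to itself) and increasingly negative for more distant keys, which is what lets the same rule apply unchanged at any sequence length.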

Where did the term "ALiBi (Attention with Linear Biases)" come from?

The method and its name were introduced in the paper "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" by Ofir Press, Noah A. Smith, and Mike Lewis (ICLR 2022). It was proposed as an alternative to learned positional embeddings and RoPE for handling long contexts.

How is "ALiBi (Attention with Linear Biases)" used today?

It is used in models that prioritize long-context handling and length extrapolation; for example, BLOOM and the MPT family adopted ALiBi in place of positional embeddings.

Related Terms