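A positional method that uses no position embeddings. Instead, it adds a distance-based bias to attention scores, so keys farther from the query are penalized more. Because the bias depends only on relative distance, models can extrapolate to sequences longer than those seen during training.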
An alternative to RoPE for long-context models.
Used in models that prioritize extrapolation to very long sequences.
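
A minimal sketch of how a distance bias of this kind might be computed and applied, assuming the linear, per-head penalty used in ALiBi-style schemes. The function names (`head_slopes`, `distance_bias`, `biased_softmax`) are hypothetical, the slope schedule assumes a power-of-two head count, and the example handles a single head with causal masking only.

```python
import math

def head_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).
    Assumption: num_heads is a power of two."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]

def distance_bias(seq_len, slope):
    """bias[i][j] = -slope * (i - j) for keys j at or before query i.
    Future keys (j > i) get -inf, which doubles as the causal mask."""
    return [[-slope * (i - j) if j <= i else -math.inf
             for j in range(seq_len)]
            for i in range(seq_len)]

def biased_softmax(scores, slope):
    """Add the distance bias to raw query-key scores, then softmax each row."""
    bias = distance_bias(len(scores), slope)
    weights = []
    for score_row, bias_row in zip(scores, bias):
        logits = [s + b for s, b in zip(score_row, bias_row)]
        peak = max(logits)  # always finite: the diagonal entry is never masked
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With identical raw scores, nearer keys end up with more attention weight.
uniform_scores = [[0.0] * 4 for _ in range(4)]
weights = biased_softmax(uniform_scores, head_slopes(8)[0])
print([round(w, 3) for w in weights[3]])
```

Nothing in the bias is learned or tied to a maximum length: `distance_bias` depends only on `i - j`, so the same function can be called with any `seq_len`. That is the property behind the length extrapolation described above.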