The maximum amount of text (in tokens) an LLM can attend to at one time. If the limit is exceeded, the earliest text is typically truncated, so the model 'forgets' the start of the conversation.
Determined by the model's architecture and training (e.g. its positional encoding), and constrained in practice by the memory cost of attention.
Typical sizes have grown from 4k tokens to 1M+ (Gemini/Llama).
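The truncation behavior above can be sketched as dropping the oldest messages until the history fits the window. This is a minimal illustration, not any provider's actual logic; the whitespace-split token count is a stand-in for a real tokenizer.

```python
def truncate_to_window(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Drop the oldest messages until the total token count fits the window."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the model effectively 'forgets' the start of the conversation
    return kept

history = ["hello there", "how are you today", "fine thanks"]
print(truncate_to_window(history, max_tokens=6))
# → ['how are you today', 'fine thanks']
```

Real applications substitute the model's own tokenizer for the token counter and often preserve a system prompt while trimming only the conversational turns.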