The maximum amount of text (measured in tokens) that an LLM can process in a single pass. This includes both the input prompt and the generated response. If the conversation exceeds this window, the model 'forgets' the earliest parts.
Determined by the model's architecture, in particular its positional-encoding scheme, and by the sequence lengths and memory budget used during training.
A key competitive metric, expanding rapidly from 4k tokens (GPT-3.5 at launch) to 128k tokens (GPT-4 Turbo, Llama 3.1) and 1M+ tokens (Gemini 1.5).
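The "forgetting" described above is typically implemented by truncating the oldest messages until the conversation fits the budget. A minimal sketch of that truncation logic, using a crude whitespace-based token count rather than a real tokenizer (all names here are illustrative, not any particular library's API):

```python
# Sketch of context-window truncation: when the running conversation
# exceeds the model's token budget, the earliest messages are dropped
# first -- this is what "forgetting" looks like in practice.

def count_tokens(text: str) -> int:
    # Crude approximation; real tokenizers (BPE, SentencePiece) differ.
    return len(text.split())

def fit_to_window(messages: list[str], window: int, reserve: int) -> list[str]:
    """Keep the most recent messages whose total token count fits in
    window - reserve; the reserve leaves room for the model's response."""
    budget = window - reserve
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):       # walk newest-first
        cost = count_tokens(msg)
        if total + cost > budget:
            break                        # this and everything older is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order

history = ["hello there", "how are you today", "tell me about context windows"]
print(fit_to_window(history, window=10, reserve=2))
# → ['tell me about context windows']
```

Production systems use the model's own tokenizer for exact counts and often summarize dropped messages instead of discarding them outright, but the budget arithmetic is the same.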