A decoding strategy for text generation, also called nucleus sampling, in which the model samples from the smallest set of top tokens whose cumulative probability reaches a threshold p (e.g., 0.9), after renormalizing the probabilities within that set. Unlike Top-k, which always keeps a fixed number of tokens, Top-p adapts the candidate pool dynamically: it considers fewer options when the model is confident and more when it is uncertain. This yields generated text that is diverse yet coherent.
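A minimal sketch of the procedure in NumPy, assuming `logits` is the model's raw score vector for the next token (the function name and signature are illustrative, not from any particular library):

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample a token index using nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    cum = np.cumsum(sorted_probs)
    # Keep the smallest prefix whose cumulative probability reaches p.
    cutoff = np.searchsorted(cum, p) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus before sampling.
    nucleus_probs = sorted_probs[:cutoff] / cum[cutoff - 1]
    return rng.choice(nucleus, p=nucleus_probs)
```

With a sharply peaked distribution the nucleus collapses to one token (greedy behavior); with a flat distribution many tokens survive, so sampling stays diverse.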
Introduced by Holtzman et al. in the paper "The Curious Case of Neural Text Degeneration" (2019).
The industry standard for open-ended text generation (e.g., ChatGPT, Claude, Llama).