A decoding strategy for text generation, also called nucleus sampling, in which the model samples from the smallest set of top tokens whose cumulative probability reaches a threshold p (e.g., 0.9), after renormalizing the probabilities within that set. Unlike Top-k, which always keeps a fixed number of tokens, Top-p adapts the candidate pool dynamically: it considers fewer options when the model is confident and more when it is uncertain. This yields generated text that is diverse yet coherent.
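A minimal sketch of the procedure in NumPy, assuming `logits` is the model's raw score vector for the next token (the function name and signature are illustrative, not from any particular library):

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Sample a token index using nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    cum = np.cumsum(sorted_probs)
    # Keep the smallest prefix whose cumulative probability reaches p.
    cutoff = np.searchsorted(cum, p) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus before sampling.
    nucleus_probs = sorted_probs[:cutoff] / cum[cutoff - 1]
    return rng.choice(nucleus, p=nucleus_probs)
```

With a sharply peaked distribution the nucleus collapses to one token (greedy behavior); with a flat distribution many tokens survive, so sampling stays diverse.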
Introduced by Holtzman et al. in the paper "The Curious Case of Neural Text Degeneration" (2019).
The industry standard for open-ended text generation (e.g., ChatGPT, Claude, Llama).