Top-k sampling is a text decoding strategy that restricts the model's choices to the k most probable next tokens at each step. By cutting off the long tail of low-probability tokens, it prevents the model from choosing irrelevant or nonsensical ones, improving the coherence of generated text.
An early and still-standard technique in Natural Language Processing for controlling sampling randomness.
Widely available in almost all LLM APIs and libraries (like Hugging Face Transformers) as a basic parameter.
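The procedure above can be sketched in a few lines of NumPy: mask all but the k largest logits, renormalize with a softmax, and sample from what survives. The function name `top_k_sample` is illustrative, not from any particular library.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample one token id from the k highest-logit candidates.

    Illustrative sketch of top-k sampling; `top_k_sample` is not a
    real library API.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Find the indices of the k largest logits.
    top_k_idx = np.argpartition(logits, -k)[-k:]
    # Mask everything outside the top k with -inf so softmax zeroes it out.
    masked = np.full_like(logits, -np.inf)
    masked[top_k_idx] = logits[top_k_idx]
    # Softmax over the surviving logits (max-subtraction for stability).
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    # Sample a token index according to the truncated distribution.
    return int(rng.choice(len(logits), p=probs))
```

With `k=1` this degenerates to greedy decoding (always the argmax); larger k trades determinism for diversity while still excluding the low-probability tail.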