Top-k sampling limits the model's choices to the 'k' most probable next tokens and samples from that set after renormalizing its probabilities, cutting off the long tail of low-probability tokens to improve coherence.
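
As a rough illustration of that definition, here is a minimal sketch in plain Python. The function names `top_k_filter` and `top_k_sample` are hypothetical, and the input is assumed to be a list of raw scores (logits), one per vocabulary entry. Real inference libraries do this on tensors, usually together with temperature scaling, so treat this as a sketch of the idea rather than any library's implementation.

```python
import math
import random


def top_k_filter(logits, k):
    """Return {token_index: probability} over the k highest-scoring tokens.

    Hypothetical helper: `logits` is assumed to be a list of raw scores.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits; the stable sort breaks ties by lower index.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors only, which renormalizes their probabilities.
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits[i] for i in top)
    weights = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def top_k_sample(logits, k, rng=random):
    """Draw one token index from the renormalized top-k distribution."""
    probs = top_k_filter(logits, k)
    indices = list(probs)
    return rng.choices(indices, weights=[probs[i] for i in indices], k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores for five tokens
    print(top_k_filter(example_logits, k=2))
    print(top_k_sample(example_logits, k=2))
```

With k=1 this reduces to greedy decoding (always the single most probable token), and with k equal to the vocabulary size it is ordinary sampling from the full distribution.
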
Where did the term "Top-k Sampling" come from?
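The name is descriptive: the sampler keeps only the top k candidates, ranked by probability, and draws the next token from them. It is a standard technique for preventing gibberish generation.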