A Dummy's Guide to Modern LLM Sampling
2025-05-04

This technical article provides a comprehensive guide to the sampling methods used in Large Language Model (LLM) text generation. It begins by explaining why LLMs operate on sub-word tokens rather than whole words or individual letters, then walks through the major sampling algorithms: temperature sampling, penalty methods (Presence, Frequency, Repetition, DRY), Top-K, Top-P, Min-P, Top-A, XTC, Top-N-Sigma, Tail-Free Sampling, Eta Cutoff, Epsilon Cutoff, Locally Typical Sampling, Quadratic Sampling, and Mirostat. Each algorithm is explained with pseudo-code and illustrations. Finally, it discusses the order in which samplers are applied and how they interact, showing that different orderings can produce significantly different output.
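
To give a feel for why sampler order matters before diving into the individual algorithms, here is a minimal sketch of a sampling pipeline that chains temperature, Top-K, and Top-P in one arbitrary order. The function names, parameter values, and NumPy-based structure are illustrative assumptions, not the article's own code; swapping the order of the steps changes which tokens survive to the final draw.

```python
# Minimal sketch of a chained sampling pipeline (illustrative, not the
# article's implementation). All names and defaults are assumptions.
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Flatten (T > 1) or sharpen (T < 1) the distribution by scaling logits."""
    return logits / temperature


def apply_top_k(logits: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k highest-scoring tokens; mask the rest to -inf."""
    cutoff = np.sort(logits)[-k]
    return np.where(logits >= cutoff, logits, -np.inf)


def apply_top_p(logits: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]           # tokens, most to least probable
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, p)) + 1]
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    return masked


def sample(logits: np.ndarray, temperature=0.8, k=40, p=0.95, seed=0) -> int:
    """Apply the samplers in one possible order, then draw a token id."""
    rng = np.random.default_rng(seed)
    pipeline = (
        lambda x: apply_temperature(x, temperature),  # order matters:
        lambda x: apply_top_k(x, k),                  # reordering these steps
        lambda x: apply_top_p(x, p),                  # changes the candidate pool
    )
    for transform in pipeline:
        logits = transform(logits)
    return int(rng.choice(len(logits), p=softmax(logits)))


if __name__ == "__main__":
    fake_logits = np.random.default_rng(42).normal(size=50)  # stand-in for model output
    print(sample(fake_logits))
```

For example, applying temperature before Top-K scales every logit uniformly and leaves the top-k set unchanged, whereas applying a truncation step first and temperature afterwards redistributes probability only among the survivors; the article's final section explores these interactions in detail.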