Conquering Nondeterminism in LLM Inference

The irreproducibility of large language model (LLM) inference results is a persistent problem. This post traces it to its root cause: not simply the combination of floating-point non-associativity and concurrent execution, but the lack of "batch invariance" in kernel implementations. Even when each individual kernel is run-to-run deterministic, the batch size a request is grouped into varies with server load, and a batch-variant kernel then produces different results for the same input. The authors analyze the challenges of achieving batch invariance in RMSNorm, matrix multiplication, and attention, and propose eliminating the nondeterminism by making these kernel implementations batch-invariant. The result is fully reproducible LLM inference, with benefits for reinforcement learning training.
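
To see why batch invariance matters, here is a minimal sketch (assuming PyTorch on a CUDA GPU; the shapes and the size of the discrepancy are illustrative and vary by hardware and library version). It first shows that floating-point addition is non-associative, then that a stock matmul kernel need not be batch-invariant: the same input row can yield a different result depending on how many other rows share its batch.

```python
import torch

# Floating-point addition is not associative: reordering a
# reduction can change the result.
print((0.1 + 1e20) - 1e20)  # 0.0
print(0.1 + (1e20 - 1e20))  # 0.1

torch.manual_seed(0)
# Same weight matrix, same first input row -- only the batch size differs.
x = torch.randn(2048, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)

out_batched = torch.mm(x, w)[:1]  # row 0, computed inside a 2048-row batch
out_single = torch.mm(x[:1], w)   # row 0, computed as a batch of one

# Bitwise-identical inputs, yet the outputs can differ, because the
# kernel may tile and reduce differently at different batch sizes.
print((out_batched - out_single).abs().max())
```

In a serving system, the batch size is determined by how many concurrent requests happen to arrive, so a batch-variant kernel makes one user's output depend on other users' traffic; this is the nondeterminism the post sets out to eliminate.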