LLaDA: A Novel Large Language Model Paradigm Based on Diffusion Models
2025-02-20

LLaDA (Large Language Diffusion with mAsking) is a large language model paradigm based on masked diffusion models. It challenges the prevailing view that the capabilities of LLMs depend on autoregressive mechanisms. LLaDA approximates the true language distribution through maximum likelihood estimation, and the work argues that its capabilities stem not from the autoregressive mechanism itself but from the core principles of generative modeling. The model uses masked diffusion for both pre-training and supervised fine-tuning, and generates text via diffusion sampling. According to the research, LLaDA scales competitively with autoregressive baselines trained on the same data.
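
To make the masked diffusion training idea above more concrete, here is a minimal sketch of one training step in plain Python. The summary does not spell out the objective, so everything specific here is an assumption of the sketch: each token is masked independently with probability t, the loss is a cross-entropy over masked positions only, and it is weighted by 1/t (one common formulation for masked diffusion models). The names `forward_mask`, `masked_nll`, and `uniform_predictor` are hypothetical, and the uniform predictor is a toy stand-in for a trained network.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask token used only in this sketch


def forward_mask(tokens, t, rng):
    """Forward process (assumed): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_nll(tokens, noisy, t, predict_proba):
    """Negative log-likelihood on masked positions only, weighted by 1/t (assumed)."""
    total = 0.0
    for i, (clean, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK:
            # Probability the predictor assigns to the original token at position i.
            total -= math.log(predict_proba(noisy, i, clean))
    return total / t


def uniform_predictor(vocab_size):
    """Toy stand-in for a trained mask predictor: uniform over the vocabulary."""
    def predict_proba(noisy, position, candidate):
        return 1.0 / vocab_size
    return predict_proba


rng = random.Random(0)
sentence = "diffusion models can generate text too".split()
t = 0.5  # masking ratio for this step; a real training loop would sample it per example
noisy = forward_mask(sentence, t, rng)
loss = masked_nll(sentence, noisy, t, uniform_predictor(vocab_size=100))

print(noisy)
print(loss)
```

Unlike an autoregressive loss, the predictor here sees context on both sides of every masked position, and generation would run the process in reverse: start from a fully masked sequence and fill in tokens over several steps.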