LLaDA: A Novel Large Language Model Paradigm Based on Diffusion Models

2025-02-20

LLaDA (Large Language Diffusion with mAsking) is a large language model paradigm built on masked diffusion models rather than autoregression. It challenges the prevailing view that the capabilities of LLMs depend on autoregressive mechanisms. LLaDA approximates the true language distribution through maximum likelihood estimation, and the work argues that its capabilities stem from this core principle of generative modeling, not from the autoregressive mechanism itself. Both pre-training and supervised fine-tuning use masked diffusion, and text is generated by diffusion sampling. Trained on the same data, LLaDA shows scalability competitive with autoregressive baselines.
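To make the two ingredients above more concrete, here is a minimal, toy-scale sketch of masked diffusion training and diffusion sampling. It is not LLaDA's actual implementation. It assumes a forward process that masks each token independently with probability t, a loss that is the cross-entropy on masked positions weighted by 1/t (normalized by sequence length here for convenience), and a sampler that starts from a fully masked sequence and keeps each masked token masked with probability s/t when stepping from noise level t to s. The names `MASK`, `forward_mask`, `masked_diffusion_loss`, `sample`, and the toy predictors are all hypothetical.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def forward_mask(tokens, t, rng):
    """Forward process (assumed): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_diffusion_loss(tokens, noisy, t, predict_proba):
    """Cross-entropy on masked positions only, weighted by 1/t.

    predict_proba(noisy, i, candidate) stands in for a mask predictor and
    returns the probability it assigns to `candidate` at position i.
    Dividing by the sequence length is a normalization chosen for this sketch.
    """
    total = 0.0
    for i, (clean, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK:
            total -= math.log(predict_proba(noisy, i, clean))
    return total / (t * len(tokens))


def sample(length, steps, predict_token, rng):
    """Reverse process (assumed): start fully masked and unmask over `steps` steps."""
    seq = [MASK] * length
    for k in range(steps, 0, -1):
        t, s = k / steps, (k - 1) / steps
        snapshot = list(seq)  # all predictions in a step see the same input
        for i in range(length):
            # A masked token stays masked with probability s / t.
            if snapshot[i] == MASK and rng.random() >= s / t:
                seq[i] = predict_token(snapshot, i)
    return seq


def uniform_predictor(vocab_size):
    """Toy predictor that assigns equal probability to every vocabulary item."""
    def predict_proba(noisy, position, candidate):
        return 1.0 / vocab_size
    return predict_proba


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["diffusion", "models", "generate", "text"]
    t = rng.uniform(0.1, 1.0)  # noise level drawn per training example
    noisy = forward_mask(sentence, t, rng)
    print(noisy, masked_diffusion_loss(sentence, noisy, t, uniform_predictor(4)))
    # Toy "oracle" predictor that simply looks up the target token.
    print(sample(len(sentence), 4, lambda seq, i: sentence[i], rng))
```

Unlike left-to-right decoding, the sampler fills positions in no fixed order, and the final step (s = 0) unmasks everything that remains. A real system would replace the toy predictors with a trained network that predicts all masked tokens in one pass.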