The Ever-Growing Size of Large Language Models

This article traces the evolution of large language model (LLM) size. From GPT-2's 1.61B parameters to Llama 4's roughly 2T parameters, model size has grown by about three orders of magnitude. The article details the parameter counts, training data sizes, and architectural features of key models, covering both dense and Mixture-of-Experts (MoE) designs. MoE architectures have made it practical to train and serve much larger models, since only a fraction of the total parameters is active for each token. However, the growth in model size has also brought new challenges, such as data bias and reduced interpretability. The article concludes by exploring future directions for LLM development and calls for more research to focus on building pure text continuation engines rather than simply pursuing high scores on benchmark tests.
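To make the MoE point concrete, below is a minimal sketch of a top-k MoE layer, assuming a standard learned-router design; the class, parameter names, and sizes are illustrative and not taken from any specific model's implementation. The key property is that total parameters grow with the number of experts, while each token only passes through k of them.

```python
# Minimal, illustrative sketch of a top-k Mixture-of-Experts layer (assumed
# design; not the implementation of any particular model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Routes each token to its top-k experts, so only a fraction of the
    layer's total parameters is used per token."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate_logits = self.router(x)                       # (tokens, n_experts)
        weights, indices = gate_logits.topk(self.k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Total parameters scale with n_experts, but compute per token depends only on k.
layer = TopKMoE(d_model=64, d_hidden=256, n_experts=8, k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

This decoupling of total parameter count from per-token compute is what lets MoE models reach trillion-parameter scale while keeping inference cost closer to that of a much smaller dense model.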