Model Collapse: The Risk of AI Self-Cannibalization

2025-05-17

As large language models (LLMs) become more prevalent, a risk known as "model collapse" is drawing attention. LLMs are increasingly trained on text that models themselves generated, so over successive generations the training data drifts away from real-world data, degrading output quality and, in extreme cases, producing nonsensical results. Research shows the problem is not limited to LLMs: any generative model trained iteratively on its own outputs faces the same risk. Accumulating real data alongside synthetic data slows the degradation, but at the cost of more computation. Researchers are exploring data curation and model self-assessment as ways to raise the quality of synthetic data, aiming to prevent collapse and to preserve the output diversity that collapse erodes.
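
To make the mechanism concrete, here is a minimal toy sketch, not code from the research this post summarizes: it repeatedly refits a one-dimensional Gaussian to a finite sample drawn from the previous generation's fit. The sample sizes, seed, and number of generations are arbitrary illustrative choices; the point is that estimation error compounds, and the fitted spread tends to drift downward across generations, which is the simplest form of the drift away from real data described above.

```python
# Toy illustration of iterative self-training (a hypothetical sketch, not the
# cited researchers' setup): each "generation" is trained only on samples
# produced by the previous generation's model.

import numpy as np

rng = np.random.default_rng(0)

# "Real-world" data: a standard Gaussian.
real_data = rng.normal(loc=0.0, scale=1.0, size=10_000)
mu, sigma = real_data.mean(), real_data.std()

n_per_generation = 50  # small synthetic datasets make the drift visible sooner

for generation in range(1, 101):
    # Train (here: refit mean and std) only on synthetic data from the
    # previous generation's model.
    synthetic = rng.normal(loc=mu, scale=sigma, size=n_per_generation)
    mu, sigma = synthetic.mean(), synthetic.std()
    if generation % 10 == 0:
        print(f"gen {generation:3d}: mean={mu:+.3f}  std={sigma:.3f}")

# The estimated std performs a downward-drifting random walk, so over enough
# generations it tends toward zero: the model concentrates on an ever-narrower
# slice of the original distribution instead of covering it.
```

Mixing some of `real_data` back into each generation's training sample (the "data accumulation" strategy mentioned above) slows this drift in the toy model as well, at the cost of processing more data each round.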