Scaling RL: Next-Token Prediction on the Web

2025-07-13

The author argues that reinforcement learning (RL) is the next frontier for training AI models. The current way of scaling RL, by training on many separate environments at once, is messy. Instead, the author proposes teaching models to reason by applying RL to next-token prediction on web-scale data. This taps the vast supply of readily available web text and moves beyond today's RL training datasets, which focus largely on math and code problems. The author contends that unifying RL with next-token prediction could produce significantly more powerful reasoning models.
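As a rough illustration of the idea, here is a minimal sketch of how a single web document could be turned into verifiable RL tasks: each prefix becomes a prompt, the true next token is the answer, and a sampled rollout earns reward only if its final prediction matches. Everything here is an assumption for illustration, not the author's method: whitespace tokenization, the exact-match reward, and the group-mean baseline (a GRPO-style choice swapped in for simplicity). The names `RLExample`, `make_examples`, `reward`, and `group_advantages` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RLExample:
    context: List[str]  # tokens seen so far, used as the prompt
    target: str         # the actual next token from the web document


def make_examples(tokens: List[str], min_context: int = 1) -> List[RLExample]:
    """Turn one tokenized document into next-token RL tasks (one per position)."""
    return [RLExample(tokens[:i], tokens[i]) for i in range(min_context, len(tokens))]


def reward(prediction: str, target: str) -> float:
    """Hypothetical verifiable reward: 1.0 only if the final prediction is exactly right."""
    return 1.0 if prediction == target else 0.0


def group_advantages(rewards: List[float]) -> List[float]:
    """Assumed GRPO-style baseline: subtract the mean reward of rollouts for one prompt."""
    if not rewards:
        return []
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()  # whitespace tokenization (assumption)
    examples = make_examples(tokens)
    first = examples[0]
    print(first.context, "->", first.target)  # ['the'] -> cat
    # Pretend four sampled rollouts answered this prompt differently.
    rollouts = ["cat", "dog", "sat", "cat"]
    rs = [reward(p, first.target) for p in rollouts]
    print(group_advantages(rs))  # [0.5, -0.5, -0.5, 0.5]
```

The appeal of this framing is that the reward comes for free from the text itself, so any web document supplies checkable answers without hand-built environments; a real system would use a model tokenizer and a policy-gradient update on the advantages.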

AI