s1: Simple Test-Time Scaling for Strong Reasoning
2025-02-03
This paper introduces s1, a simple test-time scaling method that reaches reasoning performance comparable to o1-preview by fine-tuning on only 1,000 examples and applying budget forcing at inference. Budget forcing controls how much the model reasons at test time, either cutting its thinking short or extending it, so performance improves with additional test-time compute rather than additional training data. The code and data are open-sourced for reproducibility and further exploration.
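
To make the idea of budget forcing concrete, here is a minimal sketch of a decoding loop that enforces a minimum and maximum thinking budget. It is an illustration under stated assumptions, not the authors' implementation: the `step` callable, the `</think>` delimiter, the `"Wait"` continuation cue, and the toy model are all hypothetical stand-ins chosen for this example.

```python
from typing import Callable, List

END_OF_THINKING = "</think>"  # assumed delimiter; a real model may use a different one
CONTINUE_CUE = "Wait"         # assumed cue that nudges the model to keep reasoning


def budget_force(
    step: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Generate a thinking trace whose length is held to a token budget.

    `step` is a hypothetical next-token function: it takes the context so far
    and returns one token.
    """
    thinking: List[str] = []
    while len(thinking) < max_tokens:
        token = step(prompt + thinking)
        if token == END_OF_THINKING:
            if len(thinking) >= min_tokens:
                break  # the model stopped and the minimum budget is met
            # The model tried to stop too early: suppress the delimiter and
            # append the continuation cue so that it keeps reasoning.
            thinking.append(CONTINUE_CUE)
            continue
        thinking.append(token)
    # Close the thinking phase, whether the model stopped or the cap was hit.
    return thinking + [END_OF_THINKING]


def toy_step(context: List[str]) -> str:
    """Toy stand-in for a model: tries to stop after two reasoning tokens."""
    since_restart = 0
    for token in reversed(context):
        if token in (CONTINUE_CUE, "Q"):
            break
        since_restart += 1
    return END_OF_THINKING if since_restart >= 2 else "step"


extended = budget_force(toy_step, ["Q"], min_tokens=5, max_tokens=8)
truncated = budget_force(toy_step, ["Q"], min_tokens=0, max_tokens=1)
print(extended)
print(truncated)
```

In this sketch the first call extends the trace past the toy model's preferred stopping point, and the second call cuts it off at the cap. In a real setting, `step` would wrap an actual language model's decoding call, and the budget would be counted in that model's tokens.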