Towards System 2 Reasoning in LLMs: Meta Chain-of-Thought

Researchers propose Meta Chain-of-Thought (Meta-CoT), a framework that extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning needed to arrive at a given CoT. The paper explores producing Meta-CoT through process supervision, synthetic data generation, and search algorithms, and outlines a training pipeline that combines instruction tuning on linearized search traces with reinforcement learning. The authors present this as a roadmap toward enabling Meta-CoT in LLMs, with the stated goal of more powerful and human-like reasoning.