QwQ-32B: Scaling RL for Enhanced Reasoning in LLMs
2025-03-05
The Qwen team has achieved a breakthrough in scaling reinforcement learning (RL) for large language models (LLMs). Their 32-billion-parameter QwQ-32B model performs comparably to the 671-billion-parameter DeepSeek-R1 (which activates only 37 billion parameters per forward pass), highlighting the effectiveness of RL when applied to robust foundation models. QwQ-32B, open-sourced on Hugging Face and ModelScope under the Apache 2.0 license, excels at mathematical reasoning, coding, and general problem-solving. Future work will focus on integrating agents with RL for long-horizon reasoning, pushing toward Artificial General Intelligence (AGI).
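Since the weights are openly published, the model can be loaded with the standard Hugging Face `transformers` chat-template workflow. Below is a minimal sketch, assuming the model ID `Qwen/QwQ-32B` and enough GPU memory to host the 32B checkpoint; the prompt and generation settings are illustrative, not the team's recommended configuration.

```python
# Minimal sketch: load QwQ-32B via transformers and run one chat turn.
# Assumes the Hugging Face model ID "Qwen/QwQ-32B"; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype specified in the checkpoint config
    device_map="auto",   # shard across available GPUs
)

# Reasoning models like QwQ emit a long "thinking" trace before the final
# answer, so allow a generous generation budget.
messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)

# Strip the prompt tokens and decode only the newly generated text.
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)
```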
AI