DeepSeek-R1: A Reasoning Model Trained with Reinforcement Learning, No Supervised Fine-tuning Needed
The DeepSeek team has open-sourced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, along with a suite of distilled models. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT), demonstrates remarkable reasoning capabilities but suffers from issues such as poor readability and language mixing. DeepSeek-R1 addresses these issues by incorporating cold-start data before RL, achieving performance comparable to OpenAI-o1. Six distilled models based on Llama and Qwen are also open-sourced, with DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini across various benchmarks. The project supports commercial use and provides an online chat website and an OpenAI-compatible API.
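Because the API is OpenAI-compatible, existing OpenAI SDK code can be pointed at it by swapping the base URL. The sketch below assumes the endpoint `https://api.deepseek.com` and the model id `deepseek-reasoner`; verify both against DeepSeek's current API documentation before use.

```python
# Sketch: querying DeepSeek-R1 through its OpenAI-compatible API.
# Assumptions (check DeepSeek's docs): base URL "https://api.deepseek.com"
# and model id "deepseek-reasoner".
import os


def build_request(question: str) -> dict:
    """Build a chat-completion payload for the reasoning model."""
    return {
        "model": "deepseek-reasoner",  # assumed model id for DeepSeek-R1
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str) -> str:
    """Send one question to the API; needs `pip install openai` and
    a DEEPSEEK_API_KEY environment variable."""
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    resp = client.chat.completions.create(**build_request(question))
    return resp.choices[0].message.content


if __name__ == "__main__":
    # Inspect the payload without making a network call.
    print(build_request("What is 17 * 24?"))
```

The only change from a stock OpenAI client is the `base_url` (and the API key), which is what "OpenAI-compatible" means in practice.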