DeepSeek-R1: A Reasoning Model Trained via Reinforcement Learning and its Distilled Versions
2025-01-20

DeepSeek has released its first-generation reasoning models, DeepSeek-R1. Trained via large-scale reinforcement learning without supervised fine-tuning, DeepSeek-R1 addresses issues like endless repetition and poor readability present in its predecessor, DeepSeek-R1-Zero, by incorporating cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across various benchmarks. Furthermore, DeepSeek has open-sourced DeepSeek-R1 and six distilled models based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B surpasses OpenAI-o1-mini on multiple benchmarks, setting new state-of-the-art results for distilled models. These models, along with a user-friendly API and chat interface, are available on Hugging Face.