DeepSeek-R1: Boosting LLM Reasoning with Reinforcement Learning
DeepSeek-AI introduces its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning as a preliminary step, demonstrates strong reasoning capabilities but suffers from poor readability and language mixing. DeepSeek-R1 addresses these flaws by incorporating multi-stage training and cold-start data before RL, achieving performance comparable to OpenAI-o1-1217 on reasoning tasks. To support further research, DeepSeek-AI open-sources DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 and built on Qwen and Llama.