Webtagr - Technology News Summarizer

DeepSeek-R1: A Reinforcement-Learning-Based Reasoning Model and Its Distilled Versions

2025-01-20

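The DeepSeek team has released its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero was trained through large-scale reinforcement learning without supervised fine-tuning as a preliminary step. To address the repetition, poor readability, and language mixing seen in DeepSeek-R1-Zero, DeepSeek-R1 incorporates cold-start data before reinforcement learning, and it achieves reasoning performance comparable to OpenAI-o1. The team has also open-sourced DeepSeek-R1 and six distilled models based on Llama and Qwen; among them, DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini on multiple benchmarks, setting new state-of-the-art results for dense models. The models are publicly available on Hugging Face, along with an accompanying API and an online chat platform.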

(huggingface.co)

AI Model Distillation