Webtagr - Technology News Summarizer


DeepSeek-V3: An Open-Source Mixture-of-Experts Language Model with 671 Billion Parameters

2024-12-26

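DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, and introduces an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. The model was pre-trained on 14.8 trillion high-quality tokens, followed by supervised fine-tuning and reinforcement learning. Evaluations show that DeepSeek-V3 outperforms other open-source models and performs comparably to leading closed-source models, while training remained highly efficient, requiring only 2.788M H800 GPU hours.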
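
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers from the summary; the function and constant names are illustrative and do not come from the DeepSeek repository. The throughput figure assumes all 2.788M GPU hours went into the 14.8 trillion pre-training tokens, which is a simplification, so treat it as an order-of-magnitude estimate.

```python
# Figures quoted in the summary (names are illustrative, not from DeepSeek's code).
TOTAL_PARAMS = 671e9             # total parameters
ACTIVE_PARAMS_PER_TOKEN = 37e9   # parameters activated per token
PRETRAIN_TOKENS = 14.8e12        # pre-training tokens
GPU_HOURS = 2.788e6              # reported H800 GPU hours


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Average tokens processed per GPU hour.

    Assumption: every reported GPU hour was spent on these tokens.
    """
    return tokens / gpu_hours


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_PER_TOKEN, TOTAL_PARAMS)
    rate = tokens_per_gpu_hour(PRETRAIN_TOKENS, GPU_HOURS)
    print(f"Active parameters per token: {frac:.1%}")
    print(f"Approximate tokens per GPU hour: {rate:,.0f}")
```

Under these assumptions, only about 5.5% of the parameters are active for any given token, which is the main reason a sparse MoE model of this size can be trained and served at a cost closer to that of a much smaller dense model.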

(github.com)

AI MoE