DeepSeek-V3: A 671B-Parameter Open-Source Mixture-of-Experts Language Model
2024-12-26
DeepSeek-V3 is a 671-billion-parameter Mixture-of-Experts (MoE) language model that activates 37 billion parameters per token. Building on Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, it pioneers an auxiliary-loss-free load-balancing strategy (sketched below) and a multi-token prediction training objective. Pre-trained on 14.8 trillion high-quality tokens and then refined with supervised fine-tuning and reinforcement learning, DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models, with remarkable training efficiency of only 2.788M H800 GPU hours.
AI
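
To make the auxiliary-loss-free load-balancing idea concrete, here is a minimal NumPy sketch of the general mechanism: a per-expert bias is added to the routing scores only when selecting the top-k experts, while the gating weights that scale expert outputs still come from the unbiased scores, and the biases are nudged between steps according to observed expert load. The expert count, top-k value, batch size, and the update speed `gamma` below are illustrative assumptions, not the actual DeepSeek-V3 configuration.

```python
# Sketch of bias-based, auxiliary-loss-free load balancing for MoE routing.
# All hyperparameters here are toy values chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k = 8, 2      # toy routing setup; V3 uses far more routed experts
gamma = 0.001                # bias update speed (assumed hyperparameter)
bias = np.zeros(n_experts)   # per-expert bias, adjusted between training steps

def route(scores, bias, top_k):
    """Select top-k experts by (score + bias); gate with the raw scores only."""
    biased = scores + bias
    chosen = np.argpartition(-biased, top_k, axis=-1)[..., :top_k]
    gate = np.take_along_axis(scores, chosen, axis=-1)
    gate = gate / gate.sum(axis=-1, keepdims=True)   # normalize gating weights
    return chosen, gate

for step in range(100):
    # token-to-expert affinity scores squashed to (0, 1) with a sigmoid
    scores = 1.0 / (1.0 + np.exp(-rng.normal(size=(1024, n_experts))))
    chosen, gate = route(scores, bias, top_k)

    # Measure per-expert load and nudge the biases: overloaded experts get a
    # lower bias (less likely to be selected next step), underloaded experts
    # get a higher one, steering the router toward balance without an
    # auxiliary loss term in the training objective.
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    bias -= gamma * np.sign(load - load.mean())
```

Because the bias enters only the selection step and never the gating weights, balancing pressure is applied without adding a load-balancing term to the loss, which is the motivation the paper gives for this strategy.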