DeepSeek v3: A 671B-Parameter Open-Source LLM Outperforming GPT-4o at a Fraction of the Cost?
DeepSeek unveiled its flagship model, v3, a 671B-parameter Mixture-of-Experts (MoE) model that activates only 37B parameters per token. Benchmarks show it is competitive with, and sometimes surpasses, OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, making it the current top open-source model, ahead of Llama 3.1 405B, Qwen, and Mistral. Remarkably, DeepSeek v3 achieved this performance for roughly $5.6 million in training compute (about 2.79M H800 GPU-hours), leveraging careful engineering: the MoE architecture, FP8 mixed-precision training, and the in-house HAI-LLM training framework. It excels at reasoning and math, outperforming GPT-4o and Claude 3.5 Sonnet on several benchmarks, though it lags slightly behind them in writing and coding. Its exceptional price-to-performance ratio makes it a compelling option for developers building client-facing AI applications.
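To see how a 671B-parameter model can activate only 37B parameters per token, here is a minimal sketch of top-k expert routing in PyTorch. It is illustrative only, not DeepSeek's implementation: the expert count, hidden sizes, and `k` below are toy values chosen for this example, and v3's actual DeepSeekMoE design layers shared experts and auxiliary-loss-free load balancing on top of this basic idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a learned router picks the top-k
    experts per token, so only a small fraction of the layer's total
    parameters does work for any given token."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert, keep only the top k.
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)        # normalize the kept scores
        out = torch.zeros_like(x)
        # Run each expert only on the tokens routed to it, then mix.
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer(d_model=64, d_ff=256, n_experts=16, k=2)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts ran per token
```

With 16 experts and k=2 in this toy setup, each token touches roughly an eighth of the expert parameters; scaled up, the same routing principle is how v3 keeps per-token compute near that of a 37B dense model while storing 671B parameters.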