DeepSeek-VL2: Advanced Multimodal Understanding with Mixture-of-Experts

2025-01-01

DeepSeek-VL2 is a series of large Mixture-of-Experts (MoE) vision-language models that significantly improves upon its predecessor, DeepSeek-VL. It performs well across a range of tasks, including visual question answering, optical character recognition, and document, table, and chart understanding. The series comprises three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters, respectively. Compared with existing open-source models, DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters. The project is open source and provides model downloads, quick start guides, and demo examples.
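Because the variants are sized by activated rather than total parameters, it may help to see how the two counts diverge in an MoE layer. The following is a minimal sketch, not DeepSeek-VL2's actual architecture: the `MoELayerConfig` class, its field names, and the toy sizes are all hypothetical, and router weights are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class MoELayerConfig:
    """Hypothetical toy MoE feed-forward layer (not DeepSeek-VL2's real config)."""
    hidden_size: int
    expert_hidden_size: int
    num_routed_experts: int
    top_k: int                   # routed experts selected per token
    num_shared_experts: int = 0  # experts applied to every token


def expert_params(cfg: MoELayerConfig) -> int:
    # One expert modeled as an up-projection plus a down-projection, no biases.
    return 2 * cfg.hidden_size * cfg.expert_hidden_size


def total_params(cfg: MoELayerConfig) -> int:
    # Every expert is stored, whether or not a given token uses it.
    return (cfg.num_routed_experts + cfg.num_shared_experts) * expert_params(cfg)


def activated_params(cfg: MoELayerConfig) -> int:
    # A token only passes through the shared experts and its top-k routed experts.
    return (cfg.top_k + cfg.num_shared_experts) * expert_params(cfg)


# Toy numbers chosen for illustration only.
toy = MoELayerConfig(
    hidden_size=1024,
    expert_hidden_size=2048,
    num_routed_experts=64,
    top_k=6,
    num_shared_experts=2,
)

if __name__ == "__main__":
    print(f"total expert parameters:     {total_params(toy):,}")
    print(f"activated expert parameters: {activated_params(toy):,}")
```

In this toy layer, each token touches only 8 of the 66 stored experts, so per-token compute scales with the activated count while memory scales with the total. This is why activated parameters are the figure usually quoted when comparing MoE models against dense ones.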