Qwen2.5-VL-32B: A 32B-Parameter Vision-Language Model That's More Human-Friendly
2025-03-24
Following the widespread acclaim of the Qwen2.5-VL series, we've open-sourced a new 32-billion-parameter vision-language model, Qwen2.5-VL-32B-Instruct. The model brings significant improvements in mathematical reasoning, fine-grained image understanding, and alignment with human preferences. On multimodal benchmarks such as MMMU, MMMU-Pro, and MathVista, it outperforms comparable models and even surpasses the larger 72-billion-parameter Qwen2-VL-72B-Instruct. It also achieves top-tier pure-text performance among models of its scale.
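For readers who want to try the model right away, here is a minimal inference sketch using Hugging Face Transformers, assuming the checkpoint is published as Qwen/Qwen2.5-VL-32B-Instruct, that your transformers release includes Qwen2_5_VLForConditionalGeneration, and that the qwen-vl-utils helper package is installed; the image URL is a placeholder, not a real asset.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2.5-VL-32B-Instruct"

# Load the model and its processor; device_map="auto" shards across available GPUs.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A single-image chat turn; swap in any image you want to query.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/geometry_problem.png"},
            {"type": "text", "text": "Solve the problem in this image step by step."},
        ],
    }
]

# Render the chat template and extract image/video inputs from the message list.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens before decoding the answer.
output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

At 32B parameters, the model can run on a single high-memory GPU in a reduced-precision dtype, which is part of what makes this size a practical middle ground between the 7B and 72B variants.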