Qwen VLo: A Unified Multimodal Model That Understands and Creates Images

2025-06-28

Alibaba DAMO Academy has introduced Qwen VLo, a multimodal model that not only understands image content but also generates high-quality images based on that understanding. It uses a progressive generation method, building each image gradually from left to right and from top to bottom, an approach intended to keep the final result coherent and visually consistent. Qwen VLo accepts instructions in multiple languages, handles complex tasks such as image editing and style transfer, and can even interpret the content of images it has generated itself. The model is currently available only as a preview, but its combination of image understanding and image generation shows the considerable potential of AI in image creation.
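The announcement does not describe how the progressive generation works internally. The toy sketch below only illustrates the general idea of raster-order generation. It assumes the image is split into a grid of cells (for example patches or tokens) that are filled left to right and then top to bottom. Each cell is produced from the cells already generated. The predictor `toy_predict` is a hypothetical stand-in and is not Qwen VLo's model or API.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[Optional[int]]]
# Hypothetical predictor signature: (partially filled grid, row, col) -> cell value
Predictor = Callable[[Grid, int, int], int]


def raster_order(height: int, width: int) -> List[Tuple[int, int]]:
    """Positions in raster order: left to right within a row, rows top to bottom."""
    return [(r, c) for r in range(height) for c in range(width)]


def generate_progressively(
    height: int, width: int, predict: Predictor
) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Fill an empty grid one cell at a time in raster order.

    Each call to `predict` sees only the cells generated so far; cells at
    later positions are still None.
    """
    grid: Grid = [[None] * width for _ in range(height)]
    order: List[Tuple[int, int]] = []
    for r, c in raster_order(height, width):
        grid[r][c] = predict(grid, r, c)
        order.append((r, c))
    return grid, order


def toy_predict(grid: Grid, r: int, c: int) -> int:
    """Hypothetical toy predictor conditioned on the left and upper neighbours."""
    assert grid[r][c] is None, "the current cell must not be generated yet"
    left = grid[r][c - 1] if c > 0 else 0
    up = grid[r - 1][c] if r > 0 else 0
    # Both neighbours were produced earlier in raster order, so they are never None here.
    return left + up + 1


if __name__ == "__main__":
    image, visit_order = generate_progressively(2, 3, toy_predict)
    print(visit_order)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
    print(image)        # [[1, 2, 3], [2, 5, 9]]
```

In this sketch, every cell is conditioned on content above it and to its left. That dependency structure is one plausible reading of why a left-to-right, top-to-bottom order could help keep an image coherent. A real model would predict image tokens with a neural network rather than with a simple sum.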