Vision-Language Models: Blindly Confident, Dangerously Wrong
2025-06-03
State-of-the-art Vision-Language Models (VLMs) boast near-100% accuracy on standard images (e.g., counting the stripes on an Adidas logo). However, a new study reveals a catastrophic failure on subtly altered versions of those same images: accuracy plummets to roughly 17%. Rather than analyzing the pixels in front of them, the models fall back on memorized world knowledge, a severe form of confirmation bias. This flaw poses significant risks in high-stakes applications such as medical imaging and autonomous vehicles. The research highlights the urgent need for more robust models and for evaluation methods that reward genuine visual reasoning over pattern matching.
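The evaluation protocol the study describes, comparing a model's answers on canonical images against subtly edited counterfactuals, can be sketched in a few lines. Everything here is illustrative: `query_vlm` is a hypothetical stand-in for a real model call, stubbed to always return the memorized canonical answer, which reproduces exactly the confirmation-bias failure the article describes.

```python
# Hedged sketch of a counterfactual evaluation, assuming a hypothetical
# VLM interface. The stub ignores the image entirely and answers from
# memorized knowledge, mimicking the bias reported in the study.

CANONICAL = {"adidas_logo": 3}  # stripe count the model has "memorized"

def query_vlm(image_id: str) -> int:
    """Hypothetical stub VLM: answers from priors, not pixels."""
    base = image_id.split("+")[0]  # "adidas_logo+extra_stripe" -> "adidas_logo"
    return CANONICAL[base]

def accuracy(dataset: list[tuple[str, int]]) -> float:
    """Fraction of images where the model's count matches ground truth."""
    correct = sum(query_vlm(img) == truth for img, truth in dataset)
    return correct / len(dataset)

# Canonical image: ground truth matches the memorized answer.
standard = [("adidas_logo", 3)]
# Counterfactual image: one stripe added, so ground truth is now 4.
counterfactual = [("adidas_logo+extra_stripe", 4)]

print(accuracy(standard))        # perfect on unmodified images
print(accuracy(counterfactual))  # collapses once the image is edited
```

A prior-driven model scores perfectly on the standard set and at zero on the counterfactual set; a model that actually counts would score well on both, which is why this gap is a clean probe of memorization versus visual reasoning.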