Vision-Language Models: Blindly Confident, Dangerously Wrong
2025-06-03
State-of-the-art Vision-Language Models (VLMs) boast near-perfect (100%) accuracy on familiar images, for example counting the stripes on the Adidas logo. A new study, however, reveals a catastrophic failure mode: on subtly altered versions of those same images, accuracy plummets to roughly 17%. Instead of analyzing what is actually shown, the models fall back on memorized knowledge about what the image "should" look like, a severe form of confirmation bias. This flaw poses significant risks in high-stakes applications such as medical imaging and autonomous vehicles, and it underscores the urgent need for more robust models and for evaluation methods that reward genuine visual reasoning over pattern matching.
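The evaluation protocol the study describes, scoring the same question on an original image and a subtly altered counterpart, can be sketched in a few lines. Everything here is hypothetical: `biased_vlm` is a stub that simulates a model answering from memorized priors rather than from the image, and `Example`, `MEMORIZED`, and the image identifiers are illustrative names, not part of the study or any real VLM API.

```python
from dataclasses import dataclass

@dataclass
class Example:
    image: str        # image identifier (stands in for pixel data)
    question: str
    true_answer: str  # ground truth for *this* image

# Hypothetical stub simulating memorization bias: the model "recognizes"
# the famous logo and recalls the memorized fact, ignoring any alteration.
MEMORIZED = {"adidas_logo": "3"}

def biased_vlm(image: str, question: str) -> str:
    # Stripping the counterfactual suffix models the failure: the altered
    # image is treated as if it were the original.
    key = image.replace("_altered", "")
    return MEMORIZED.get(key, "unknown")

def accuracy(model, dataset):
    correct = sum(model(ex.image, ex.question) == ex.true_answer
                  for ex in dataset)
    return correct / len(dataset)

# Paired evaluation: same question, original vs. subtly altered image.
standard = [Example("adidas_logo", "How many stripes?", "3")]
counterfactual = [Example("adidas_logo_altered", "How many stripes?", "4")]

print(accuracy(biased_vlm, standard))        # 1.0 on familiar images
print(accuracy(biased_vlm, counterfactual))  # 0.0 once the image changes
```

A model that genuinely counted stripes would score well on both sets; the gap between the two numbers is what exposes reliance on memorized knowledge.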
AI