LLMs Can See and Hear Without Any Training

Popular：

Virtualization DNS security formal verification reachability analysis compiler errors macro conflict web extension development framework Bitmap Graphics API inconsistencies All Tags

LLMs Can See and Hear Without Any Training

2025-04-26

This groundbreaking research demonstrates that Large Language Models (LLMs) can understand images and audio without any additional training. By cleverly leveraging existing LLMs, image captioning, audio captioning, and high-quality image generation techniques, researchers enabled LLMs to 'perceive' images and sounds. The project's open-source code and datasets facilitate reproducibility and further exploration.

(github.com)

OpenAI's o3 Model: A Surreal, Dystopian, and Wildly Entertaining Location Guesser

The Friendship Recession: A Cultural Crisis and How to Combat It