LLMs Fail at Font Identification: A Live Benchmark

2025-08-04

A developer benchmarked GPT-4 and Gemini on a live, continuously updated dataset of unidentified fonts drawn from the DaFont forum. Even when given the image, the post title, and the description as context, both models performed abysmally. The result highlights how LLMs struggle even with tasks that look like straightforward image classification, and suggests they are far from a universal solution. The project uses Python scripts for data scraping, GitHub Actions for automation, JSON for storage, and Observable for a dynamic dashboard.
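The summary does not say how model guesses are scored against the forum's answers. Below is a minimal sketch, assuming each solved thread is stored as a JSON record holding the community-confirmed font name and each model's guess. The field names `answer` and `guesses`, the helpers `normalize` and `accuracy_by_model`, and the sample records are all hypothetical and made up for illustration; they are not the project's code or results.

```python
import json
import re
from collections import defaultdict

# Hypothetical storage layout (an assumption, not the project's schema):
# one record per solved forum thread, with the confirmed font name and
# each model's guess. These toy records are illustrative only.
SAMPLE_JSON = """
[
  {"thread": "t1", "answer": "Bebas Neue",
   "guesses": {"gpt-4": "bebas-neue", "gemini": "Oswald"}},
  {"thread": "t2", "answer": "Lobster",
   "guesses": {"gpt-4": "Pacifico", "gemini": "LOBSTER"}}
]
"""


def normalize(name: str) -> str:
    """Lowercase a font name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def accuracy_by_model(records: list) -> dict:
    """Return the fraction of threads each model identified exactly (after normalization)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        target = normalize(rec["answer"])
        for model, guess in rec["guesses"].items():
            total[model] += 1
            # A missing guess (None) counts as wrong.
            if guess is not None and normalize(guess) == target:
                correct[model] += 1
    return {model: correct[model] / total[model] for model in total}


if __name__ == "__main__":
    records = json.loads(SAMPLE_JSON)
    # Emit JSON so a dashboard could read the scores directly.
    print(json.dumps(accuracy_by_model(records), indent=2))
```

Exact matching after normalization is deliberately strict: a guess such as "Lobster Regular" would count as wrong. A real benchmark might want alias lists or fuzzy matching, which this sketch does not attempt.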