LLM Benchmark: Pelican on a Bicycle
2024-12-16
Simon Willison created an unusual LLM benchmark: asking each model to generate an SVG image of a pelican riding a bicycle. The prompt is meant to test the models' creative abilities on a task that is unlikely to appear in their training data. He tested 16 models from OpenAI, Anthropic, Google Gemini, and Meta (Llama running on Cerebras) and found significant variation in the quality of the generated SVGs. Some models produced surprisingly good results, while others struggled.
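
For readers who want to try something similar, here is a minimal sketch of the post-processing step: pulling the SVG out of a model's text response and checking that it at least parses. This is not Willison's tooling. The function names `extract_svg` and `is_well_formed_svg` are hypothetical, the prompt wording is an assumption, and the sketch deliberately leaves out the model call itself. It only checks that the markup is well-formed XML with an `svg` root; judging whether the drawing looks like a pelican on a bicycle still takes a human eye.

```python
import re
import xml.etree.ElementTree as ET

# Assumed prompt wording, for illustration only.
PROMPT = "Generate an SVG of a pelican riding a bicycle"

SVG_NAMESPACE = "{http://www.w3.org/2000/svg}"


def extract_svg(response_text):
    """Return the first <svg>...</svg> span in a model response, or None.

    Models often wrap the SVG in prose or a fenced code block, so we
    search for the element rather than assuming the whole reply is SVG.
    The non-greedy match stops at the first closing tag, so nested
    <svg> elements are not handled.
    """
    match = re.search(r"<svg\b.*?</svg>", response_text, re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


def is_well_formed_svg(svg_text):
    """Return True if svg_text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    return root.tag in ("svg", SVG_NAMESPACE + "svg")


if __name__ == "__main__":
    # A stand-in for a model reply; a real run would send PROMPT to a model.
    reply = (
        "Here is your image:\n"
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
        '<circle cx="5" cy="5" r="4"/></svg>\n'
        "Hope you like it!"
    )
    svg = extract_svg(reply)
    print(svg is not None and is_well_formed_svg(svg))
```

A check like this only filters out replies that contain no usable SVG; the quality differences between models described above are visual and are not captured by it.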