LLM Showdown: A Real-World Evaluation of 130 Prompts

2025-08-24

The author evaluated more than a dozen LLMs on 130 prompts drawn from their own bash history, spanning four categories: programming, sysadmin tasks, technical explanations, and creative prompts. Open-source models consistently outperformed closed-source options such as Gemini 2.5 Pro on accuracy, speed, and cost-effectiveness. The author settled on a combination of fast, cheap open-source models, supplemented by more powerful closed-source models when needed.
