Humanity's Last Exam: A Groundbreaking AI Benchmark
2025-01-23

Scale AI and the Center for AI Safety (CAIS) have released results from "Humanity's Last Exam," a new benchmark designed to probe the limits of AI knowledge. The exam comprises over 3,000 expert-level questions spanning a wide range of fields, and even the most advanced AI models tested (including GPT-4, Claude 3.5, and Gemini 1.5) answered fewer than 10% of them correctly, exposing significant gaps in complex reasoning.

The benchmark addresses "benchmark saturation": the tendency of models to score near-perfectly on existing tests while still struggling with novel problems. The dataset will be publicly released to support further AI research and development, offering insight into both the current state and the future direction of the field.
AI
AI Benchmark