MathArena: Rigorously Evaluating LLMs on Math Competitions

2025-04-02

MathArena is a platform for evaluating large language models (LLMs) on recent math competitions and olympiads. To keep the evaluation fair and free of data contamination, models are tested exclusively on competitions released after their training cutoff, avoiding retroactive evaluation on problems that may have leaked into training data. The platform publishes a leaderboard for each competition, showing per-problem scores for every model, plus a main table summarizing performance across all competitions. Each model is run four times on every problem; the scores are averaged and the cost of the runs (in USD) is reported. The evaluation code is open-sourced: https://github.com/eth-sri/matharena.
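
For illustration, here is a minimal sketch of how the four runs per problem could be aggregated into an average score and a total cost. The record fields and values are hypothetical and not the actual matharena data format:

```python
from statistics import mean

# Hypothetical per-run records: each of the four runs of a model on a problem
# yields a correctness score (0 or 1) and an API cost in USD.
runs = [
    {"problem": "P1", "score": 1, "cost_usd": 0.042},
    {"problem": "P1", "score": 0, "cost_usd": 0.051},
    {"problem": "P1", "score": 1, "cost_usd": 0.047},
    {"problem": "P1", "score": 1, "cost_usd": 0.045},
]

# Average the score over the four runs and sum their cost, as described above.
avg_score = mean(r["score"] for r in runs)
total_cost = sum(r["cost_usd"] for r in runs)
print(f"P1: avg score = {avg_score:.2f}, cost = ${total_cost:.3f}")
```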

Read more