ChemBench: A Benchmark for LLMs in Chemistry

2025-06-16

ChemBench is a new benchmark dataset designed to evaluate the performance of large language models (LLMs) in chemistry. It contains a diverse set of chemistry questions spanning many subfields, with each question categorized by difficulty. Results show that leading LLMs outperform human experts overall, but the models still have clear limitations on knowledge-intensive questions and on chemical reasoning. ChemBench aims to advance the development of LLMs for chemistry and to provide tools for more robust model evaluation.