ChemBench: A Benchmark for LLMs in Chemistry
2025-06-16

ChemBench is a new benchmark dataset for evaluating large language models (LLMs) in chemistry. It covers a diverse range of questions across chemistry subfields, categorized by difficulty. Results show that leading LLMs outperform human experts overall, but they still struggle with knowledge-intensive questions and chemical reasoning. ChemBench aims to advance chemical LLMs and provide tools for more robust model evaluation.
AI
Chemistry