Evaluating LLMs' Code Generation Capabilities: Introducing MultiCodeBench

2024-12-30

AI-powered programming assistants built on code Large Language Models (LLMs) have become increasingly prevalent and significantly boost developer productivity. However, existing code generation benchmarks focus primarily on general-purpose scenarios, leaving LLM performance in specific application domains largely unknown. This paper introduces MultiCodeBench, a new benchmark of 2,400 programming tasks spanning 12 popular software development domains and 15 programming languages. Experiments on eleven mainstream LLMs reveal how their code generation performance varies across domains, offering practical guidance for developers selecting an LLM and for model developers aiming to strengthen domain-specific code generation capabilities.
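
To make the evaluation setup concrete, below is a minimal sketch of what a per-domain code-generation evaluation loop could look like. The task schema, the `generate_code` stub, and the `token_overlap` metric (a crude stand-in for a metric such as CodeBLEU) are illustrative assumptions, not MultiCodeBench's actual data format or pipeline.

```python
# Minimal sketch of a per-domain code-generation evaluation loop.
# The task schema and all helper names below are hypothetical
# illustrations, not MultiCodeBench's actual API.
from collections import defaultdict


def generate_code(model, prompt: str) -> str:
    """Placeholder for a call to a code LLM; returns a completion string."""
    return model(prompt)  # assumed callable; swap in a real client here


def token_overlap(candidate: str, reference: str) -> float:
    """Crude token-level similarity (stand-in for a metric like CodeBLEU)."""
    cand, ref = set(candidate.split()), set(reference.split())
    return len(cand & ref) / max(len(ref), 1)


def evaluate(model, tasks):
    """tasks: iterable of dicts with 'domain', 'prompt', 'reference' keys."""
    scores = defaultdict(list)
    for task in tasks:
        completion = generate_code(model, task["prompt"])
        scores[task["domain"]].append(token_overlap(completion, task["reference"]))
    # Average the metric within each domain to surface domain-level gaps.
    return {domain: sum(vals) / len(vals) for domain, vals in scores.items()}


if __name__ == "__main__":
    # Toy model and two-domain toy dataset, purely for demonstration.
    echo_model = lambda prompt: "def add(a, b): return a + b"
    toy_tasks = [
        {"domain": "web", "prompt": "Write add()",
         "reference": "def add(a, b): return a + b"},
        {"domain": "robotics", "prompt": "Write move()",
         "reference": "def move(robot): robot.step()"},
    ]
    print(evaluate(echo_model, toy_tasks))
```

Grouping scores by domain rather than pooling them is the key design choice here: a single aggregate number would hide exactly the cross-domain differences that a benchmark like MultiCodeBench is built to expose.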

Development, Code Generation