Devstral: Open-Source LLM Outperforms GPT-4.1-mini on Software Engineering Benchmark

2025-05-21

Mistral AI and All Hands AI have jointly released Devstral, an agentic large language model (LLM) for software engineering tasks. On the SWE-Bench Verified benchmark, Devstral scores 46.8%, more than 6 percentage points above previous open-source models, and it also surpasses GPT-4.1-mini. It is designed for complex software engineering problems, such as understanding contextual relationships within large codebases and identifying subtle bugs. Devstral is lightweight enough to run on a single RTX 4090 or a Mac with 32GB RAM, and it supports local deployment, enterprise use, and Copilot integration. The model is open-source and is available via API and through various download options.

Development