SWE-Bench Pro: A Challenging Benchmark for Evaluating LLMs on Software Engineering

2025-09-22

SWE-Bench Pro is a new benchmark for evaluating large language models (LLMs) and agents on long-horizon software engineering tasks. Given a codebase and an issue, the model must generate a patch that resolves the described problem. Inspired by SWE-Bench, the benchmark uses Docker and Modal for reproducible evaluations; to run the evaluation script, users need a working Docker environment and Modal credentials.
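To make the Docker-plus-Modal workflow concrete, here is a minimal sketch of how a containerized patch evaluation could be run remotely on Modal. This is not the official SWE-Bench Pro harness: the Dockerfile path, repository location at `/repo`, and `pytest` test command are all assumptions for illustration.

```python
# Illustrative sketch only -- NOT the official SWE-Bench Pro evaluation script.
# Assumes a per-task Dockerfile and a repo pre-cloned at /repo inside the image.
import modal

app = modal.App("swebench-pro-eval-sketch")

# Build the task environment from a per-task Dockerfile (path is hypothetical).
image = modal.Image.from_dockerfile("environments/example_task/Dockerfile")


@app.function(image=image, timeout=3600)
def evaluate_patch(patch: str) -> bool:
    """Apply a model-generated patch inside the task container and run its tests."""
    import subprocess

    # Write the patch to disk and apply it to the pre-cloned repository.
    with open("/tmp/model.patch", "w") as f:
        f.write(patch)
    subprocess.run(["git", "-C", "/repo", "apply", "/tmp/model.patch"], check=True)

    # Run the task's test suite; a zero exit code counts as "issue resolved".
    result = subprocess.run(["pytest", "-q"], cwd="/repo")
    return result.returncode == 0


@app.local_entrypoint()
def main(patch_file: str):
    with open(patch_file) as f:
        resolved = evaluate_patch.remote(f.read())
    print("resolved" if resolved else "not resolved")
```

With Modal credentials configured (for example via `modal token new`), a script like this would be launched with `modal run eval_sketch.py --patch-file model.patch`; the real evaluation script may expose a different interface.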

Development