Scale Beats All: AI Agent Achieves SOTA on swebench-verified

Popular：

Virtualization DNS security formal verification reachability analysis compiler errors macro conflict web extension development framework Bitmap Graphics API inconsistencies All Tags

Scale Beats All: AI Agent Achieves SOTA on swebench-verified

2025-01-08

CodeStory achieved state-of-the-art results on the swebench-verified benchmark, resolving 62.2% of issues by leveraging massive test-time inference. They used Sonnet 3.5 LLM and a simple toolset, abandoning an initial MCTS framework in favor of scaling. By running numerous agents across multiple VMs and Anthropic accounts, they demonstrated the power of scale in solving complex software engineering problems, even for small teams. This reinforces the 'bitter lesson' that scale trumps all, offering a new paradigm for AI in software engineering.

(aide.dev)

Development

UW Student Faces Expulsion for Course Swap App

The Ratchet Effect: How Engineers Build Reputation at Big Tech