Training Long-Horizon Terminal Agents with Reinforcement Learning: Terminal-Bench-RL
This project details the creation of stable RL training infrastructure, scaling to 32x H100 GPUs across 4 nodes, for training long-horizon terminal-based coding agents. Built on the rLLM framework, it includes custom environments, tools, and training infrastructure. The resulting agent, Terminal-Agent-Qwen3-32b, achieved the highest terminal-bench score of any Qwen3 agent *without* training: guided by a carefully engineered system prompt and custom tools, it placed 19th on the terminal-bench leaderboard, outperforming several top agents from Stanford and OpenAI. A full training run, estimated at ~$1M in compute, was cost-prohibitive, so the code and dataset are provided to invite further research by anyone with the resources to run it.
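To make the "custom environment" idea concrete, below is a minimal sketch of what a terminal environment for such an agent might look like. This is a hypothetical illustration, not rLLM's actual API: the class name `TerminalEnv` and its `reset`/`step` interface are assumptions modeled on the common Gym-style loop.

```python
import subprocess
import tempfile
import shutil
from dataclasses import dataclass, field

@dataclass
class TerminalEnv:
    """Hypothetical minimal terminal environment (not rLLM's real API).

    Each episode gives the agent a fresh sandbox directory; the agent
    acts by emitting shell commands and observes their output.
    """
    max_steps: int = 20
    _cwd: str = field(default="", init=False)
    _steps: int = field(default=0, init=False)

    def reset(self) -> str:
        """Start a new episode in an empty temporary directory."""
        self._cwd = tempfile.mkdtemp(prefix="term_env_")
        self._steps = 0
        return f"shell ready in {self._cwd}"

    def step(self, command: str):
        """Run one shell command; return (observation, reward, done)."""
        self._steps += 1
        proc = subprocess.run(
            command, shell=True, cwd=self._cwd,
            capture_output=True, text=True, timeout=30,
        )
        # Truncate long output, as real long-horizon agents must manage context.
        observation = (proc.stdout + proc.stderr)[-2000:]
        done = self._steps >= self.max_steps
        # Sparse reward placeholder: a real setup would grade the episode
        # with task-specific tests at the end.
        reward = 0.0
        return observation, reward, done

    def close(self):
        shutil.rmtree(self._cwd, ignore_errors=True)
```

A rollout then alternates between the policy proposing a command and the environment returning terminal output, e.g. `env.step("echo hello > f.txt && cat f.txt")`, until the step budget is exhausted or the task's tests pass.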