Model Alloys: A Secret Weapon for Boosting AI Performance

2025-07-21
Model Alloys: A Secret Weapon for Boosting AI Performance

The XBOW team dramatically improved the performance of its vulnerability detection agents using a clever technique called "model alloys." This approach leverages the strengths of different LLMs (like Google Gemini and Anthropic Sonnet), alternating between them within a single chat thread to overcome the limitations of individual models. Experiments showed this "alloy" strategy increased success rates to over 55%, significantly outperforming individual models. This technique isn't limited to cybersecurity; it's relevant for any AI agent task requiring solutions within a vast search space.

Read more

Autonomous Penetration Tester XBOW Tops HackerOne US Leaderboard

2025-06-25
Autonomous Penetration Tester XBOW Tops HackerOne US Leaderboard

For the first time, an autonomous AI penetration tester, XBOW, has reached the top spot on the HackerOne US leaderboard. XBOW initially benchmarked itself against CTF challenges and open-source projects, uncovering and reporting numerous zero-day vulnerabilities. It then participated in HackerOne's bug bounty programs, conducting black-box testing on thousands of targets. XBOW's nearly 1060 validated vulnerability reports, including an unknown vulnerability in Palo Alto's GlobalProtect VPN, propelled it to the top ranking. This demonstrates the significant potential of AI in cybersecurity.

Read more