LLM-powered AI Agents Fail to Meet Expectations in CRM Tests

2025-06-16

A new benchmark shows that AI agents built on Large Language Models (LLMs) underperform on standard CRM tasks, particularly those involving confidentiality. According to the Salesforce research, agents succeed on 58% of single-step tasks, but that rate falls to 35% for multi-step tasks. Critically, the agents show poor awareness of confidential information, a weakness that further drags down their results. The study also points to limitations in existing benchmarks and reveals a significant gap between current LLM capabilities and real-world enterprise needs, raising concerns for developers and businesses that rely on AI agents for efficiency gains.

(www.theregister.com)