LLM-powered AI Agents Fail to Meet Expectations in CRM Tests

2025-06-16
A new benchmark shows that AI agents built on large language models (LLMs) underperform on standard CRM tasks and handle confidential information poorly. Salesforce research reports a 58% success rate on single-step tasks, falling to 35% on multi-step tasks. The agents also show poor awareness of confidential information, which hurts their performance. The study points to limitations in existing benchmarks and to a significant gap between current LLM capabilities and real-world enterprise needs, a concern for developers and businesses that rely on AI agents for efficiency gains.
