LLM Elimination Game: Social Reasoning, Strategy, and Deception
2025-04-07
Researchers created a multiplayer "elimination game" benchmark to evaluate Large Language Models (LLMs) in social reasoning, strategy, and deception. Eight LLMs compete, engaging in public and private conversations, forming alliances, and voting to eliminate opponents until only two remain. A jury of eliminated players then decides the winner. Analyzing conversation logs, voting patterns, and rankings reveals how LLMs balance shared knowledge with hidden intentions, forging alliances or betraying them strategically. The benchmark goes beyond simple dialogue, forcing models to navigate public vs. private dynamics, strategic voting, and jury persuasion. GPT-4.5 Preview emerged as the top performer.
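To make the round structure concrete, below is a minimal Python sketch of such an elimination loop. It is an illustrative assumption, not the authors' implementation: the `Player` class, the `decide_vote` stub, and the random tie-breaking are placeholders, and a real run would replace the stub with LLM calls conditioned on the public and private conversation history.

```python
import random
from dataclasses import dataclass


@dataclass
class Player:
    """One LLM contestant. `decide_vote` is a stand-in for a real model call."""
    name: str
    alive: bool = True

    def decide_vote(self, candidates: list[str]) -> str:
        # Placeholder policy: a real implementation would prompt the model
        # with the public transcript and its private chats before voting.
        return random.choice(candidates)


def run_elimination_game(players: list[Player], seed: int = 0) -> str:
    """Play discussion/voting rounds until two players remain,
    then let the eliminated players (the jury) pick the winner."""
    random.seed(seed)
    eliminated: list[Player] = []

    while sum(p.alive for p in players) > 2:
        alive = [p for p in players if p.alive]

        # Public and private discussion phases would go here:
        # each player sees the shared transcript plus its own private messages.

        # Voting phase: each surviving player votes to eliminate someone else.
        tally: dict[str, int] = {}
        for voter in alive:
            candidates = [p.name for p in alive if p.name != voter.name]
            choice = voter.decide_vote(candidates)
            tally[choice] = tally.get(choice, 0) + 1

        # Eliminate the player with the most votes (ties broken at random).
        top = max(tally.values())
        out_name = random.choice([n for n, v in tally.items() if v == top])
        out = next(p for p in players if p.name == out_name)
        out.alive = False
        eliminated.append(out)

    # Jury phase: eliminated players choose between the two finalists.
    finalists = [p.name for p in players if p.alive]
    jury_votes = [juror.decide_vote(finalists) for juror in eliminated]
    return max(finalists, key=jury_votes.count)


if __name__ == "__main__":
    roster = [Player(f"model_{i}") for i in range(8)]
    print("Winner:", run_elimination_game(roster))
```

The sketch keeps the benchmark's two-stage incentive: surviving the round-by-round votes matters, but so does how the eliminated jurors remember you when they cast the final vote.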