Cultural Evolution of Cooperation Among LLM Agents

2024-12-18

Researchers investigated whether a 'society' of Large Language Model (LLM) agents can learn mutually beneficial social norms despite incentives to defect. The experiments revealed marked differences in how cooperation evolved across base models, with Claude 3.5 Sonnet substantially outperforming Gemini 1.5 Flash and GPT-4o. Moreover, Claude 3.5 Sonnet leveraged a costly punishment mechanism to achieve even higher scores, a feat the other models did not replicate. The study proposes this setting as a new benchmark for LLMs, one focused on the societal implications of deploying LLM agents, and offers insights into building more robust, cooperative AI agents.
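To make the incentive structure concrete, the dynamics described above can be sketched as a simple resource-transfer game. This is an illustrative assumption, not the paper's exact protocol: it assumes a donor-game-style setup in which donating costs the donor but yields a larger benefit to the recipient, and costly punishment lets an agent pay a small amount to impose a larger loss on a defector. All names and parameters here are hypothetical.

```python
from dataclasses import dataclass

# Assumed parameters (not from the paper): donating costs the donor 1 unit
# and gives the recipient 2 units; punishing costs the punisher 1 unit and
# removes 2 units from the punished defector.
DONATION_COST = 1.0
DONATION_BENEFIT = 2.0
PUNISH_COST = 1.0
PUNISH_PENALTY = 2.0

@dataclass
class Agent:
    name: str
    resources: float = 10.0
    cooperates: bool = True  # fixed policy for this sketch; an LLM agent would decide per round

def play_round(donor: Agent, recipient: Agent, punish_defectors: bool) -> None:
    """One round: the donor may give; the recipient may pay to punish a defector."""
    if donor.cooperates:
        donor.resources -= DONATION_COST
        recipient.resources += DONATION_BENEFIT
    elif punish_defectors:
        recipient.resources -= PUNISH_COST
        donor.resources -= PUNISH_PENALTY

# Three agents in a ring; C always defects and gets punished by its partner.
agents = [Agent("A"), Agent("B"), Agent("C", cooperates=False)]
for _ in range(5):
    for i, donor in enumerate(agents):
        play_round(donor, agents[(i + 1) % len(agents)], punish_defectors=True)
```

Even in this toy version, defection erodes the defector's long-run payoff once punishment is available, which is the mechanism the paper reports Claude 3.5 Sonnet exploiting to reach higher collective scores.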