LLM Randomness Test Reveals Unexpected Bias

2025-04-30

This experiment tested the randomness of several Large Language Models (LLMs) from OpenAI and Anthropic. By asking the models to flip a coin and to generate random numbers between 0 and 10, the researchers found a significant, consistent bias in the outputs, showing that the models are not truly random. In the coin-flip experiment, every model preferred 'heads,' with GPT-o1 exhibiting the most extreme bias (49%). In the odd/even number test, most models favored odd numbers, with Claude 3.7 Sonnet showing the strongest bias (47%). The findings highlight that even advanced LLMs can exhibit unexpected patterns shaped by the distributions in their training data rather than by genuine randomness.
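For readers who want to try a similar test themselves, here is a minimal sketch of one way such an experiment could be run, assuming the OpenAI Python SDK. The model name, prompt wording, and sample size below are illustrative assumptions, not the researchers' actual setup.

```python
# Minimal sketch: repeatedly ask a model the same "random" question and
# tally its answers to expose any output bias. Assumes the OpenAI Python
# SDK and an OPENAI_API_KEY in the environment; model/prompts are examples.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample(prompt: str, n: int = 100, model: str = "gpt-4o") -> Counter:
    """Ask the model the same question n times and tally its answers."""
    tally: Counter = Counter()
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # default sampling; bias can persist even here
        )
        tally[resp.choices[0].message.content.strip().lower()] += 1
    return tally

# A perfectly random source would split these tallies roughly evenly.
print(sample("Flip a coin. Answer with exactly one word: heads or tails."))
print(sample("Pick a random integer between 0 and 10. Reply with only the number."))
```

With a large enough sample, a skewed tally (for example, 'heads' well above 50%) is the kind of pattern the experiment reports; a chi-squared test against a uniform distribution would make the comparison rigorous.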