OpenAI's New Models Hallucinate More: Bigger Isn't Always Better

2025-04-18

OpenAI's recently released o3 and o4-mini models, while state-of-the-art in many respects, exhibit a troubling increase in hallucinations compared to their predecessors. OpenAI's own internal tests show markedly higher hallucination rates than earlier reasoning models (o1, o1-mini, o3-mini) and even than traditional non-reasoning models such as GPT-4o. More concerning still, OpenAI does not know why, which is a serious problem for industries that demand accuracy.

Third-party testing confirms the issue: o3 has been observed fabricating steps in its reasoning process, describing actions it claims to have taken but never actually performed. Both models excel at coding and math, but the elevated hallucination rate limits where they can safely be deployed.

Reducing hallucinations remains a key open area of AI research. One promising mitigation is granting models web search capabilities, letting them ground answers in retrieved sources instead of relying solely on what they memorized during training.
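As a rough illustration of that approach, here is a minimal retrieval-grounded answering sketch in Python. It assumes the official OpenAI Python SDK and a hypothetical `web_search()` helper (stubbed below); the model name and prompt wording are illustrative examples, not OpenAI's actual search integration.

```python
# Minimal sketch: ground a model's answer in retrieved web snippets
# rather than letting it answer purely from parametric memory.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def web_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical search helper: in practice, call a real search API
    (Bing, Brave, an internal index, etc.) and return text snippets."""
    return []  # stub for illustration


def grounded_answer(question: str) -> str:
    snippets = web_search(question)
    context = "\n".join(f"- {s}" for s in snippets) or "(no results found)"
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any chat-completions model works
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using ONLY the sources below. "
                    "If the sources are insufficient, say you don't know.\n"
                    f"Sources:\n{context}"
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(grounded_answer("What reasoning models did OpenAI release in April 2025?"))
```

The key design choice is the system prompt's instruction to answer only from retrieved sources and to admit uncertainty otherwise; the retrieval step itself can be swapped for any search backend.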