Azure's LLMs: A Case of Degrading Performance
2025-09-20

A developer building a product on Azure's LLMs and audio models reports a concerning trend: the same models are getting progressively worse over time. With identical system prompts and messages, the accuracy of responses from GPT-4o-mini and GPT-5-mini/nano has declined significantly. GPT-5, initially expected to be superior, proved slower and less accurate than the older GPT-4o-mini. The developer suspects Microsoft is deliberately degrading older models to push users towards newer, less reliable versions. If so, the practice undermines user experience and may drive developers to seek alternative platforms.
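A comparison like this only holds up if the same prompts are scored the same way on every run. The sketch below shows one way to track that. It is not the developer's actual method: `Case`, `accuracy`, `has_regressed`, the substring-match scoring rule, and the 0.05 tolerance are all assumptions made for illustration, and `ask_model` stands in for whatever function sends a system prompt and message to a deployed model and returns its reply.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One fixed test case (hypothetical structure, not from the source)."""
    system_prompt: str
    message: str
    expected: str  # substring the reply must contain to count as correct


def accuracy(ask_model: Callable[[str, str], str], cases: Sequence[Case]) -> float:
    """Return the fraction of cases whose reply contains the expected answer."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(
        case.expected.lower() in ask_model(case.system_prompt, case.message).lower()
        for case in cases
    )
    return hits / len(cases)


def has_regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when accuracy dropped by more than `tolerance` since the baseline run."""
    return (baseline - current) > tolerance
```

Running `accuracy` over the same case list on a schedule and storing the result with the date and model name would give a record to compare against. Because model replies can vary between runs, averaging several runs per date makes a reported drop more convincing than a single measurement.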
Development
model degradation