Flawed AI Forecasting Chart Goes Viral: A Cautionary Tale

METR, a non-profit research lab, released a report charting the rapid progress of large language models on software tasks, sparking viral discussion. However, the chart's premise is flawed: it measures problem difficulty by human solution time and AI capability by the task length (in human time) at which a model succeeds 50% of the time. This collapses the diverse sources of problem complexity into a single axis, producing results too arbitrary to extrapolate from. While METR's dataset and its discussion of current AI limitations are valuable, using the chart to predict future AI capabilities is misleading. Its viral spread highlights a tendency to believe what one wants to believe rather than to scrutinize validity.