Foundation Models for Time Series Forecasting: A Real-World Benchmark

Traditional time-series forecasting methods like ARIMA and Prophet are being challenged by a new generation of "foundation models." These models aim to bring the power of large language models (LLMs) to time-series data, enabling a single model to forecast across diverse datasets and domains. This article benchmarks several foundation models—Amazon Chronos, Google TimesFM, IBM Tiny Time Mixers, and Datadog Toto—against classical baselines. Testing on real-world Kubernetes pod metrics reveals that foundation models excel at multivariate forecasting, with Datadog Toto performing particularly well. However, challenges remain in handling outliers and novel patterns, and classical models remain competitive for steady-state workloads. Ultimately, the authors conclude that foundation models offer significant advantages for fast-changing, multivariate data streams, providing more flexible and scalable solutions for modern observability and platform engineering teams.
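To ground the kind of comparison described above, here is a minimal, self-contained sketch of scoring a classical baseline on a synthetic metric series. The seasonal-naive forecaster and the sMAPE metric are illustrative assumptions—the article's actual baselines, metrics, and Kubernetes data are not shown here.

```python
import math

def seasonal_naive_forecast(history, horizon, season=24):
    """Classical baseline: repeat the last observed season as the forecast."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2 or 1.0  # guard against zero denominator
        total += abs(a - f) / denom
    return 100.0 * total / len(actual)

# Synthetic hourly "pod CPU" series with a daily cycle (illustrative only).
series = [50 + 10 * math.sin(2 * math.pi * t / 24) for t in range(24 * 7)]
history, actual = series[:-24], series[-24:]  # hold out the last day

baseline = seasonal_naive_forecast(history, horizon=24)
print(f"seasonal-naive sMAPE: {smape(actual, baseline):.2f}%")
```

In a benchmark like the one described, each model—foundation or classical—would produce forecasts for the same held-out window, and a metric such as sMAPE would be computed per series and aggregated across the dataset.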