Gandi's Major Outage: A Cascade of Failures Triggered by SSD Storage

2025-05-05

On March 9, 2025, Gandi suffered a major service disruption when an SSD storage filer failed, taking down numerous services, including email. The outage lasted for hours, and some mailboxes remained inaccessible until the following day. No data was lost, but the incident exposed weaknesses in Gandi's fault tolerance: internal monitoring lacked sufficient redundancy, the VM architecture was flawed, and some redundant systems had insufficient capacity. To prevent a recurrence, Gandi has improved its redundancy mechanisms, enhanced its monitoring, and upgraded its storage systems.