Google's Novel Approach to Preventing Outages with STPA
2025-03-20
Google uses System Theoretic Process Analysis (STPA) to prevent outages in its large-scale software systems. This article traces Google's path from relying on external STPA training, to developing custom internal training that covered building control structure models, and ultimately to self-paced tutorials that disseminate STPA more effectively. Using examples drawn from real Google systems made the training more engaging and highlighted the importance of feedback paths, which traditional software design often neglects. Google's experience shows that STPA effectively identifies potential failure points and improves system reliability.
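To make the emphasis on feedback paths concrete, here is a minimal sketch, assuming a control structure can be reduced to a directed graph: control actions flow down from a controller to a controlled process, and feedback flows back up. It flags any control action whose controller receives no feedback, directly or through intermediaries, from the process it controls. This is a hypothetical illustration, not Google's tooling; STPA itself is an analyst-driven method, and the class, method, and component names below (`ControlStructure`, `release_engineer`, `rollout_tool`, and so on) are invented for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ControlStructure:
    """Hypothetical, simplified model of an STPA control structure."""

    # Downward edges: (controller, controlled process) pairs, i.e. control actions.
    control_actions: set[tuple[str, str]] = field(default_factory=set)
    # Upward edges: source -> receivers of feedback (metrics, status, alerts).
    feedback: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_control(self, controller: str, process: str) -> None:
        self.control_actions.add((controller, process))

    def add_feedback(self, source: str, receiver: str) -> None:
        self.feedback[source].add(receiver)

    def _has_feedback_path(self, start: str, goal: str) -> bool:
        # Breadth-first search that follows feedback edges only.
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                return True
            for nxt in self.feedback.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    def loops_missing_feedback(self) -> list[tuple[str, str]]:
        # A control action is flagged when no feedback path leads from the
        # controlled process back to its controller (an open control loop).
        return sorted(
            (c, p) for c, p in self.control_actions
            if not self._has_feedback_path(p, c)
        )


# Hypothetical example: a human triggers a rollout tool, which updates serving jobs.
cs = ControlStructure()
cs.add_control("release_engineer", "rollout_tool")
cs.add_control("rollout_tool", "serving_jobs")
cs.add_feedback("serving_jobs", "monitoring")
cs.add_feedback("monitoring", "rollout_tool")
print(cs.loops_missing_feedback())  # [('release_engineer', 'rollout_tool')]
```

In this toy model the rollout tool hears back from the serving jobs through monitoring, but nothing tells the release engineer what the rollout tool actually did, so that loop is reported as open. In a real STPA analysis, a gap like this would prompt questions about which unsafe control actions the engineer might issue while working from a stale or incorrect view of the system.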
Development
system reliability