3 articles tagged #reliability.
Discipline of experimenting on production systems by injecting controlled failures, in order to discover weaknesses before they cause incidents.
Discipline that applies software engineering principles to infrastructure operations, with a focus on building scalable and highly reliable systems.
Framework for defining, measuring, and communicating service reliability through service level objectives (SLOs), indicators (SLIs), and agreements (SLAs).