SLOs, SLIs & SLAs
Framework for defining, measuring, and communicating service reliability through service level objectives (SLOs), indicators (SLIs), and agreements (SLAs).
What it is
The three terms describe different layers of the same reliability framework:
- SLI (Service Level Indicator): a metric that measures one aspect of service behavior (e.g., p99 latency)
- SLO (Service Level Objective): an internal target for an SLI (e.g., p99 latency < 200 ms)
- SLA (Service Level Agreement): a contractual commitment with consequences if it is missed (e.g., 99.9% uptime, otherwise the customer receives credits)
Relationship
SLI (what we measure) → SLO (what we want) → SLA (what we promise)
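The SLO should always be stricter than the SLA. The gap between them is a safety margin: a missed SLO signals a problem internally while the contractual commitment is still being met.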
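The margin between the two targets can be made concrete with a small check. This is a minimal sketch, not a standard API: the `AvailabilityTargets` class, the `classify` function, the status labels, and the 99.95% / 99.9% figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityTargets:
    """Hypothetical availability targets, expressed as fractions (0.999 = 99.9%)."""

    slo: float  # internal objective
    sla: float  # contractual commitment

    def __post_init__(self) -> None:
        # The SLO must be stricter (higher) than the SLA to leave a margin.
        if not self.slo > self.sla:
            raise ValueError("SLO must be stricter than the SLA")


def classify(sli: float, targets: AvailabilityTargets) -> str:
    """Compare a measured availability SLI against both thresholds."""
    if sli >= targets.slo:
        return "ok"
    if sli >= targets.sla:
        return "slo_breached"  # internal target missed, contract still met
    return "sla_breached"  # contractual consequences apply


# Illustrative numbers only.
targets = AvailabilityTargets(slo=0.9995, sla=0.999)
print(classify(0.9997, targets))  # ok
print(classify(0.9992, targets))  # slo_breached
print(classify(0.9985, targets))  # sla_breached
```

The middle state is the useful one: the team learns about the degradation from the SLO before any customer-facing commitment is broken.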
Common SLIs
| SLI | Measurement |
|---|---|
| Availability | % of successful requests |
| Latency | Response time at a given percentile (e.g., p99) |
| Throughput | Requests per second |
| Error rate | % of requests with errors |
| Freshness | Age of the data (time since last update) |
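As a rough illustration of how the indicators in the table might be computed from raw request records, the sketch below uses a hypothetical `Request` record and helper functions. The nearest-rank percentile method is an assumption made for simplicity; real monitoring systems often estimate percentiles from histograms instead. All functions assume a non-empty list of requests.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    """Hypothetical request record: latency in milliseconds and a success flag."""

    latency_ms: float
    ok: bool


def availability(requests: list[Request]) -> float:
    """Fraction of successful requests."""
    return sum(r.ok for r in requests) / len(requests)


def error_rate(requests: list[Request]) -> float:
    """Fraction of requests with errors (complement of availability)."""
    return 1.0 - availability(requests)


def latency_percentile(requests: list[Request], pct: float) -> float:
    """Nearest-rank percentile of response time, e.g. pct=99 for p99."""
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def throughput(requests: list[Request], window_seconds: float) -> float:
    """Requests per second over the observation window."""
    return len(requests) / window_seconds


def freshness_seconds(last_updated: datetime, now: datetime) -> float:
    """Data age: seconds elapsed since the last update."""
    return (now - last_updated).total_seconds()


# Synthetic sample: 100 requests with latencies 1..100 ms, two of them failing.
sample = [Request(latency_ms=float(i), ok=(i % 50 != 0)) for i in range(1, 101)]
print(availability(sample))  # 0.98
print(latency_percentile(sample, 99))  # 99.0
print(throughput(sample, window_seconds=50))  # 2.0
```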
Error Budget
Error budget = 100% - SLO. With an SLO of 99.9%, the budget is 0.1%, which is about 43 minutes of downtime in a 30-day month. This budget is "spent" on deploys, experiments, and failures.
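The arithmetic behind the budget can be sketched as follows. The function names are hypothetical, the 30-day window is an assumption, and the request-based variant is one possible way to track how much budget is left. It is not the only one.

```python
def error_budget_fraction(slo: float) -> float:
    """Error budget as a fraction: 1 - SLO (SLO given as a fraction, e.g. 0.999)."""
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based reading of the budget: minutes of downtime allowed in the window."""
    return error_budget_fraction(slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Request-based reading: fraction of the budget still unspent.

    A negative result means the budget is overspent.
    """
    allowed_failures = error_budget_fraction(slo) * total_requests
    if allowed_failures <= 0:
        raise ValueError("no error budget available (SLO of 100% or no traffic)")
    return 1.0 - failed_requests / allowed_failures


print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
```

In the second call, a 99.9% SLO over one million requests allows about 1,000 failures, so 250 failures consume a quarter of the budget.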
Why it matters
SLOs turn reliability into a quantifiable engineering decision. Without them, teams don't know how much reliability is enough, and they oscillate between over-investing in stability and ignoring operational debt until an incident forces them to act.
References
- SRE Book - Service Level Objectives — Google.
- SLA vs SLO vs SLI — Atlassian, 2024. Practical comparison between SLA, SLO, and SLI.
- Implementing SLOs — SRE Workbook — Google, 2018. Practical guide for implementing SLOs.