Concepts

SLOs, SLIs & SLAs

Framework for defining, measuring, and communicating service reliability through service level objectives (SLOs), indicators (SLIs), and agreements (SLAs).

seed · #slo #sli #sla #reliability #metrics #sre

What it is

SLOs, SLIs, and SLAs are a framework for defining and measuring service reliability:

  • SLI (Service Level Indicator): metric measuring a service aspect (e.g., p99 latency)
  • SLO (Service Level Objective): internal target for the SLI (e.g., p99 < 200ms)
  • SLA (Service Level Agreement): contractual commitment with consequences (e.g., 99.9% uptime or credits)

Relationship

SLI (what we measure) → SLO (what we want) → SLA (what we promise)

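The SLO should always be stricter than the SLA. The gap is a safety margin: missing the SLO triggers internal action before the contractual commitment is breached.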
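The chain above can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the names `AvailabilityTargets` and `classify`, the status strings, and the example thresholds are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityTargets:
    """Hypothetical container for one availability SLO/SLA pair (in percent)."""
    slo_percent: float  # internal objective (what we want)
    sla_percent: float  # contractual commitment (what we promise)

    def __post_init__(self) -> None:
        # Enforce the rule that the SLO is stricter than the SLA.
        if self.slo_percent <= self.sla_percent:
            raise ValueError("SLO must be stricter (higher) than the SLA")


def classify(sli_percent: float, targets: AvailabilityTargets) -> str:
    """Compare a measured SLI (what we measure) against both targets."""
    if sli_percent < targets.sla_percent:
        return "sla_breached"   # contractual consequences apply
    if sli_percent < targets.slo_percent:
        return "slo_missed"     # internal target missed, SLA still intact
    return "healthy"


# Example thresholds are illustrative only.
targets = AvailabilityTargets(slo_percent=99.95, sla_percent=99.9)
status = classify(99.92, targets)  # falls in the margin between SLO and SLA
```

The "slo_missed" band is the margin: the service is below its internal target but has not yet broken the promise made to customers.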

Common SLIs

SLI           Measurement
Availability  % of successful requests
Latency       Response time percentile
Throughput    Requests per second
Error rate    % of requests with errors
Freshness     Data age
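As a rough sketch of how the request-based SLIs in this table might be computed from raw request records: the `Request` type and function names below are hypothetical, and the percentile uses the simple nearest-rank method, which is only one of several possible definitions. Production systems usually derive these values from a metrics backend rather than from in-memory lists.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float
    ok: bool


def availability(requests: list[Request]) -> float:
    """Availability SLI: percentage of successful requests."""
    if not requests:
        raise ValueError("no requests in the window")
    return 100.0 * sum(r.ok for r in requests) / len(requests)


def error_rate(requests: list[Request]) -> float:
    """Error-rate SLI: percentage of failed requests."""
    return 100.0 - availability(requests)


def latency_percentile(requests: list[Request], pct: float) -> float:
    """Latency SLI: nearest-rank percentile of response time in ms."""
    if not requests:
        raise ValueError("no requests in the window")
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def throughput(requests: list[Request], window_seconds: float) -> float:
    """Throughput SLI: requests per second over the window."""
    return len(requests) / window_seconds


# Synthetic data: 100 requests with latencies 1..100 ms, one failure.
sample = [Request(latency_ms=float(i), ok=(i != 100)) for i in range(1, 101)]
```

With this synthetic sample, `availability(sample)` is 99.0 and `latency_percentile(sample, 99)` is 99.0 ms.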

Error Budget

Error budget = 100% - SLO. If the SLO is 99.9%, the budget is 0.1% of the measurement window, about 43 minutes in a 30-day month. This budget is "spent" on deploys, experiments, and failures.
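The arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the 30-day window is an assumption; a different window length changes the number of minutes.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of unavailability the SLO tolerates within the window."""
    budget_fraction = (100.0 - slo_percent) / 100.0
    return budget_fraction * window_days * 24 * 60


def budget_remaining(slo_percent: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    Returns 1.0 when nothing is spent, 0.0 when exactly exhausted,
    and a negative value when the budget is overspent.
    """
    allowed_failures = total_requests * (100.0 - slo_percent) / 100.0
    if allowed_failures == 0:
        return 0.0  # a 100% SLO (or an empty window) leaves no budget
    return 1.0 - failed_requests / allowed_failures


minutes = error_budget_minutes(99.9)                 # about 43.2 for 30 days
remaining = budget_remaining(99.9, 1_000_000, 250)   # about 0.75
```

For a 99.9% SLO over 30 days the result is roughly 43.2 minutes, matching the figure above. In the request-based view, 250 failures out of 1,000,000 requests spend about a quarter of the budget.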

Why it matters

SLOs turn reliability into a quantifiable engineering decision. Without them, teams don't know how much reliability is enough, and they oscillate between over-investing in stability and ignoring operational debt until an incident forces them to act.
