SLOs, SLIs & SLAs

What it is

SLOs, SLIs, and SLAs are a framework for defining and measuring service reliability:

SLI (Service Level Indicator): a quantitative measure of one aspect of service behavior (e.g., p99 latency)
SLO (Service Level Objective): an internal target value or range for an SLI (e.g., p99 < 200ms)
SLA (Service Level Agreement): a contractual commitment to customers, with consequences if it is missed (e.g., 99.9% uptime or service credits)

Relationship

SLI (what we measure) → SLO (what we want) → SLA (what we promise)

The SLO should be stricter than the SLA, so that a missed SLO acts as an early warning and leaves room to react before the contractual commitment is breached.
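The chain above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the names `ReliabilityTargets`, `availability_sli`, and `classify`, and the 99.95% / 99.9% targets, are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityTargets:
    slo: float  # internal objective as a fraction (hypothetical value: 0.9995)
    sla: float  # contractual commitment as a fraction (hypothetical value: 0.999)


def availability_sli(successful: int, total: int) -> float:
    """SLI: fraction of requests that succeeded in the measurement window."""
    if total == 0:
        return 1.0  # assumption: a window with no traffic counts as available
    return successful / total


def classify(sli: float, targets: ReliabilityTargets) -> str:
    """Compare the measured SLI against the SLO first, then the SLA."""
    if sli >= targets.slo:
        return "ok"
    if sli >= targets.sla:
        return "slo_missed"  # internal target missed, contract still met
    return "sla_breached"


targets = ReliabilityTargets(slo=0.9995, sla=0.999)
sli = availability_sli(successful=99_930, total=100_000)  # 0.9993
print(classify(sli, targets))  # slo_missed
```

Because the SLO is stricter than the SLA, the measured 99.93% lands in the gap between them: the team gets a signal while the contract is still being honored.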

Common SLIs

SLI	Measurement
Availability	% of successful requests
Latency	Response time at a given percentile (e.g., p99)
Throughput	Requests per second
Error rate	% of requests with errors
Freshness	Age of the most recently updated data
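As a rough sketch of how several of the SLIs in the table could be derived from raw request records: the `Request` type, the `compute_slis` helper, and the nearest-rank percentile method are all assumptions made for illustration (production systems typically compute these from metrics or histograms instead). Freshness is omitted because it depends on data timestamps rather than request logs.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool  # True if the request succeeded


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list; p is in (0, 100]."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compute_slis(requests, window_seconds):
    """Compute availability, error rate, p99 latency, and throughput."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests in window")
    good = sum(1 for r in requests if r.ok)
    return {
        "availability_pct": 100 * good / total,
        "error_rate_pct": 100 * (total - good) / total,
        "p99_latency_ms": percentile([r.latency_ms for r in requests], 99),
        "throughput_rps": total / window_seconds,
    }


# Synthetic sample: 100 requests over 10 seconds, 2 of them failing.
requests = [Request(latency_ms=50 + i, ok=(i % 50 != 0)) for i in range(100)]
slis = compute_slis(requests, window_seconds=10)
print(slis)
```

In this sketch availability and error rate are complements of each other; they can diverge in practice if, for example, timeouts are counted differently from explicit errors.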

Error Budget

Error budget = 100% - SLO. With an SLO of 99.9%, the budget is 0.1%, which corresponds to about 43 minutes of full downtime in a 30-day month. This budget is "spent" on deploys, experiments, and failures.
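The arithmetic behind the ~43 minutes figure can be checked with a short sketch. The function names and the 30-day window are assumptions for illustration; the formula itself is the one stated above.

```python
def error_budget_fraction(slo_pct: float) -> float:
    """Error budget as a fraction: (100% - SLO) / 100."""
    return (100.0 - slo_pct) / 100.0


def allowed_downtime_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Minutes of full downtime the budget allows over the window."""
    return error_budget_fraction(slo_pct) * window_days * 24 * 60


def budget_remaining_minutes(slo_pct: float, downtime_so_far_min: float,
                             window_days: int = 30) -> float:
    """Budget left after subtracting downtime already incurred."""
    return allowed_downtime_minutes(slo_pct, window_days) - downtime_so_far_min


print(round(allowed_downtime_minutes(99.9), 1))        # 43.2
print(round(budget_remaining_minutes(99.9, 30.0), 1))  # 13.2
```

A negative result from `budget_remaining_minutes` would mean the budget for the window has been overspent.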

Why it matters

SLOs turn reliability into a quantifiable engineering decision. Without them, teams don't know how much reliability is enough, and they oscillate between over-investing in stability and ignoring operational debt until an incident forces them to act.

References

SRE Book - Service Level Objectives — Google.
SLA vs SLO vs SLI — Atlassian, 2024. Practical comparison between SLA, SLO, and SLI.
Implementing SLOs — SRE Workbook — Google, 2018. Practical guide for implementing SLOs.

