
Observability

The ability to understand a system's internal state from its external outputs (logs, metrics, and traces), which makes it possible to diagnose problems without direct access to the system.


What it is

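Observability is the ability to understand what is happening inside a system from the data it produces. Monitoring checks conditions you defined in advance (for example, alert when error rate exceeds a threshold); observability lets you investigate problems you did not anticipate by asking new questions of the data the system already emits.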

The three pillars

Logs

Textual event records:

  • Structured logging (JSON) for efficient search
  • Levels: DEBUG, INFO, WARN, ERROR
  • Trace IDs in each entry, so logs can be correlated with traces
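A minimal sketch of the three points above using only Python's standard `logging` module: each record is rendered as one JSON object, the level threshold filters out DEBUG, and a trace ID travels with each entry. The formatter class, the `make_logger` helper, and the field names (`ts`, `level`, `trace_id`, ...) are hypothetical choices for illustration, not a required schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is attached per call via `extra=`; it is None when absent,
            # so every entry has the same shape and stays searchable.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


def make_logger(name: str) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # DEBUG records are filtered out
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


log = make_logger("checkout")
# In a real service the trace ID would come from the active trace context.
log.info("payment authorized", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```

Because every entry carries the same keys, a log backend can filter on `trace_id` to pull all log lines belonging to one request.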

Metrics

Numerical measurements aggregated over time:

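  • Counters: cumulative values that only increase (e.g. total requests served)
  • Gauges: values that go up and down (e.g. current queue depth)
  • Histograms: distribution of observed values, counted into buckets (e.g. request latency)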
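To make the difference between the three metric types concrete, here is a minimal in-process sketch. The `Counter`, `Gauge`, and `Histogram` classes below are hypothetical illustrations, not the Prometheus client library; in particular, this histogram keeps per-bucket counts, whereas Prometheus exposes cumulative ("less than or equal") bucket counts.

```python
import bisect


class Counter:
    """Monotonic total: can only go up."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value: can be set, raised, or lowered."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations into buckets bounded above by `bounds`."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        # One slot per bound, plus a final overflow (+Inf) bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1


requests = Counter()
in_flight = Gauge()
latency = Histogram([0.1, 0.5, 1.0])  # seconds
for seconds in (0.05, 0.3, 2.0):
    requests.inc()
    latency.observe(seconds)
```

Keeping `sum` and `count` alongside the buckets lets a backend derive the mean, while the bucket counts approximate percentiles.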

Traces

Request tracking through distributed services:

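  • Span: a single timed unit of work (e.g. one HTTP handler or one database query)
  • Trace: the tree of spans that make up one request, linked by a shared trace ID
  • Context propagation: passing the trace ID (and the parent span ID) between services, typically in request headers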
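The following sketch shows how spans, traces, and context propagation fit together. It assumes the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`, with a 32-hex-digit trace ID and a 16-hex-digit span ID); the `Span` class and the `start_trace`, `child_span`, `inject`, and `extract` helpers are hypothetical stand-ins for what a tracing SDK does, not a real library API.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))  # 16 hex chars
    parent_id: Optional[str] = None  # span_id of the caller, None for the root


def start_trace(name: str) -> Span:
    """Root span: generates a fresh trace ID."""
    return Span(name=name, trace_id=secrets.token_hex(16))


def child_span(parent: Span, name: str) -> Span:
    """In-process child: same trace ID, parent link to the caller's span."""
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


def inject(span: Span) -> dict:
    """Outgoing call: encode context as a traceparent header ("01" = sampled)."""
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}


def extract(headers: dict, name: str) -> Span:
    """Receiving service: continue the caller's trace from the header."""
    _version, trace_id, parent_id, _flags = headers["traceparent"].split("-")
    return Span(name=name, trace_id=trace_id, parent_id=parent_id)


# Service A handles a request and calls service B over HTTP.
root = start_trace("GET /checkout")
query = child_span(root, "db.query")
headers = inject(root)
remote = extract(headers, "charge-card")  # runs inside service B
```

Because `remote` carries the same `trace_id` and points at `root` as its parent, a tracing backend can stitch spans reported by both services into one tree.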

OpenTelemetry

A CNCF project that defines vendor-neutral APIs, SDKs, and a wire protocol (OTLP) for instrumenting logs, metrics, and traces, with SDKs for all major languages.

Tools

Tool              Type
Grafana           Dashboards
Prometheus        Metrics
Jaeger / Tempo    Traces
Loki              Logs
Datadog           All-in-one
AWS CloudWatch    AWS native

Why it matters

Observability lets you understand a system's behavior in production without predicting in advance which questions you will need to answer: when something breaks in a way nobody anticipated, the logs, metrics, and traces the system already emits are enough to investigate it.
