
Distributed Tracing

An observability technique that tracks requests across multiple services in a distributed system, making it possible to identify bottlenecks and diagnose failures.

seed · #tracing #distributed #opentelemetry #jaeger #spans #observability

What it is

Distributed tracing follows a request from its origin to its destination through every service it touches. Each service records one or more "spans" with timing and metadata, and all spans that share the same trace ID are grouped into a "trace" that shows the complete flow.
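The relationship between spans and traces can be sketched with a small data model. This is a minimal illustration using only the standard library; the `Span` class, its field names, and `group_into_traces` are hypothetical and do not correspond to any particular tracing library's types.

```python
import secrets
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (illustrative model only)."""
    trace_id: str               # shared by every span in the same trace
    span_id: str                # unique per span
    parent_id: Optional[str]    # None marks the root span
    name: str
    start_ms: float
    end_ms: float
    attributes: dict = field(default_factory=dict)  # free-form metadata

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def group_into_traces(spans):
    """Group spans reported by different services by their trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


# Two requests: the first produced two spans, the second only one.
trace_id = secrets.token_hex(16)
spans = [
    Span(trace_id, "a1", None, "GET /products", 0.0, 120.0),
    Span(trace_id, "b2", "a1", "auth.check", 5.0, 25.0, {"service": "auth"}),
    Span(secrets.token_hex(16), "c3", None, "GET /health", 0.0, 2.0),
]
traces = group_into_traces(spans)
print(len(traces), "traces;", len(traces[trace_id]), "spans in the first")
```

The trace ID is the only thing that ties the spans together, which is why every service has to receive it and attach it to its own spans.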

Concepts

| Concept | Description | Example |
|---|---|---|
| Trace | The complete journey of a request | HTTP request from client to response |
| Span | A single operation within the trace (start, end, metadata) | Database call, service invocation |
| Context propagation | Passing the trace ID between services | traceparent header (W3C Trace Context) |
| Sampling | Tracing only a fraction of requests to reduce cost | Head-based (1%), tail-based (errors only) |
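Context propagation and head-based sampling meet in the W3C `traceparent` header, which carries the trace ID, the calling span's ID, and a "sampled" flag. The sketch below formats and parses version `00` of that header and makes a head-based sampling decision at the root. The function names and the injectable `rng` parameter are assumptions made for illustration; real instrumentation libraries ship their own propagators, and tail-based sampling (deciding after the trace completes) is not shown.

```python
import random
import re
import secrets

# Version "00" layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
SAMPLED_FLAG = 0x01


def start_trace(sample_rate=0.01, rng=random.random):
    """Head-based sampling: decide once, at the root, whether to record."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    sampled = rng() < sample_rate
    return trace_id, sampled


def format_traceparent(trace_id, span_id, sampled):
    """Build the header value a service sends with its outgoing calls."""
    flags = SAMPLED_FLAG if sampled else 0
    return f"00-{trace_id}-{span_id}-{flags:02x}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid in W3C Trace Context.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & SAMPLED_FLAG)


header = format_traceparent(
    "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True
)
print(header)
print(parse_traceparent(header))
```

A downstream service that receives the header reuses the trace ID, treats the received span ID as its parent, and honors the sampled flag instead of making its own decision, so a trace is either recorded everywhere or nowhere.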

Flow

Client → API Gateway (span 1)
          → Auth Service (span 2)
          → Product Service (span 3)
             → Database (span 4)
          → Response

Tools

| Tool | Type |
|---|---|
| Jaeger | Open-source (CNCF) |
| Grafana Tempo | Open-source, Grafana integrated |
| AWS X-Ray | Managed AWS |
| Datadog APM | SaaS |
| OpenTelemetry | Instrumentation standard |

Why it matters

In distributed systems, a request traverses multiple services. Without distributed tracing, diagnosing latency or errors is like finding a needle in a haystack. Traces connect the dots between services and reveal where time is being lost.
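One way a trace reveals where time is lost is by computing each span's "self time": its duration minus the time spent in its direct children. The sketch below does this for the flow described in this note. The timings are invented for illustration, the dictionary layout and `self_times` are hypothetical, and the calculation assumes children run sequentially inside their parent.

```python
def self_times(spans):
    """Map span name -> time spent in the span itself, excluding children.

    Assumes the children of a span do not overlap in time.
    """
    child_total = {}
    for s in spans:
        if s["parent"] is not None:
            duration = s["end"] - s["start"]
            child_total[s["parent"]] = child_total.get(s["parent"], 0) + duration
    return {
        s["name"]: (s["end"] - s["start"]) - child_total.get(s["id"], 0)
        for s in spans
    }


# Hypothetical timings in milliseconds, not measurements.
trace = [
    {"id": 1, "parent": None, "name": "API Gateway", "start": 0, "end": 250},
    {"id": 2, "parent": 1, "name": "Auth Service", "start": 5, "end": 25},
    {"id": 3, "parent": 1, "name": "Product Service", "start": 30, "end": 245},
    {"id": 4, "parent": 3, "name": "Database", "start": 40, "end": 230},
]

times = self_times(trace)
slowest = max(times, key=times.get)
print(slowest, times[slowest])  # Database 190
```

Looking only at total duration would point at the API Gateway (250 ms), but almost all of that time is inherited from the database call further down the tree.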

References

  • OpenTelemetry Tracing — Official documentation.
  • Jaeger — CNCF, 2024. Open source distributed tracing system.
  • Zipkin — OpenZipkin, 2024. Distributed tracing system.