Distributed Tracing
An observability technique that follows requests across the services of a distributed system, making it possible to locate bottlenecks and diagnose failures.
seed #tracing #distributed #opentelemetry #jaeger #spans #observability
What it is
Distributed tracing follows a request from its entry point through every service it touches until the response is returned. Each service records one or more "spans" carrying timing and metadata, and all spans that share the same trace ID are grouped into a "trace" showing the complete flow.
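To make the span and trace relationship concrete, here is a minimal sketch of the data model. The `Span` class, its field names, and `group_into_traces` are hypothetical and invented for illustration; real tracing backends define richer schemas.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical minimal model)."""
    trace_id: str               # shared by every span of the same request
    span_id: str                # unique per operation
    parent_id: Optional[str]    # None for the root span
    name: str
    start_ms: float
    end_ms: float
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def group_into_traces(spans):
    """Group spans reported by different services by their shared trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    # Order each trace by start time so it reads as a timeline.
    return {tid: sorted(group, key=lambda s: s.start_ms)
            for tid, group in traces.items()}


# Spans arrive out of order and interleaved across requests (invented data).
reported = [
    Span("t1", "b", "a", "auth-service: verify token", 5, 20),
    Span("t1", "a", None, "api-gateway: GET /products", 0, 120,
         {"http.status_code": 200}),
    Span("t2", "x", None, "api-gateway: GET /health", 3, 4),
]
traces = group_into_traces(reported)
for s in traces["t1"]:
    print(f"{s.name} took {s.duration_ms} ms")
```

The trace ID is the only thing needed to regroup spans; the parent ID is what later allows the call tree to be rebuilt.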
Concepts
| Concept | Description | Example |
|---|---|---|
| Trace | The complete journey of a request | HTTP request from client to response |
| Span | An operation within the trace (start, end, metadata) | Database call, service invocation |
| Context propagation | Passing the trace ID and parent span ID between services | traceparent header (W3C Trace Context) |
| Sampling | Recording only a subset of traces to reduce cost | Head-based (e.g. keep 1% of requests), tail-based (e.g. keep only traces with errors) |
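Context propagation and head-based sampling can be sketched together, since the `traceparent` header carries the sampling decision in its flags field. The following is a simplified illustration using only the standard library, not a full W3C Trace Context implementation: it handles only the four-field `version-traceid-parentid-flags` form, and the names `SpanContext`, `head_sample`, `start_trace`, `inject`, and `extract` are hypothetical. Tail-based sampling is not shown because it needs the completed trace before deciding.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version - 32 hex trace ID - 16 hex parent span ID - 2 hex flags
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def head_sample(trace_id: str, rate: float) -> bool:
    """Head-based decision made once, when the trace starts.

    Deterministic in the trace ID (an assumption of this sketch): keep the
    trace if the low 64 bits fall within the lowest `rate` fraction.
    """
    return int(trace_id[-16:], 16) < rate * 2**64


def start_trace(rate: float) -> SpanContext:
    """Create a root context with random IDs and a sampling decision."""
    trace_id = secrets.token_hex(16)
    return SpanContext(trace_id, secrets.token_hex(8), head_sample(trace_id, rate))


def inject(ctx: SpanContext) -> dict:
    """Build the outgoing header for a downstream call."""
    flags = "01" if ctx.sampled else "00"
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"}


def extract(headers: dict) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is missing or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return SpanContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


# A service receives a request, then calls a downstream service.
incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
parent = extract(incoming)
# Same trace ID, new span ID, and the upstream sampling decision is honored.
child = SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
outgoing = inject(child)
print(outgoing["traceparent"])
```

Because the decision travels with the header, every service in the chain either records the trace or skips it, which avoids partially recorded traces.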
Flow
Client → API Gateway (span 1)
    → Auth Service (span 2)
    → Product Service (span 3)
        → Database (span 4)
    → Response
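The flow above can be rebuilt from parent IDs and used to see where time goes. This is a rough sketch: the timings are invented for illustration, the nesting (Database under Product Service) is an assumption about the flow, and `self_time` and `render` are hypothetical helpers.

```python
from collections import defaultdict

# (span_id, parent_id, name, start_ms, end_ms); timings are invented.
spans = [
    (1, None, "API Gateway", 0, 100),
    (2, 1, "Auth Service", 5, 20),
    (3, 1, "Product Service", 25, 95),
    (4, 3, "Database", 30, 90),
]

# Index spans by parent so the call tree can be walked from the root.
children = defaultdict(list)
for span in spans:
    children[span[1]].append(span)


def self_time(span):
    """Duration minus time spent in direct children.

    Assumes the children of a span do not overlap in time.
    """
    span_id, _, _, start, end = span
    return (end - start) - sum(c[4] - c[3] for c in children[span_id])


def render(parent_id=None, depth=0, lines=None):
    """Return the trace as indented lines, one per span, in start order."""
    lines = [] if lines is None else lines
    for span in sorted(children[parent_id], key=lambda s: s[3]):
        span_id, _, name, start, end = span
        lines.append(
            f"{'  ' * depth}{name} (span {span_id}): "
            f"{end - start} ms total, {self_time(span)} ms self"
        )
        render(span_id, depth + 1, lines)
    return lines


print("\n".join(render()))
slowest = max(spans, key=self_time)
print("Most self time:", slowest[2])
```

With these made-up numbers the gateway has the longest total duration, but most of that time is spent waiting on the database call, which is the kind of distinction a trace view makes visible.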
Tools
| Tool | Type |
|---|---|
| Jaeger | Open-source (CNCF) |
| Grafana Tempo | Open-source, Grafana integrated |
| AWS X-Ray | Managed service (AWS) |
| Datadog APM | SaaS |
| OpenTelemetry | Vendor-neutral instrumentation standard (CNCF) |
Why it matters
In distributed systems, a single request traverses multiple services. Without distributed tracing, diagnosing latency or errors means manually correlating logs from each service, with no reliable way to tell which entries belong to the same request. A trace links the spans from every service into one timeline, showing which hop is slow or failing.
References
- OpenTelemetry Tracing — Official documentation.
- Jaeger — CNCF, 2024. Open source distributed tracing system.
- Zipkin — OpenZipkin, 2024. Distributed tracing system.