Concepts

Metrics & Monitoring

Collection and visualization of numerical system measurements over time to understand performance, detect anomalies, and make data-driven decisions.

seed · #metrics #monitoring #prometheus #grafana #dashboards #alerting

What it is

Metrics are numerical measurements aggregated over time describing system behavior. Monitoring is the process of collecting, storing, visualizing, and alerting on those metrics.

Metric types

Type      | Behavior                          | Example                            | When to use
Counter   | Only increments                   | Total requests, accumulated errors | Rates (requests/s)
Gauge     | Goes up and down                  | Memory used, active connections    | Current resource state
Histogram | Value distribution (server-side)  | Latency p50/p95/p99                | Latency percentiles
Summary   | Value distribution (client-side)  | Pre-calculated latency             | When server-side aggregation is not possible
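
To make the differences between the types concrete, here is a minimal pure-Python sketch of a counter, a gauge, and a bucketed histogram. It is an illustration only, not the API of any Prometheus client library; the class names, the `per_second_rate` helper, and the bucket bounds are hypothetical and chosen for this example.

```python
import bisect


class Counter:
    """Monotonic count; the useful signal is its rate of change."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters only increase")
        self.value += amount


class Gauge:
    """Point-in-time value that may rise or fall."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; percentiles are estimated later from the counts."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per bound, plus a final overflow slot for values above every bound.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.count = 0

    def observe(self, value):
        # Index of the first bound that is >= value ("less than or equal" buckets).
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.count += 1

    def quantile_upper_bound(self, q):
        """Smallest bucket bound covering at least a fraction q of observations."""
        if self.count == 0:
            return None
        target = q * self.count
        running = 0
        bounds = self.upper_bounds + [float("inf")]
        for bound, n in zip(bounds, self.bucket_counts):
            running += n
            if running >= target:
                return bound
        return float("inf")


def per_second_rate(previous, current, interval_seconds):
    """Rate from two counter samples; a drop is treated as a reset to zero."""
    increase = current - previous if current >= previous else current
    return increase / interval_seconds


requests_total = Counter()
for _ in range(120):
    requests_total.inc()
print(per_second_rate(0, requests_total.value, 60))  # 2.0 requests/s

latency = Histogram([0.1, 0.25, 0.5, 1.0])  # bucket bounds in seconds (assumed)
for seconds in [0.05, 0.08, 0.12, 0.2, 0.3, 0.9]:
    latency.observe(seconds)
print(latency.quantile_upper_bound(0.95))  # 1.0
```

The histogram stores only bucket counts, so a percentile read from it is an upper bound whose precision depends on the bucket layout. That is the trade-off against a summary, which computes quantiles in the application but cannot be aggregated across instances afterwards.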

The Four Golden Signals (Google SRE)

  1. Latency: response time
  2. Traffic: request volume
  3. Errors: error rate
  4. Saturation: how "full" the system is
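
As a rough illustration, the four signals listed above can be derived from a window of request records plus one capacity reading. This is a sketch under stated assumptions: the `Request` record, the choice of HTTP 5xx as the error definition, the p95 latency, and the in-flight/capacity ratio used for saturation are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_seconds: float
    status_code: int


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Summarize one observation window into the four golden signals."""
    total = len(requests)
    durations = [r.duration_seconds for r in requests]
    # Assumption: only HTTP 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status_code >= 500)
    return {
        "latency_p95_seconds": percentile(durations, 95) if total else None,
        "traffic_rps": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "saturation": in_flight / capacity,
    }


window = [Request(0.1, 200)] * 18 + [Request(0.9, 500), Request(1.5, 503)]
print(golden_signals(window, window_seconds=10, in_flight=40, capacity=50))
```

In a real deployment these values would usually be computed by the monitoring backend from counters and histograms rather than from raw request lists; the function only shows what each signal measures.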

Typical stack

Application (exposes metrics) → Prometheus (collection, storage, alert rules) → Grafana (visualization)
Prometheus → Alertmanager (alert routing and notification)
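
Prometheus collects by pulling: it periodically scrapes an HTTP endpoint (conventionally `/metrics`) on which the application publishes its current values in a line-based text format. The sketch below renders that text format by hand to show what the application side of the stack produces. The function names and the sample metric are hypothetical, and a real service would normally rely on an official client library rather than this code.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(
    render_metric(
        "http_requests_total",  # hypothetical metric name
        "counter",
        "Total HTTP requests.",
        [({"method": "get", "status": "200"}, 1027)],
    )
)
```

Serving this string over HTTP is all the application has to do; storage, querying, dashboards, and alert evaluation happen downstream.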

Best practices

  • USE method for resources: Utilization, Saturation, Errors
  • RED method for services: Rate, Errors, Duration
  • Per-service dashboards with the 4 golden signals
  • Alerts based on SLOs, not arbitrary metrics
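
One common way to base alerts on an SLO is to alert on the error-budget burn rate instead of a raw error threshold. The sketch below assumes an availability SLO expressed as a target success ratio. The function names are hypothetical, and the default threshold of 14.4 with a short and a long window follows the multi-window example in Google's SRE Workbook; treat it as a starting point, not a recommendation for every service.

```python
def burn_rate(error_ratio, slo_target):
    """Speed at which the error budget is spent.

    A burn rate of 1.0 consumes exactly the whole budget over the SLO period;
    higher values exhaust it proportionally sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def should_page(short_window_error_ratio, long_window_error_ratio,
                slo_target, threshold=14.4):
    """Page only when both windows burn fast.

    The long window shows the problem is significant; the short window shows
    it is still happening.
    """
    return (
        burn_rate(short_window_error_ratio, slo_target) >= threshold
        and burn_rate(long_window_error_ratio, slo_target) >= threshold
    )


# 99.9% SLO: 2% errors in both windows burns the budget about 20x too fast.
print(should_page(0.02, 0.02, slo_target=0.999))   # True
# 0.5% errors is about 5x: worth a ticket, but below the paging threshold.
print(should_page(0.005, 0.005, slo_target=0.999))  # False
```

Compared with an arbitrary threshold such as "error rate above 1%", the burn rate ties the alert directly to how much user-visible unreliability the SLO still allows.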

Why it matters

What is not measured is not improved. Metrics and monitoring turn intuition into data, so teams can detect degradations before they affect users and base capacity decisions on evidence.
