Jonatan Mata (jonmatum.com)
© 2026 Jonatan Mata. All rights reserved.
Metrics & Monitoring

Collection and visualization of numerical system measurements over time to understand performance, detect anomalies, and make data-driven decisions.

Status: seed. Tags: #metrics #monitoring #prometheus #grafana #dashboards #alerting

What it is

Metrics are numerical measurements of system behavior, recorded as time series and usually aggregated over time windows. Monitoring is the practice of collecting, storing, visualizing, and alerting on those metrics.

Metric types

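| Type | Behavior | Example | When to use |
|---|---|---|---|
| Counter | Only increases (resets to zero on restart) | Total requests, accumulated errors | Rates (requests/s) |
| Gauge | Goes up and down | Memory used, active connections | Current resource state |
| Histogram | Observations counted into buckets; quantiles estimated server-side at query time | Latency p50/p95/p99 | Latency percentiles, including percentiles aggregated across instances |
| Summary | Quantiles pre-calculated client-side | Pre-calculated latency quantiles | When server-side quantile calculation is not possible; the quantiles cannot be aggregated across instances |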
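
As a rough illustration of how the counter and histogram types are consumed, the sketch below computes a per-second rate from raw counter samples and estimates a percentile from cumulative histogram buckets. The function names, the sample layout, and the linear interpolation inside a bucket are assumptions made for this example; they mirror the general idea behind server-side rate and quantile queries, not the exact behavior of any particular monitoring system.

```python
import math


def counter_rate(samples):
    """Per-second rate from (timestamp_seconds, counter_value) samples.

    A drop in value is treated as a counter reset (for example a process
    restart), so the post-reset value counts as an increase from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with an (inf, total_count) bucket. Values are assumed
    to be spread evenly inside each bucket (linear interpolation).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == lower_count:
                # Cannot interpolate into an open-ended or empty bucket.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return lower_bound
```

For example, `counter_rate([(0, 100), (30, 160), (60, 220)])` covers an increase of 120 over 60 seconds. The accuracy of `histogram_quantile` depends entirely on how well the bucket boundaries match the real distribution, which is the usual trade-off of histograms compared with summaries.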

The Four Golden Signals (Google SRE)

  1. Latency: time taken to serve a request
  2. Traffic: demand placed on the system, such as requests per second
  3. Errors: rate of requests that fail
  4. Saturation: how "full" the system is relative to its capacity
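
The four signals can be derived from fairly simple inputs. The following is a minimal sketch, assuming a hypothetical `Request` record, a fixed observation window, and saturation expressed as in-flight requests over a known capacity; real systems usually compute these from counters and histograms rather than from raw request lists.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # time taken to serve the request, in seconds
    status: int        # HTTP status code


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize the four golden signals for one observation window."""
    if window_s <= 0 or capacity <= 0:
        raise ValueError("window_s and capacity must be positive")
    n = len(requests)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank p95, using integer math to avoid float rounding.
    p95 = durations[(95 * n + 99) // 100 - 1] if n else None
    # Only server-side failures (5xx) are counted as errors here.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_s": p95,
        "traffic_rps": n / window_s,
        "error_ratio": errors / n if n else 0.0,
        "saturation": in_flight / capacity,
    }
```

Counting only 5xx responses as errors is a choice made for the example; what counts as a failure depends on the service.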

Typical stack

Application → Prometheus (collection, storage, alert rules) → Grafana (visualization)
Prometheus → Alertmanager (alert routing and notifications)
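
On the application side of this stack, Prometheus scrapes an HTTP endpoint that returns metrics in a plain-text exposition format. The sketch below shows roughly what that looks like using only the Python standard library. The metric name, the `SAMPLES` dictionary, and the `serve` helper are hypothetical; in practice an official client library would normally handle this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter, keyed by a tuple of (label, value) pairs.
SAMPLES = {(("method", "GET"), ("status", "200")): 0}


def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter(
            "http_requests_total", "Total HTTP requests.", SAMPLES
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve(8000)` and pointing a scrape job at that port would be enough for an experiment; dashboards and alert rules are then built on the scraped series rather than on the application itself.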

Best practices

  • USE method for resources: Utilization, Saturation, Errors
  • RED method for services: Rate, Errors, Duration
  • Per-service dashboards with the 4 golden signals
  • Alerts based on SLOs, not arbitrary metrics
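
One common way to base alerts on SLOs is to alert on how fast the error budget is being consumed (the burn rate) instead of on a raw error threshold. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the default threshold of 14.4 is borrowed from the multiwindow example in Google's SRE Workbook (a 1-hour window paired with a 5-minute window), not a value taken from this note.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than allowed the error budget is being spent.

    A burn rate of 1.0 means the budget would be used up exactly at the
    end of the SLO period; higher values exhaust it sooner.
    """
    budget = 1.0 - slo_target
    if not 0.0 < budget < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / budget


def should_page(short_window_ratio, long_window_ratio, slo_target, threshold=14.4):
    """Page only if both the short and the long window exceed the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, which lets the alert clear quickly.
    """
    return (
        burn_rate(short_window_ratio, slo_target) >= threshold
        and burn_rate(long_window_ratio, slo_target) >= threshold
    )
```

With a 99.9% SLO, an error ratio of 0.2% corresponds to a burn rate of about 2, which would not page under this threshold, while a sustained 2% error ratio (burn rate of about 20) would.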

Why it matters

What is not measured is not improved. Metrics and monitoring turn intuition into data: they make it possible to detect degradations before they impact users and to base capacity decisions on evidence.

References

  • Prometheus — CNCF monitoring system.
  • Grafana — Visualization platform.
  • OpenTelemetry Metrics — OpenTelemetry, 2024. Open standard for metrics.

Related content

  • Observability

    Ability to understand a system's internal state from its external outputs: logs, metrics, and traces, enabling problem diagnosis without direct system access.

  • Site Reliability Engineering

    Discipline applying software engineering principles to infrastructure operations, focusing on creating scalable and highly reliable systems.

  • Alerting Strategies

    Practices for configuring effective alerts that notify real problems without generating fatigue from excessive notifications.
