DevOps Practices

A set of technical and cultural practices that implement DevOps principles, from Infrastructure as Code to blameless post-mortems.

DevOps practices are the concrete implementations of the DevOps philosophy. While DevOps is the "what" and "why," these practices are the "how."

Infrastructure as Code (IaC)

Define and manage infrastructure through versioned configuration files.

Main tools

Tool           | Focus                    | Language
Terraform      | Multi-cloud, declarative | HCL
Pulumi         | Multi-cloud, imperative  | TypeScript, Python, Go
AWS CDK        | AWS, imperative          | TypeScript, Python, Java
CloudFormation | AWS, declarative         | YAML/JSON
Ansible        | Configuration, agentless | YAML

Terraform example

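# In AWS provider v4 and later, versioning and encryption are configured
# as separate resources instead of inline blocks on the bucket.
resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}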

IaC principles

  • Idempotency — applying multiple times produces the same result
  • Versioned — all infra in Git with full history
  • Review — infra changes go through PR like code
  • Modules — reuse common patterns
  • State management — shared remote state (S3, Terraform Cloud)
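
Idempotency is the property that makes re-running a plan safe. The following is a minimal Python sketch of the idea, assuming a toy in-memory model of infrastructure. The `apply` function and both dictionaries are hypothetical and do not correspond to any tool's API.

```python
# Toy illustration of an idempotent apply: desired state in, list of changes out.
# `actual` stands in for real infrastructure; nothing here is a Terraform API.

def apply(desired: dict, actual: dict) -> list[str]:
    """Converge `actual` toward `desired` and return the changes made."""
    changes = []
    for name, config in desired.items():
        if name not in actual:
            actual[name] = dict(config)
            changes.append(f"create {name}")
        elif actual[name] != config:
            actual[name] = dict(config)
            changes.append(f"update {name}")
    return changes


desired = {"my-data-bucket": {"versioning": True, "sse": "AES256"}}
actual = {}

first_run = apply(desired, actual)   # creates the bucket
second_run = apply(desired, actual)  # nothing left to do, so no changes
```

Because the function compares states instead of replaying commands, the second run reports no changes.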

Configuration Management

Keep servers in a desired state automatically.

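# Ansible playbook
- hosts: webservers
  become: true  # package installs and writes under /etc need root
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Copy config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

  handlers:
    # Runs after the tasks finish, and only if a task notified it
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted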
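
The `notify` and handler pairing in the playbook restarts nginx only when the config file actually changed. A rough Python model of that behavior follows. The `run_play` function and the task tuples are hypothetical and only illustrate the concept; this is not how Ansible is implemented.

```python
# Toy model of "notify a handler only when a task reports a change".
# Each task is (name, desired_value, handler_to_notify_or_None).

def run_play(tasks, host_state):
    """Apply tasks to host_state and return the handlers that were notified."""
    notified = []
    for name, desired, handler in tasks:
        if host_state.get(name) == desired:
            continue  # already in the desired state: no change, no notification
        host_state[name] = desired
        if handler and handler not in notified:
            notified.append(handler)  # a handler is queued at most once
    return notified


tasks = [
    ("nginx package", "present", None),
    ("/etc/nginx/nginx.conf", "worker_processes auto;", "restart nginx"),
]
host = {}

first = run_play(tasks, host)   # config was written, so the handler is notified
second = run_play(tasks, host)  # nothing changed, so no restart is needed
```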

Containerization

Package applications with all their dependencies.

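# Multi-stage build
# Stage 1: install dependencies and compile the app
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime image with only the build output and dependencies
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
# Run as the unprivileged "node" user that the official image provides
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]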

Docker best practices

  • Minimal base images (Alpine, distroless)
  • Multi-stage builds to reduce size
  • One process per container
  • Don't run as root
  • .dockerignore to exclude unnecessary files
  • Pin base image versions
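
The "pin base image versions" rule can be checked automatically. Below is a small Python sketch of such a check, assuming a simplified view of Dockerfile syntax (no `ARG` substitution). The function name `unpinned_base_images` and the registry host in the example are hypothetical.

```python
# Flags FROM lines whose image has no tag or uses the floating "latest" tag.
# A sketch only: it ignores ARG substitution and other Dockerfile features.

def unpinned_base_images(dockerfile: str) -> list[str]:
    stage_names = set()
    unpinned = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_ref = image in stage_names  # e.g. "FROM builder AS test"
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
        if is_stage_ref or image == "scratch" or "@" in image:
            continue  # earlier stage, empty image, or pinned by digest
        # The tag is whatever follows ":" in the last path segment,
        # so a registry port like host:5000/app is not mistaken for a tag.
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            unpinned.append(image)
    return unpinned


example = """\
FROM node:20-alpine AS builder
FROM builder AS test
FROM nginx
FROM registry.example.com:5000/team/app:latest
"""
problems = unpinned_base_images(example)
```

A check like this could run in CI before the image is built.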

GitOps

Use Git as the source of truth for infrastructure and deployments.

Principles

  1. Declarative — desired state is in Git
  2. Versioned — Git is the change history
  3. Automatic — agents reconcile actual state with desired
  4. Auditable — every change has author, timestamp, and reason

Tools

  • ArgoCD — Kubernetes GitOps controller
  • Flux — Kubernetes GitOps toolkit
  • Atlantis — Terraform pull request automation

# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd  # assumes Argo CD runs in its default namespace
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo
    path: k8s/
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
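
The "agents reconcile actual state with desired" principle can be sketched as a single reconcile step. The Python below is a simplified assumption of how such a step might look. The parameter names echo the `prune` and `selfHeal` fields in the manifest, but the logic is illustrative and is not Argo CD's actual algorithm.

```python
# Toy reconcile step in the spirit of a GitOps agent.
# `desired` is what Git declares; `live` is what the cluster currently runs.

def reconcile(desired: dict, live: dict, prune: bool, self_heal: bool) -> list[str]:
    actions = []
    for name, spec in desired.items():
        if name not in live:
            live[name] = spec
            actions.append(f"create {name}")
        elif live[name] != spec and self_heal:
            live[name] = spec  # revert manual drift back to what Git says
            actions.append(f"heal {name}")
    if prune:
        for name in sorted(set(live) - set(desired)):
            del live[name]  # resource is no longer tracked in Git
            actions.append(f"prune {name}")
    return actions


git_state = {"deployment/my-app": {"replicas": 3}}
cluster = {
    "deployment/my-app": {"replicas": 5},   # someone scaled it by hand
    "configmap/old-settings": {"k": "v"},   # removed from Git earlier
}
actions = reconcile(git_state, cluster, prune=True, self_heal=True)
```

With both options off, the same call would leave the drifted deployment and the orphaned config map untouched.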

Feature Flags

Separate deployment from release: the code ships to production, but a flag controls whether users see the functionality.

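// Example with LaunchDarkly / Unleash / custom
function Checkout({ userId }) {
  // The flag is evaluated per user, so the new flow can be enabled gradually
  if (featureFlags.isEnabled('new-checkout', { userId })) {
    return <NewCheckout />;
  }
  return <LegacyCheckout />;
}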

Use cases

  • Canary releases — enable for % of users
  • Beta testing — enable for specific users
  • Kill switch — disable problematic feature without deploy
  • A/B testing — compare variants
  • Trunk-based development — merge incomplete code
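
Percentage-based canary releases usually depend on assigning each user to a stable bucket. The Python sketch below shows one common way to do that with a hash. The `is_enabled` function is hypothetical and is not the API of LaunchDarkly, Unleash, or any other product.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # Users in buckets below the threshold see the feature.
    return bucket < rollout_percent


users = [f"user_{i}" for i in range(1000)]
enabled_at_10 = {u for u in users if is_enabled("new-checkout", u, 10)}
enabled_at_50 = {u for u in users if is_enabled("new-checkout", u, 50)}
```

Hashing the flag name together with the user ID keeps a user's assignment stable across requests, and raising the percentage only adds users: everyone enabled at 10% stays enabled at 50%. Setting the percentage to 0 acts as a kill switch.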

Observability

The three pillars for understanding production systems are logs, metrics, and traces.

Logs

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "error",
  "service": "api",
  "trace_id": "abc123",
  "message": "Payment failed",
  "user_id": "user_456",
  "error_code": "INSUFFICIENT_FUNDS"
}

Structured logging — parseable JSON, not free text.
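
One way to produce log lines shaped like the sample is a custom formatter on Python's standard `logging` module. This is a minimal sketch: the `JsonFormatter` class name, the `payments` logger name, and the choice of extra fields are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("trace_id", "user_id", "error_code"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="api"))
logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate output through the root logger

logger.error(
    "Payment failed",
    extra={"trace_id": "abc123", "user_id": "user_456", "error_code": "INSUFFICIENT_FUNDS"},
)
```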

Metrics

# Prometheus format
http_requests_total{method="GET", status="200"} 1234
http_request_duration_seconds{quantile="0.99"} 0.25

Types: counters, gauges, histograms, summaries.
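
The difference between these types is easier to see in code. The classes below are toy versions written for illustration. They are assumptions, not the API of `prometheus_client` or any other metrics library.

```python
# Toy versions of three metric types, to show how they differ.

class Counter:
    """Only ever goes up (for example, total requests served)."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Can go up or down (for example, requests currently in flight)."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations into cumulative buckets (for example, latency)."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        self.bucket_counts = {bound: 0 for bound in self.upper_bounds}
        self.count = 0
        self.total = 0.0

    def observe(self, value):
        self.count += 1
        self.total += value
        for bound in self.upper_bounds:
            if value <= bound:
                self.bucket_counts[bound] += 1  # cumulative: every bucket it fits


requests = Counter()
in_flight = Gauge()
latency = Histogram([0.1, 0.25, 1.0])
for seconds in (0.05, 0.2, 0.2, 0.8):
    requests.inc()
    latency.observe(seconds)
in_flight.set(2)
```

A summary differs from a histogram in that it reports precomputed quantiles, such as the `quantile="0.99"` sample in the Prometheus example, instead of bucket counts.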

Traces

Follow a request through multiple services:

[API Gateway] → [Auth Service] → [User Service] → [Database]
     2ms            5ms              3ms            10ms

Tools: Jaeger, Zipkin, AWS X-Ray, Datadog APM.
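
A trace is a set of spans that share a trace ID and point to their parents. The Python sketch below models the request from the diagram, assuming each number is the time spent inside that service alone. The `Span` fields and helper names are hypothetical and far simpler than what real tracing systems record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    self_time_ms: int         # time spent in this service, excluding children


trace = [
    Span("abc123", "s1", None, "API Gateway", 2),
    Span("abc123", "s2", "s1", "Auth Service", 5),
    Span("abc123", "s3", "s2", "User Service", 3),
    Span("abc123", "s4", "s3", "Database", 10),
]


def call_path(spans):
    """Order services from the root span down by following parent links."""
    child_of = {s.parent_id: s for s in spans}  # assumes one child per span
    path, current = [], child_of.get(None)
    while current is not None:
        path.append(current.service)
        current = child_of.get(current.span_id)
    return path


def slowest(spans):
    """Name the service that contributed the most time to the request."""
    return max(spans, key=lambda s: s.self_time_ms).service


total_ms = sum(s.self_time_ms for s in trace)
```

Logging the same `trace_id` in every service is what lets a log line be matched to its trace.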

Incident Management

On-call

  • Defined rotations (PagerDuty, Opsgenie)
  • Runbooks for common incidents
  • Clear escalation paths
  • On-call compensation
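
A defined rotation can be as simple as a list of people and a start date. The Python sketch below assumes weekly shifts. The engineer names and the `on_call` function are hypothetical, and tools such as PagerDuty or Opsgenie handle this (plus overrides and time zones) for you.

```python
from datetime import date


def on_call(engineers, rotation_start, day, shift_days=7):
    """Return who is on call on `day`, cycling through `engineers` in order."""
    shifts_elapsed = (day - rotation_start).days // shift_days
    return engineers[shifts_elapsed % len(engineers)]


engineers = ["ana", "ben", "chris"]
rotation_start = date(2024, 1, 1)  # a Monday

first_week = on_call(engineers, rotation_start, date(2024, 1, 7))
second_week = on_call(engineers, rotation_start, date(2024, 1, 8))
fourth_week = on_call(engineers, rotation_start, date(2024, 1, 22))
```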

Incident response

  1. Detect — alerts, users, monitoring
  2. Triage — severity, impact, who responds
  3. Mitigate — restore service (rollback, scale, failover)
  4. Resolve — permanent fix
  5. Learn — post-mortem

Severities

Sev | Impact            | Response time     | Example
1   | Service down      | Immediate         | Site won't load
2   | Major degradation | < 30 min          | Payments failing
3   | Minor degradation | < 4 hours         | Secondary feature broken
4   | Low impact        | Next business day | Cosmetic bug
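
The response times in the table can be turned into concrete deadlines. The Python sketch below is one possible encoding. The `respond_by` function is hypothetical, and its reading of "next business day" (end of the next weekday, ignoring public holidays) is an assumption.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOWS = {
    1: timedelta(0),           # immediate
    2: timedelta(minutes=30),
    3: timedelta(hours=4),
}


def respond_by(severity: int, detected_at: datetime) -> datetime:
    """Latest acceptable response time for an incident of this severity."""
    if severity in RESPONSE_WINDOWS:
        return detected_at + RESPONSE_WINDOWS[severity]
    if severity == 4:
        # Assumption: end of the next Monday-to-Friday day, no holiday calendar.
        day = detected_at + timedelta(days=1)
        while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            day += timedelta(days=1)
        return day.replace(hour=23, minute=59, second=59, microsecond=0)
    raise ValueError(f"unknown severity: {severity}")


monday = datetime(2024, 1, 15, 10, 30)
friday = datetime(2024, 1, 19, 17, 0)

sev2_deadline = respond_by(2, monday)
sev4_deadline = respond_by(4, friday)  # rolls over the weekend to Monday
```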

Chaos Engineering

Inject controlled failures to discover weaknesses.

Principles

  1. Define "steady state" (normal metrics)
  2. Hypothesis: the system tolerates X failure
  3. Introduce real-world variables (latency, failures, partitions)
  4. Try to disprove the hypothesis
  5. Minimize blast radius
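
The principles above can be sketched as a small experiment loop. The Python below simulates a service with three replicas and a round-robin load balancer. Everything in it (the replica model, the retry rule, the 0.99 threshold, the function names) is a hypothetical stand-in for a real system and real fault injection tooling.

```python
import random


def run_requests(replicas_up, total=300, retry=True):
    """Send requests round-robin; optionally retry once on the next replica."""
    ok = 0
    n = len(replicas_up)
    for i in range(total):
        target = i % n
        if replicas_up[target]:
            ok += 1
        elif retry and replicas_up[(target + 1) % n]:
            ok += 1
    return ok / total


def chaos_experiment(replica_count=3, retry=True, threshold=0.99, seed=None):
    rng = random.Random(seed)
    replicas = [True] * replica_count

    # 1. Steady state: measure the normal success rate first.
    baseline = run_requests(replicas, retry=retry)

    # 2-3. Hypothesis: losing one replica keeps the success rate >= threshold.
    #      Inject the failure, limited to a single replica (blast radius).
    victim = rng.randrange(replica_count)
    replicas[victim] = False
    try:
        during_failure = run_requests(replicas, retry=retry)
    finally:
        replicas[victim] = True  # always restore after the experiment

    # 4. The hypothesis is disproved if the rate drops below the threshold.
    return {
        "baseline": baseline,
        "during_failure": during_failure,
        "hypothesis_holds": during_failure >= threshold,
    }


with_retry = chaos_experiment(retry=True, seed=1)
without_retry = chaos_experiment(retry=False, seed=1)
```

In this toy model the experiment exposes a weakness: without client retries, one lost replica fails a third of the requests, so the hypothesis is disproved.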

Tools

  • Chaos Monkey — terminates random instances
  • Gremlin — chaos engineering platform
  • Litmus — chaos engineering for Kubernetes
  • AWS Fault Injection Simulator — native chaos on AWS

Security Practices (DevSecOps)

Integrate security throughout the pipeline:

Shift left

  • SAST — static code analysis (SonarQube, Semgrep)
  • SCA — dependency analysis (Snyk, Dependabot)
  • Secret scanning — detect credentials in code
  • Container scanning — image vulnerabilities (Trivy)
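
Secret scanning is, at its core, pattern matching over source text. The Python sketch below uses two illustrative rules. The AWS access key ID shape (`AKIA` followed by 16 characters) is widely documented and the sample key is AWS's published example value. The password rule is a rough heuristic, and the `scan` function and rule names are hypothetical. Dedicated scanners ship far larger rule sets.

```python
import re

RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Heuristic: a quoted literal assigned to something named "password".
    "hardcoded-password": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]{4,}['\"]"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, rule_name))
    return findings


sample = '''\
region = "us-east-1"
aws_key = "AKIAIOSFODNN7EXAMPLE"
password = "hunter2-not-real"
password = os.environ["DB_PASSWORD"]
'''
findings = scan(sample)
```

The last line of the sample is not flagged because the value comes from the environment and not from a literal in the code.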

Runtime

  • DAST — dynamic application testing
  • WAF — web application firewall
  • Runtime protection — detect anomalous behavior

Why it matters

These practices are not optional for teams operating software in production. Each one reduces a specific type of risk: IaC replaces error-prone manual configuration with reviewed, repeatable changes; feature flags decouple deploy from release; observability shortens the time it takes to diagnose an incident; and post-mortems turn incidents into learning. Adopting them incrementally is more effective than trying to implement everything at once.
