Infrastructure as Code
Practice of defining and managing infrastructure through versioned configuration files instead of manual processes. Foundation of modern operations automation.
Infrastructure as Code (IaC) means treating infrastructure like application code: versioned in Git, reviewed in pull requests, tested in CI, and applied through automated, reproducible runs.
What problem it solves
Without IaC:
- Snowflake servers — each server is unique, manually configured, impossible to reproduce
- Configuration drift — environments silently diverge
- Outdated documentation — the wiki says one thing, reality is another
- Impossible auditing — who changed what, when, why?
- Slow disaster recovery — rebuilding manually takes hours or days
With IaC: a single terraform apply can recreate the complete infrastructure from the versioned configuration, often in minutes rather than hours or days.
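Configuration drift, from the list above, is easier to reason about once it is written as a comparison between what the code declares and what a server actually reports. The sketch below is a toy illustration only: `find_drift` and the settings in both dictionaries are hypothetical names chosen for this example, not part of any IaC tool.

```python
from typing import Any


def find_drift(declared: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (declared_value, actual_value)} for every mismatch.

    A setting that exists on only one side is reported with None on the other.
    """
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical values: what the versioned code says vs. what a hand-edited server reports.
declared = {"instance_type": "t3.micro", "port": 443, "tls": True}
actual = {"instance_type": "t3.large", "port": 443, "tls": True, "debug": True}

drift = find_drift(declared, actual)
for key, (want, have) in drift.items():
    print(f"{key}: declared={want!r} actual={have!r}")
```

With IaC the declared side lives in Git, so a comparison like this can run on a schedule instead of depending on someone noticing the difference.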
Approaches
Declarative vs imperative
| Approach | Description | Tools |
|---|---|---|
| Declarative | Describe desired state, tool figures out how to get there | Terraform, CloudFormation, Kubernetes YAML, Pulumi (desired state written in a general-purpose language) |
| Imperative | Describe steps to execute in order | Bash scripts, cloud CLI or SDK scripts, Ansible (partially) |
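The difference between the two rows can be sketched with a toy in-memory "cloud". This is a simplified model under stated assumptions, not how any real tool is implemented: `imperative_provision`, `plan`, and `apply` are hypothetical helpers, and a resource is just a dictionary entry.

```python
def imperative_provision(cloud: dict[str, dict]) -> None:
    """Imperative: execute a fixed step. Every run launches one more server."""
    name = f"web-{len(cloud) + 1}"
    cloud[name] = {"instance_type": "t3.micro"}


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Declarative: compare desired and actual state, return the actions needed."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("+", name))   # create
        elif name not in desired:
            actions.append(("-", name))   # destroy
        elif desired[name] != actual[name]:
            actions.append(("~", name))   # modify
    return actions


def apply(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Carry out the plan so that actual converges on desired."""
    actions = plan(desired, actual)
    for op, name in actions:
        if op == "-":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actions


# Running the imperative script twice leaves two servers behind.
scripted_cloud: dict[str, dict] = {}
imperative_provision(scripted_cloud)
imperative_provision(scripted_cloud)
print(sorted(scripted_cloud))

# Applying the same declaration twice changes nothing the second time.
desired = {"web": {"instance_type": "t3.micro"}}
declared_cloud: dict[str, dict] = {}
first = apply(desired, declared_cloud)
second = apply(desired, declared_cloud)
print(first, second)
```

The second `apply` returns an empty action list because the actual state already matches the declaration, which is the property that makes declarative runs safe to repeat.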
# Declarative (Terraform): "I want this"
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}

# Pulumi (Python): imperative syntax, but the program still declares the desired resources
import pulumi_aws as aws

server = aws.ec2.Instance("web",
    ami="ami-0c55b159cbfafe1f0",
    instance_type="t3.micro")
Mutable vs immutable
- Mutable — update existing servers (Ansible, Chef, Puppet)
- Immutable — destroy and recreate (Terraform + AMIs, containers)
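Immutable infrastructure removes server-level configuration drift by design: a running server is never modified in place, so every change ships as a fresh server built from a known image.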
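The contrast can be illustrated with a small fleet model. This is a sketch under simplifying assumptions: servers are plain dictionaries, and `mutable_update`, `immutable_replace`, and the image contents are hypothetical names invented for the example.

```python
def mutable_update(fleet: list[dict], changes: dict) -> None:
    """Mutable: patch each running server in place; other settings are left alone."""
    for server in fleet:
        server.update(changes)


def immutable_replace(fleet_size: int, image: dict) -> list[dict]:
    """Immutable: discard the old fleet and boot fresh servers from one image."""
    return [dict(image) for _ in range(fleet_size)]


image_v1 = {"app": "1.0", "max_connections": 100}
fleet = immutable_replace(3, image_v1)

# Someone applies a manual hotfix to a single server.
fleet[1]["max_connections"] = 500

# Mutable rollout of app 2.0: the hotfix survives, so the servers no longer match.
mutable_update(fleet, {"app": "2.0"})
mutable_identical = all(server == fleet[0] for server in fleet)

# Immutable rollout of app 2.0: every server is rebuilt from the new image.
image_v2 = {**image_v1, "app": "2.0"}
fleet = immutable_replace(len(fleet), image_v2)
immutable_identical = all(server == fleet[0] for server in fleet)

print(mutable_identical, immutable_identical)
```

In the mutable rollout the manual change silently persists. In the immutable rollout it disappears, so a change that should stay has to be made in the image itself.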
Main tools
| Tool | Type | Cloud | Language | State |
|---|---|---|---|---|
| Terraform | Provisioning | Multi-cloud | HCL | Remote state |
| OpenTofu | Provisioning | Multi-cloud | HCL | Remote state |
| Pulumi | Provisioning | Multi-cloud | TS, Python, Go | Managed/self-hosted |
| AWS CDK | Provisioning | AWS | TS, Python, Java | CloudFormation |
| CloudFormation | Provisioning | AWS | YAML/JSON | AWS-managed |
| Ansible | Config mgmt | Agnostic | YAML | Stateless |
| Crossplane | Provisioning | Multi-cloud | YAML (K8s CRDs) | Kubernetes |
Fundamental principles
1. Everything in Git
infra/
├── modules/
│ ├── networking/
│ ├── compute/
│ └── database/
├── environments/
│ ├── dev/
│ ├── staging/
│ └── production/
├── .github/workflows/
│ └── terraform.yml
└── README.md
2. Idempotency
Applying the same code N times produces the same result:
terraform apply # Creates 3 instances
terraform apply # No changes. Infrastructure is up-to-date.
terraform apply # No changes. Infrastructure is up-to-date.
3. State management
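State is Terraform's record of which real resources it manages and how they map to the code. Keep it in a remote backend with locking so the whole team works from one copy: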
# Remote backend (shared across team)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks" # DynamoDB table used for state locking
    encrypt        = true
  }
}
4. Reusable modules
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = "production"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
5. Plan before apply
terraform plan # See what will change
# + aws_instance.web (create)
# ~ aws_security_group.web (modify)
# - aws_instance.old (destroy)
terraform apply # Apply after review
IaC pipeline
# .github/workflows/terraform.yml
name: Terraform
on:
  pull_request:
    paths: ['infra/**']
  push:
    branches: [main]
    paths: ['infra/**']
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform validate
      - run: terraform plan -out=plan.tfplan
      - uses: actions/upload-artifact@v4
        with:
          name: plan
          path: plan.tfplan
  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: actions/download-artifact@v4
        with:
          name: plan
      # The apply job runs on a fresh runner, so providers and backend must be initialized again
      - run: terraform init
      - run: terraform apply plan.tfplan
IaC testing
| Level | Tool | What it validates |
|---|---|---|
| Lint | terraform validate, tflint | Syntax and conventions |
| Static | Checkov, tfsec, Trivy | Security and compliance |
| Unit | Terratest, terraform test | Module logic |
| Integration | Terratest | Temporary real infrastructure |
| Policy | OPA, Sentinel | Organizational policies |
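The policy level in the last row can be sketched as a check over a plan exported with `terraform show -json plan.tfplan`. This is a minimal Python stand-in for what tools such as OPA or Sentinel do, not a replacement for them. The two rules, the allowed instance types, and the sample plan are hypothetical; only the `resource_changes` / `change.actions` / `change.after` layout follows Terraform's JSON plan output.

```python
import json

# Hypothetical organizational rule: only small instance types may be created.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}


def check_plan(plan: dict) -> list[str]:
    """Return a list of human-readable policy violations found in a plan."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        after = change.get("after") or {}  # "after" is null for pure deletes

        if rc.get("type") == "aws_instance" and after:
            instance_type = after.get("instance_type")
            if instance_type not in ALLOWED_INSTANCE_TYPES:
                violations.append(
                    f"{rc['address']}: instance type {instance_type!r} is not allowed"
                )

        # Hypothetical rule: databases are never destroyed through the pipeline.
        if rc.get("type") == "aws_db_instance" and "delete" in actions:
            violations.append(f"{rc['address']}: databases must not be destroyed")
    return violations


# Trimmed-down sample in the shape of Terraform's JSON plan output.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {"instance_type": "m5.4xlarge"}}},
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete"], "after": null}},
    {"address": "aws_instance.worker", "type": "aws_instance",
     "change": {"actions": ["update"], "after": {"instance_type": "t3.micro"}}}
  ]
}
""")

violations = check_plan(sample_plan)
for violation in violations:
    print(violation)
```

In a pipeline, a non-empty violation list would fail the job before the apply step runs.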
Anti-patterns
- ClickOps — creating resources manually then "importing" them as a patch
- Mega-state — all infra in a single state file. Split by domain/environment.
- No locks — two people applying simultaneously corrupt the state
- Hardcoded values — use variables and tfvars to parameterize
- No plan review — applying without reviewing the plan is like merging without code review
- Ignoring drift — not detecting manual changes that diverge from code
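The "No locks" item can be made concrete with a toy lock around a shared state object. This is an in-memory sketch, assuming a single process. Real backends rely on an atomic conditional write in shared storage, which this example does not attempt to reproduce. `StateLock`, `LockHeldError`, and `apply_change` are hypothetical names.

```python
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when someone else is already applying changes."""


class StateLock:
    def __init__(self) -> None:
        self.holder: str | None = None

    def acquire(self, who: str) -> None:
        if self.holder is not None:
            raise LockHeldError(f"state is locked by {self.holder}")
        self.holder = who

    def release(self, who: str) -> None:
        # Only the current holder may release the lock.
        if self.holder == who:
            self.holder = None


@contextmanager
def locked(lock: StateLock, who: str):
    lock.acquire(who)  # raises before the try block, so a failed caller never releases
    try:
        yield
    finally:
        lock.release(who)


def apply_change(state: dict, lock: StateLock, who: str, resource: str) -> None:
    """Record a new resource in the state, but only while holding the lock."""
    with locked(lock, who):
        state["serial"] += 1
        state["resources"].append(resource)


state = {"serial": 0, "resources": []}
lock = StateLock()

lock.acquire("alice")  # alice's apply is still in progress
try:
    apply_change(state, lock, "bob", "aws_instance.web")
    bob_blocked = False
except LockHeldError as err:
    bob_blocked = True
    print(err)
lock.release("alice")

apply_change(state, lock, "bob", "aws_instance.web")  # succeeds once the lock is free
print(state)
```

The second writer is rejected instead of overwriting a state that is mid-update, which is the behavior a locking backend provides to a team.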
Why it matters
Without IaC, infrastructure is tacit knowledge that lives in the head of whoever configured it. With IaC, every change is traceable, reviewable, and reproducible. It is the foundation on which practices like GitOps, automated disaster recovery, and self-service infrastructure are built.