Jonatan Mata, jonmatum.com
© 2026 Jonatan Mata. All rights reserved.v2.1.1

AWS DynamoDB

AWS serverless NoSQL database with single-digit millisecond latency at any scale, ideal for applications requiring high performance and automatic scalability.

evergreen · #aws #dynamodb #nosql #serverless #database #key-value

What it is

DynamoDB is AWS's serverless NoSQL database. It offers consistent single-digit millisecond latency regardless of data size or request volume. There are no servers to manage, no patches to apply, and, in on-demand mode, no capacity to plan manually.

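Unlike relational databases, DynamoDB uses a flexible data model where each item can have different attributes. Its distributed architecture hashes the partition key to decide which physical partition stores each item, so tables scale horizontally with no fixed limit on table size. The limits that do apply are per partition and per item: each partition serves up to about 3,000 read capacity units and 1,000 write capacity units per second, and a single item cannot exceed 400 KB.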
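
To make the partitioning idea concrete, here is a minimal sketch of hash-based placement. DynamoDB's internal hash function and partition count are not public, so the use of SHA-256 and the fixed `NUM_PARTITIONS` value are assumptions chosen only for illustration.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real count internally


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index by hashing it."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    # The same key always lands on the same partition; different keys spread out.
    for key in ("CUSTOMER#123", "CUSTOMER#124", "ORDER#456"):
        print(key, "->", partition_for(key))
```

The practical consequence is that a partition key with many distinct, evenly accessed values spreads load across partitions, while a single hot key concentrates it on one.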

In CAP-theorem terms, DynamoDB's default behavior favors availability and low latency over strong consistency. Reads are eventually consistent by default, so applications must tolerate briefly stale data. Strongly consistent reads are available per request on tables and local secondary indexes (not on global secondary indexes) and consume twice the read capacity.
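
As a rough illustration of the consistency trade-off, the sketch below builds the arguments for boto3's `Table.get_item`, where `ConsistentRead=True` requests a strongly consistent read, and estimates the read capacity each option consumes. The helper names `get_item_kwargs` and `read_capacity_units` are hypothetical; the 4 KB billing step and the half-price eventually consistent read follow the documented capacity rules.

```python
import math


def get_item_kwargs(pk: str, sk: str, strongly_consistent: bool = False) -> dict:
    """Build keyword arguments for boto3's Table.get_item."""
    return {
        "Key": {"PK": pk, "SK": sk},
        "ConsistentRead": strongly_consistent,
    }


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = False) -> float:
    """Read capacity consumed by a single-item read of the given size."""
    steps = max(math.ceil(item_size_bytes / 4096), 1)  # billed in 4 KB steps
    return float(steps) if strongly_consistent else steps / 2
```

With a real table object this would be used as `table.get_item(**get_item_kwargs("CUSTOMER#123", "PROFILE", strongly_consistent=True))`.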

Data model and single-table design

DynamoDB uses a data model based on tables, items, and attributes:

  • Partition Key (PK): distributes items across physical partitions
  • Sort Key (SK): optional, orders items within a partition
  • Attributes: fields with flexible data types (String, Number, Binary, Boolean, Null, List, Map, and String/Number/Binary Sets)

Single-table design example

Consider an e-commerce system with orders and customers:

# Item structure in a single table
{
  "PK": "CUSTOMER#123",
  "SK": "PROFILE",
  "name": "Juan Pérez",
  "email": "juan@example.com",
  "created": "2024-01-15"
}
 
{
  "PK": "CUSTOMER#123", 
  "SK": "ORDER#456",
  "total": 99.99,
  "status": "shipped",
  "items": ["product-a", "product-b"]
}
 
{
  "PK": "ORDER#456",
  "SK": "METADATA", 
  "customer_id": "123",
  "shipping_address": "...",
  "payment_method": "card"
}

This pattern enables efficient queries:

  • Get customer profile: PK = CUSTOMER#123 AND SK = PROFILE
  • Get all customer orders: PK = CUSTOMER#123 AND SK begins_with ORDER#
  • Get order details: PK = ORDER#456 AND SK = METADATA
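
The three access patterns above can be tried out with an in-memory stand-in. The function `query_items` is hypothetical and only mimics Query semantics (partition key equality, an optional sort key condition, results ordered by sort key); it is not a DynamoDB API. With boto3, the second pattern would use the key condition `Key('PK').eq('CUSTOMER#123') & Key('SK').begins_with('ORDER#')`.

```python
from typing import Optional

ITEMS = [
    {"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Juan Pérez"},
    {"PK": "CUSTOMER#123", "SK": "ORDER#456", "status": "shipped"},
    {"PK": "ORDER#456", "SK": "METADATA", "customer_id": "123"},
]


def query_items(items, pk: str, sk_equals: Optional[str] = None,
                sk_begins_with: Optional[str] = None) -> list:
    """Return the items of one partition, narrowed by an optional sort key condition."""
    matches = []
    for item in items:
        if item["PK"] != pk:
            continue
        if sk_equals is not None and item["SK"] != sk_equals:
            continue
        if sk_begins_with is not None and not item["SK"].startswith(sk_begins_with):
            continue
        matches.append(item)
    return sorted(matches, key=lambda item: item["SK"])


profile = query_items(ITEMS, "CUSTOMER#123", sk_equals="PROFILE")
orders = query_items(ITEMS, "CUSTOMER#123", sk_begins_with="ORDER#")
order_details = query_items(ITEMS, "ORDER#456", sk_equals="METADATA")
```

Each lookup touches a single partition, which is why the profile and all of a customer's orders can be fetched without scanning the table.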

Secondary indexes

Global Secondary Index (GSI)

Enables queries on attributes other than the table's primary key. Each GSI has its own partition key and an optional sort key. In provisioned mode it also has its own read and write capacity, separate from the base table.

When to use GSI:

  • You need to query by attributes that aren't the primary key
  • Access patterns require different data distributions
  • You can tolerate eventual consistency (GSIs are eventually consistent)
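
One way to picture a GSI is as a second copy of the data, regrouped by the index's partition key. The sketch below does that regrouping in memory. `build_index` is a hypothetical helper, and the `STATUS#...` values for `GSI1PK` are assumptions for illustration. It also shows that items lacking the index key attribute are simply absent from the index (a sparse index).

```python
from collections import defaultdict

items = [
    {"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Juan Pérez"},
    {"PK": "CUSTOMER#123", "SK": "ORDER#456", "GSI1PK": "STATUS#shipped"},
    {"PK": "CUSTOMER#789", "SK": "ORDER#457", "GSI1PK": "STATUS#shipped"},
    {"PK": "CUSTOMER#789", "SK": "ORDER#458", "GSI1PK": "STATUS#pending"},
]


def build_index(items, index_pk_attr: str) -> dict:
    """Group items by the index partition key, skipping items that lack it."""
    index = defaultdict(list)
    for item in items:
        if index_pk_attr in item:  # items without the key attribute are not indexed
            index[item[index_pk_attr]].append(item)
    return dict(index)


gsi1 = build_index(items, "GSI1PK")
shipped = gsi1["STATUS#shipped"]  # orders from different customers, one lookup
```

Against a real table, the same lookup would be a `table.query` call with `IndexName='GSI1'` and a key condition on `GSI1PK`.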

Local Secondary Index (LSI)

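Shares the same partition key as the base table but uses a different sort key. An LSI must be defined when the table is created, and all items that share a partition key value (including their index entries) are limited to 10 GB in total.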

When to use LSI:

  • You need strong consistency in alternative queries
  • Data per partition key value stays under 10 GB
  • You want to sort by a different attribute while maintaining the same partition key

DynamoDB Streams and event-driven architectures

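DynamoDB Streams captures item-level changes (INSERT, MODIFY, REMOVE) in near real time and retains them for 24 hours. AWS Lambda can consume the stream directly, and a separate integration delivers the same changes to Kinesis Data Streams. Each stream record looks like this: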

{
  "eventName": "INSERT",
  "dynamodb": {
    "Keys": {"PK": {"S": "ORDER#456"}, "SK": {"S": "METADATA"}},
    "NewImage": {"status": {"S": "created"}, "total": {"N": "99.99"}},
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }
}

Common patterns:

  • Event sourcing: each change generates events for other services
  • Cache invalidation: update caches when data changes
  • Analytics: send changes to data warehouses
  • Notifications: trigger emails or push notifications
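
A consumer for these patterns usually starts by decoding the typed attribute format and branching on `eventName`. The following is a minimal sketch of a Lambda-style handler under those assumptions. `from_dynamodb_json` and the summary shape are hypothetical, and only a subset of type tags is handled (boto3 ships a fuller `TypeDeserializer` in `boto3.dynamodb.types`).

```python
from decimal import Decimal

ACTIONS = {"INSERT": "created", "MODIFY": "updated", "REMOVE": "deleted"}


def from_dynamodb_json(value: dict):
    """Convert one typed attribute value, e.g. {"S": "x"}, to a Python value."""
    (type_tag, raw), = value.items()
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in raw]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in raw.items()}
    raise ValueError(f"unsupported type tag: {type_tag}")


def handler(event, context=None):
    """Summarize each change in a batch of stream records."""
    summaries = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        summaries.append({
            "action": ACTIONS[record["eventName"]],
            "keys": from_dynamodb_json({"M": change["Keys"]}),
            # REMOVE records carry no NewImage, so default to an empty item
            "item": from_dynamodb_json({"M": change.get("NewImage", {})}),
        })
    return summaries
```

From here, each summary could be routed to a cache invalidation, an analytics sink, or a notification service, depending on the pattern in use.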

Access patterns and optimization

Query vs Scan

  • Query: efficient access using partition key (and optionally sort key)
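  • Scan: reads every item in the table and consumes read capacity for all of them, even when a filter discards most results. It is slow and expensive on large tables, so avoid it in production request paths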
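
The cost gap can be estimated with simple arithmetic. Read capacity for a Query or Scan is based on the total size of the items read, rounded up to the next 4 KB, and an eventually consistent read costs half a unit per 4 KB. The table contents and item sizes below are hypothetical.

```python
import math


def eventually_consistent_rcus(item_sizes_bytes) -> float:
    """RCUs for one page of results: sum the sizes, round up to 4 KB, halve."""
    total_bytes = sum(item_sizes_bytes)
    return math.ceil(total_bytes / 4096) / 2


# Hypothetical table: item sizes in bytes, grouped by partition key
table_sizes = {
    "CUSTOMER#123": [300, 500, 500],
    "CUSTOMER#789": [300] + [500] * 2000,
}

# Query reads only the requested partition
query_cost = eventually_consistent_rcus(table_sizes["CUSTOMER#123"])

# Scan reads every item in every partition
scan_cost = eventually_consistent_rcus(
    size for sizes in table_sizes.values() for size in sizes
)
```

In this example the Query consumes 0.5 RCU while the Scan consumes 122.5, and the gap grows linearly with table size.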

Filter expressions and pagination

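import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ecommerce')

# Query with filter. The filter runs after items are read, so Limit caps
# the items evaluated per page, not the number of matches returned.
query_args = {
    'KeyConditionExpression': Key('PK').eq('CUSTOMER#123'),
    'FilterExpression': Attr('status').eq('active'),
    'Limit': 10,
}
response = table.query(**query_args)
items = response['Items']

# Pagination: repeat the same query, starting after the last evaluated key
while 'LastEvaluatedKey' in response:
    response = table.query(
        ExclusiveStartKey=response['LastEvaluatedKey'],
        **query_args,
    )
    items.extend(response['Items'])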
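
The pagination loop can be factored into a reusable generator. This is a sketch rather than a definitive implementation: `paginate` is a hypothetical helper that accepts any callable with the same response shape as `table.query`, and `fake_query` is a stand-in used only so the example runs without AWS access. The low-level boto3 client also offers a built-in alternative through `client.get_paginator('query')`.

```python
from typing import Callable, Iterator


def paginate(query_fn: Callable[..., dict], **kwargs) -> Iterator[dict]:
    """Yield items from every page by following LastEvaluatedKey."""
    while True:
        response = query_fn(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


def fake_query(ExclusiveStartKey=None, Limit=2, **_):
    """Stand-in for table.query over a five-item list (not a DynamoDB call)."""
    data = [{"SK": f"ORDER#{n}"} for n in range(1, 6)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"] + 1
    response = {"Items": data[start:start + Limit]}
    if start + Limit < len(data):
        response["LastEvaluatedKey"] = {"index": start + Limit - 1}
    return response


all_items = list(paginate(fake_query, Limit=2))
```

With a real table, the call would look like `paginate(table.query, KeyConditionExpression=..., Limit=10)`.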

TTL and backup strategies

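Time to Live (TTL): automatically deletes items after the Unix timestamp (in seconds) stored in a designated Number attribute. Deletion is not immediate: expired items are typically removed within a few days and can still appear in reads until then, so filter them out when exact expiry matters.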

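import boto3

# Enable TTL on the 'expires_at' attribute, which must be a Number
# holding a Unix timestamp in seconds
client = boto3.client('dynamodb')
client.update_time_to_live(
    TableName='sessions',
    TimeToLiveSpecification={
        'AttributeName': 'expires_at',
        'Enabled': True
    }
)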
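
Once TTL is enabled, each item needs the expiry attribute set when it is written. Below is a small sketch of computing `expires_at` and of guarding reads against items that have expired but have not yet been deleted. The helper names `with_expiry` and `is_live` are hypothetical.

```python
import time
from typing import Optional


def with_expiry(item: dict, ttl_seconds: int, now: Optional[float] = None) -> dict:
    """Return a copy of the item with expires_at set as epoch seconds."""
    now = time.time() if now is None else now
    return {**item, "expires_at": int(now) + ttl_seconds}


def is_live(item: dict, now: Optional[float] = None) -> bool:
    """True while the item's expiry time has not passed (or it has none)."""
    now = time.time() if now is None else now
    return item.get("expires_at", float("inf")) >= now


session = with_expiry({"PK": "SESSION#abc", "SK": "DATA"}, ttl_seconds=3600,
                      now=1_700_000_000)
```

The `now` parameter exists only to make the functions deterministic in tests. In normal use it is left out.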

Backup strategies:

  • Point-in-Time Recovery (PITR): continuous backups that allow restoring to any second within the last 35 days
  • On-demand backups: manual snapshots for long-term retention
  • Cross-region replication: Global Tables keep replicas in multiple regions for disaster recovery

Cost comparison: on-demand vs provisioned

| Workload | On-demand | Provisioned | Recommendation |
|---|---|---|---|
| Development/Testing | $0.25 per 1M reads | $0.09 per RCU/month | On-demand |
| Predictable traffic (1000 RPS constant) | $648/month | $233/month | Provisioned |
| Sporadic traffic (5000 RPS spikes) | $324/month | $1,166/month | On-demand |
| New application (unknown pattern) | Variable | Throttling risk | On-demand |

Key factors:

  • On-demand: higher price per request, but no capacity planning and no commitments
  • Provisioned: requires planning, but 60-70% cheaper for stable loads
  • Auto scaling in provisioned mode follows gradual changes in traffic, but it reacts with a delay, so sudden spikes can still be throttled
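
The on-demand figure for constant traffic can be reproduced from the per-request price in the table. The sketch assumes a 30-day month and that every request consumes exactly one read request unit. The provisioned figures are taken from the table as given, not re-derived.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def on_demand_read_cost(requests_per_second: float,
                        price_per_million: float = 0.25) -> float:
    """Monthly on-demand cost in dollars for a constant read rate."""
    reads = requests_per_second * SECONDS_PER_MONTH
    return reads / 1_000_000 * price_per_million


def savings_percent(baseline: float, alternative: float) -> float:
    """How much cheaper the alternative is, as a percentage of the baseline."""
    return (baseline - alternative) / baseline * 100


constant_on_demand = on_demand_read_cost(1000)          # 648.0
stable_savings = savings_percent(648, 233)              # about 64%, provisioned wins
spiky_savings = savings_percent(1166, 324)              # about 72%, on-demand wins
```

Prices differ by region and change over time, so the inputs should be replaced with current values before relying on the result.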

Code example: creation with Terraform

resource "aws_dynamodb_table" "ecommerce" {
  name           = "ecommerce"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "PK"
  range_key      = "SK"
 
  attribute {
    name = "PK"
    type = "S"
  }
 
  attribute {
    name = "SK" 
    type = "S"
  }
 
  attribute {
    name = "GSI1PK"
    type = "S"
  }
 
  global_secondary_index {
    name            = "GSI1"
    hash_key        = "GSI1PK"
    projection_type = "ALL"
  }
 
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"
 
  point_in_time_recovery {
    enabled = true
  }
 
  tags = {
    Environment = "production"
    Service     = "ecommerce"
  }
}
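
For comparison, roughly the same table can be described as arguments to boto3's `create_table` client call. This is a sketch under the assumption that it mirrors the Terraform resource above. The function names are hypothetical, and the check reflects the rule that `AttributeDefinitions` must list exactly the attributes used as table or index keys.

```python
def ecommerce_table_definition() -> dict:
    """Keyword arguments for the DynamoDB client's create_table call."""
    return {
        "TableName": "ecommerce",
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [{
            "IndexName": "GSI1",
            "KeySchema": [{"AttributeName": "GSI1PK", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
        "Tags": [
            {"Key": "Environment", "Value": "production"},
            {"Key": "Service", "Value": "ecommerce"},
        ],
    }


def key_attributes_match(definition: dict) -> bool:
    """Check that defined attributes are exactly those used in key schemas."""
    defined = {a["AttributeName"] for a in definition["AttributeDefinitions"]}
    used = {k["AttributeName"] for k in definition["KeySchema"]}
    for index in definition.get("GlobalSecondaryIndexes", []):
        used |= {k["AttributeName"] for k in index["KeySchema"]}
    return defined == used
```

It would be applied with `boto3.client('dynamodb').create_table(**ecommerce_table_definition())`. Point-in-time recovery is not part of `create_table` and is enabled separately with `update_continuous_backups`.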

Why it matters

DynamoDB changes how databases are designed for serverless applications. In on-demand mode, per-request pricing removes the need to provision capacity, but the schema must be designed carefully around specific, known access patterns.

For teams migrating from relational databases, DynamoDB demands rethinking normalization. Single-table design may seem counterintuitive, but it often reduces cost and latency by serving each access pattern with a single request. Because there are no JOINs, denormalization and data duplication are valid strategies.

Event-driven architectures benefit from DynamoDB Streams: consumers such as Lambda functions react to item changes and scale with the stream, without additional infrastructure to manage.
