AWS DynamoDB

What it is

DynamoDB is AWS's fully managed, serverless NoSQL database, designed for single-digit millisecond latency on key-based reads and writes at virtually any scale. There are no servers to manage or patches to apply, and in on-demand mode there is no capacity to plan.

Unlike relational databases, DynamoDB has no fixed schema beyond the primary key: each item in a table can carry a different set of attributes. Data is distributed by hashing each item's partition key to choose a physical partition, and DynamoDB adds partitions as a table grows, so tables scale horizontally with no practical limit on size.
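To make the partitioning idea concrete, here is a toy model. It assumes a fixed number of partitions, each owning an equal slice of a 64-bit hash space, and uses MD5 as a stand-in hash; DynamoDB's real hash function and its partition splitting are internal and are not modeled here.

```python
import hashlib

HASH_SPACE = 2**64  # toy 64-bit hash space (an assumption, not DynamoDB's)


def key_hash(partition_key: str) -> int:
    """64-bit hash of a partition key value (MD5 is an illustrative stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Return the partition whose equal-sized hash range contains the key."""
    return key_hash(partition_key) * num_partitions // HASH_SPACE


# Spread 1,000 hypothetical customer keys across 4 partitions
counts = [0] * 4
for i in range(1000):
    counts[partition_for(f"CUSTOMER#{i}", 4)] += 1
print(counts)  # roughly 250 per partition; exact values depend on the hash
```

Because a given key value always maps to the same partition, all traffic for that key lands in one place, which is why high-cardinality partition keys tend to spread load better than a few heavily used ones.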

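Reads are eventually consistent by default, which favors availability and low latency: a read issued shortly after a write may return stale data. Strongly consistent reads are available as a per-request option (the ConsistentRead parameter) on the base table and on local secondary indexes, but not on global secondary indexes, and they consume twice as much read capacity. Applications should therefore tolerate eventual consistency wherever they do not explicitly request strong reads.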
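The capacity cost of each consistency level follows a simple rule: one read capacity unit (RCU) covers one strongly consistent read per second of an item up to 4 KB, an eventually consistent read costs half as much, and a transactional read costs twice as much. The sketch below applies that rule to a single-item read (GetItem). It treats 4 KB as 4,096 bytes and does not cover Query, which rounds the combined size of all items read rather than each item.

```python
import math

# Capacity consumed per 4 KB block, by read type
READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def read_capacity_units(item_size_bytes: int, consistency: str = "eventual") -> float:
    """RCUs consumed by reading one item of the given size."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # round up to 4 KB blocks
    return blocks * READ_MULTIPLIER[consistency]


print(read_capacity_units(3_000))            # 0.5
print(read_capacity_units(3_000, "strong"))  # 1.0
print(read_capacity_units(9_000, "strong"))  # 3.0 (three 4 KB blocks)
```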

Data model and single-table design

DynamoDB uses a data model based on tables, items, and attributes:

Partition Key (PK): hashed to decide which physical partition stores the item
Sort Key (SK): optional; items that share a partition key are stored in sort-key order
Attributes: fields with flexible data types (String, Number, Binary, Boolean, Null, List, Map, and String, Number, or Binary Sets)
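On the wire, DynamoDB tags every attribute value with its type; this is the format visible in stream records and in the low-level API. The following minimal sketch converts plain Python values into that typed form. The function name `to_attribute_value` is hypothetical; the sketch skips the Binary types and, like boto3's TypeSerializer, rejects Python floats and expects Decimal for non-integer numbers.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed attribute-value format."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are transmitted as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:  # empty sets are not allowed
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
    raise TypeError(f"unsupported value: {value!r}")  # floats end up here


# An item is a map of attribute names to typed values
order = {"total": Decimal("99.99"), "status": "shipped", "items": ["product-a", "product-b"]}
item = {name: to_attribute_value(v) for name, v in order.items()}
print(item["total"])  # {'N': '99.99'}
```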

Single-table design example

Consider an e-commerce system with orders and customers:

# Item structure in a single table
{
  "PK": "CUSTOMER#123",
  "SK": "PROFILE",
  "name": "Juan Pérez",
  "email": "juan@example.com",
  "created": "2024-01-15"
}
 
{
  "PK": "CUSTOMER#123", 
  "SK": "ORDER#456",
  "total": 99.99,
  "status": "shipped",
  "items": ["product-a", "product-b"]
}
 
{
  "PK": "ORDER#456",
  "SK": "METADATA", 
  "customer_id": "123",
  "shipping_address": "...",
  "payment_method": "card"
}

This pattern enables efficient queries:

Get customer profile: PK = CUSTOMER#123 AND SK = PROFILE
Get all customer orders: PK = CUSTOMER#123 AND begins_with(SK, "ORDER#")
Get order details: PK = ORDER#456 AND SK = METADATA
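The three access patterns above can be mimicked with a small in-memory model. `ToySingleTable` is a hypothetical illustration, not how DynamoDB stores data: it groups items by partition key, returns a partition's items in sort-key order, and supports the equality and prefix conditions used above. In boto3, the second pattern would be expressed as `Key('PK').eq('CUSTOMER#123') & Key('SK').begins_with('ORDER#')`.

```python
from collections import defaultdict


class ToySingleTable:
    """In-memory stand-in for a table keyed by PK (partition) and SK (sort)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # PK -> {SK: item}

    def put_item(self, item):
        # Writing the same PK + SK again replaces the item, as PutItem does
        self._partitions[item["PK"]][item["SK"]] = item

    def get_item(self, pk, sk):
        # Exact primary-key lookup (PK = ... AND SK = ...)
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # Touch a single partition; return matches in ascending sort-key order
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = ToySingleTable()
table.put_item({"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Juan Pérez"})
table.put_item({"PK": "CUSTOMER#123", "SK": "ORDER#456", "status": "shipped"})
table.put_item({"PK": "ORDER#456", "SK": "METADATA", "customer_id": "123"})

print(table.get_item("CUSTOMER#123", "PROFILE")["name"])        # Juan Pérez
print([i["SK"] for i in table.query("CUSTOMER#123", "ORDER#")])  # ['ORDER#456']
print(table.get_item("ORDER#456", "METADATA")["customer_id"])   # 123
```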

Secondary indexes

Global Secondary Index (GSI)

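Enables queries on attributes other than the table's primary key. Each GSI has its own partition key and optional sort key and, in provisioned mode, its own read and write capacity, separate from the base table's. DynamoDB copies the projected attributes into the index asynchronously as base-table items change.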

When to use GSI:

You need to query by attributes that aren't the primary key
Access patterns require different data distributions
You can tolerate eventual consistency (GSIs are eventually consistent)
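A GSI behaves like a second copy of the data keyed differently. The toy function below, a hypothetical illustration rather than DynamoDB's mechanism, re-keys base-table items by a `GSI1PK` attribute (the index key used in the Terraform example later in this article) and skips items that lack it. That skipping mirrors real DynamoDB behavior and is why a GSI on an optional attribute acts as a sparse index. The `STATUS#...` values, the `GSI1SK` attribute, and the dates are assumptions for illustration.

```python
def build_gsi(items, pk_attr="GSI1PK", sk_attr=None):
    """Group items by the index partition key, optionally sorted by sk_attr."""
    index = {}
    for item in items:
        # Items missing any index key attribute are left out (sparse index)
        if pk_attr not in item or (sk_attr is not None and sk_attr not in item):
            continue
        index.setdefault(item[pk_attr], []).append(item)
    if sk_attr is not None:
        for group in index.values():
            group.sort(key=lambda it: it[sk_attr])
    return index


base_items = [
    {"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Juan Pérez"},
    {"PK": "CUSTOMER#123", "SK": "ORDER#456", "GSI1PK": "STATUS#shipped", "GSI1SK": "2024-01-20"},
    {"PK": "CUSTOMER#789", "SK": "ORDER#457", "GSI1PK": "STATUS#shipped", "GSI1SK": "2024-01-18"},
]
gsi1 = build_gsi(base_items, sk_attr="GSI1SK")
print([it["SK"] for it in gsi1["STATUS#shipped"]])  # ['ORDER#457', 'ORDER#456']
```

The profile item never appears in the index, so a query on `STATUS#shipped` reads only orders.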

Local Secondary Index (LSI)

Shares the base table's partition key but uses a different sort key. LSIs must be defined when the table is created, and a table with LSIs is limited to 10 GB per item collection (all base-table and index items that share one partition key value).

When to use LSI:

You need strongly consistent reads on an alternative sort order
Each item collection stays under 10 GB
You want to sort by a different attribute while keeping the same partition key
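Because an LSI can only be declared at table creation, it appears in the table definition itself. The sketch below builds request parameters in the shape accepted by the low-level `create_table` operation (in boto3, `boto3.client('dynamodb').create_table(**params)`); it only constructs the dict and does not call AWS. The index name `LSI1` and the use of the `created` attribute as the alternative sort key are assumptions for illustration.

```python
def table_with_lsi(table_name="ecommerce"):
    """Build create_table parameters for a PK/SK table with one LSI."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        # Every attribute used in a table or index key schema must be declared
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "created", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "LSI1",
                # Same partition key as the table, different sort key
                "KeySchema": [
                    {"AttributeName": "PK", "KeyType": "HASH"},
                    {"AttributeName": "created", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


params = table_with_lsi()
lsi_keys = params["LocalSecondaryIndexes"][0]["KeySchema"]
print(lsi_keys[0]["AttributeName"], lsi_keys[1]["AttributeName"])  # PK created
```

Queries against this index pass `IndexName='LSI1'` and may set `ConsistentRead=True`, which a GSI query does not allow.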

DynamoDB Streams and event-driven architectures

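DynamoDB Streams captures item-level changes (INSERT, MODIFY, and REMOVE events) in near real time and retains them in a time-ordered log for 24 hours. Consumers such as AWS Lambda read from the stream; alternatively, a table can send its changes to Amazon Kinesis Data Streams. An abridged stream record looks like this:

{
  "eventName": "INSERT",
  "dynamodb": {
    "Keys": {"PK": {"S": "ORDER#456"}, "SK": {"S": "METADATA"}},
    "NewImage": {"status": {"S": "created"}, "total": {"N": "99.99"}},
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }
}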

Common patterns:

Event propagation: each change becomes an event that other services consume
Cache invalidation: invalidate or refresh cached copies when the underlying item changes
Analytics: send changes to data warehouses
Notifications: trigger emails or push notifications
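As a rough sketch of the cache-invalidation and notification patterns, the function below has the `handler(event, context)` shape Lambda uses for Python and walks the `Records` list of a DynamoDB stream event. The cache is hypothetical: instead of calling one, the function returns the keys it would invalidate plus any `status` transitions it sees, which relies on the `NEW_AND_OLD_IMAGES` view type. Only the `S` and `N` attribute types are decoded.

```python
from decimal import Decimal


def unmarshal(attribute_value):
    """Decode a single S or N attribute value from a stream image."""
    type_tag, raw = next(iter(attribute_value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)
    raise ValueError(f"type {type_tag} not handled in this sketch")


def handler(event, context=None):
    invalidate, status_changes = [], []
    for record in event["Records"]:
        change = record["dynamodb"]
        keys = {name: unmarshal(av) for name, av in change["Keys"].items()}
        cache_key = f"{keys['PK']}|{keys.get('SK', '')}"
        # Updated or deleted items make any cached copy stale
        if record["eventName"] in ("MODIFY", "REMOVE"):
            invalidate.append(cache_key)
        # With NEW_AND_OLD_IMAGES, compare before and after to detect transitions
        if record["eventName"] == "MODIFY":
            old = change.get("OldImage", {}).get("status")
            new = change.get("NewImage", {}).get("status")
            if old != new:
                status_changes.append((
                    cache_key,
                    unmarshal(old) if old else None,
                    unmarshal(new) if new else None,
                ))
    return {"invalidate": invalidate, "status_changes": status_changes}


sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"PK": {"S": "ORDER#456"}, "SK": {"S": "METADATA"}},
                "OldImage": {"status": {"S": "created"}},
                "NewImage": {"status": {"S": "shipped"}},
                "StreamViewType": "NEW_AND_OLD_IMAGES",
            },
        }
    ]
}
print(handler(sample_event))
```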

Access patterns and optimization

Query vs Scan

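Query: reads only the items under one partition key value, optionally narrowed by a sort-key condition, so its cost is proportional to the data it reads
Scan: reads every item in the table or index and consumes capacity for all of it; it is expensive and slow, so avoid it on request paths in production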

Filter expressions and pagination

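import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ecommerce')

# Query one partition. FilterExpression runs after items are read, so
# filtered-out items still consume read capacity, and Limit caps the items
# evaluated per page (before filtering), not the items returned.
query_args = {
    'KeyConditionExpression': Key('PK').eq('CUSTOMER#123') & Key('SK').begins_with('ORDER#'),
    'FilterExpression': Attr('status').eq('shipped'),
    'Limit': 10,
}

# Pagination: keep requesting pages until LastEvaluatedKey is absent,
# reusing the same conditions and accumulating every page's items
items = []
response = table.query(**query_args)
items.extend(response['Items'])
while 'LastEvaluatedKey' in response:
    response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **query_args)
    items.extend(response['Items'])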
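Because Limit caps the items evaluated before the filter runs, a page can come back with fewer matches than Limit, even zero, while LastEvaluatedKey is still present. The toy simulation below (plain Python over a list, not a DynamoDB call) shows that effect: the list stands in for one partition in sort-key order, and a position index stands in for LastEvaluatedKey. As a simplification, reaching the end of the list ends pagination.

```python
def query_page(partition, limit, predicate, start=0):
    """Evaluate up to `limit` items from `start`, then apply the filter."""
    evaluated = partition[start:start + limit]
    matches = [item for item in evaluated if predicate(item)]
    next_start = start + len(evaluated)
    # Stand-in for LastEvaluatedKey: None once the partition is exhausted
    last_evaluated = next_start if next_start < len(partition) else None
    return matches, last_evaluated


def query_all(partition, limit, predicate):
    """Follow the continuation marker until no pages remain."""
    results, page_sizes, start = [], [], 0
    while True:
        matches, start = query_page(partition, limit, predicate, start)
        results.extend(matches)
        page_sizes.append(len(matches))
        if start is None:
            return results, page_sizes


# 25 hypothetical orders; only every fifth one is shipped
orders = [{"SK": f"ORDER#{n:03d}", "status": "shipped" if n % 5 == 0 else "pending"}
          for n in range(25)]
shipped, page_sizes = query_all(orders, limit=10,
                                predicate=lambda o: o["status"] == "shipped")
print(len(shipped), page_sizes)  # 5 [2, 2, 1]
```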

TTL and backup strategies

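Time to Live (TTL): automatically deletes an item once the time stored in a designated Number attribute, a Unix epoch timestamp in seconds, has passed. Deletion runs in the background rather than at the exact expiry time, so expired items can still appear in reads until they are removed. TTL deletions consume no write capacity.

import boto3

client = boto3.client('dynamodb')

# Enable TTL on the 'expires_at' attribute of the 'sessions' table
client.update_time_to_live(
    TableName='sessions',
    TimeToLiveSpecification={
        'AttributeName': 'expires_at',
        'Enabled': True
    }
)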
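Writers have to store the expiry time themselves. The sketch below computes `expires_at` as epoch seconds and adds a read-side check for items that have expired but not yet been deleted. The `session_id` key name and the 24-hour lifetime are hypothetical.

```python
import time

SESSION_LIFETIME_SECONDS = 24 * 60 * 60  # hypothetical 24-hour sessions


def new_session_item(session_id, now=None):
    """Build a session item whose TTL attribute is epoch seconds (a Number)."""
    now = time.time() if now is None else now
    return {"session_id": session_id, "expires_at": int(now) + SESSION_LIFETIME_SECONDS}


def is_expired(item, now=None):
    """True if the TTL has passed, even if DynamoDB has not deleted the item yet."""
    now = time.time() if now is None else now
    return "expires_at" in item and item["expires_at"] <= now


item = new_session_item("abc123", now=1_700_000_000)
print(item["expires_at"])                   # 1700086400
print(is_expired(item, now=1_700_000_000))  # False
print(is_expired(item, now=1_700_086_400))  # True
```

Storing milliseconds instead of seconds is an easy mistake: the timestamp then lies tens of thousands of years in the future, so the item effectively never expires.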

Backup strategies:

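Point-in-Time Recovery (PITR): continuous backups that let you restore the table to any second within the retention window (up to 35 days)
On-demand backups: full snapshots you create manually and keep until you delete them, for long-term retention
Global Tables: multi-Region, multi-active replication that supports disaster recovery, though it is not a backup, because deletes and bad writes replicate too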

Cost comparison: on-demand vs provisioned

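Approximate monthly read costs at us-east-1 list prices (the first row lists the unit prices). AWS cut on-demand throughput prices by 50% in November 2024, so check the DynamoDB pricing page for current rates.

Workload	On-demand	Provisioned	Recommendation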
Development/Testing	$0.25 per 1M reads	$0.09 per RCU/month	On-demand
Predictable traffic (1000 RPS constant)	$648/month	$233/month	Provisioned
Sporadic traffic (5000 RPS spikes)	$324/month	$1,166/month	On-demand
New application (unknown pattern)	Variable	Throttling risk	On-demand

Key factors:

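On-demand: higher cost per request (about 2.8 times the provisioned cost in the steady-traffic row above), but no capacity planning or commitments
Provisioned: requires capacity planning, but roughly 60-70% cheaper for stable loads
Auto scaling can adjust provisioned capacity as traffic changes, but it reacts over minutes, so sudden spikes can still be throttled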
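The on-demand figure in the steady-traffic row can be reproduced with a short calculation, assuming a 30-day month, items of at most 4 KB, strongly consistent reads, and the $0.25 per million read price from the first row. Provisioned mode bills for reserved capacity whether or not it is used, so its cost depends on how many RCUs (including headroom) you provision; the sketch therefore takes the RCU count as an input, with the first row's $0.09 per RCU-month as the default price.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 in a 30-day month


def on_demand_read_cost(reads_per_second, price_per_million=0.25, strongly_consistent=True):
    """Monthly on-demand cost for reads of items up to 4 KB."""
    units_per_read = 1.0 if strongly_consistent else 0.5  # eventual reads cost half
    read_units = reads_per_second * SECONDS_PER_MONTH * units_per_read
    return read_units / 1_000_000 * price_per_million


def provisioned_read_cost(rcus, price_per_rcu_month=0.09):
    """Monthly provisioned cost: billed per RCU reserved, used or not."""
    return rcus * price_per_rcu_month


print(on_demand_read_cost(1000))                             # 648.0
print(on_demand_read_cost(1000, strongly_consistent=False))  # 324.0
```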

Code example: creation with Terraform

resource "aws_dynamodb_table" "ecommerce" {
  name         = "ecommerce"
  billing_mode = "PAY_PER_REQUEST" # on-demand capacity mode
  hash_key     = "PK"
  range_key    = "SK"

  # Declare only attributes used as table or index keys
  attribute {
    name = "PK"
    type = "S"
  }

  attribute {
    name = "SK"
    type = "S"
  }

  attribute {
    name = "GSI1PK"
    type = "S"
  }

  # GSI keyed on GSI1PK, projecting all attributes
  global_secondary_index {
    name            = "GSI1"
    hash_key        = "GSI1PK"
    projection_type = "ALL"
  }

  # Stream item changes with both before and after images
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  point_in_time_recovery {
    enabled = true
  }

  tags = {
    Environment = "production"
    Service     = "ecommerce"
  }
}

Why it matters

DynamoDB represents a fundamental shift in database design for serverless applications. Its on-demand (per-request) billing mode removes the need to provision capacity, but the schema must be designed carefully around specific access patterns.

For teams migrating from relational databases, DynamoDB demands rethinking normalization. Single-table design may seem counterintuitive, but storing related items under a shared partition key lets a single Query retrieve them together, which keeps request counts, cost, and latency down. Because there are no JOINs, denormalization and data duplication are valid strategies.

Event-driven systems benefit from DynamoDB Streams: item changes can trigger consumers such as Lambda functions, enabling reactive architectures that scale automatically without infrastructure to manage.
