
AWS S3

AWS object storage service with 99.999999999% durability, unlimited scalability, and multiple storage classes for cost optimization.

evergreen · #aws #s3 #storage #serverless #object-storage #cloud

What it is

Amazon S3 (Simple Storage Service) is AWS's object storage service that offers 11 nines durability (99.999999999%) and 99.99% availability. It stores any amount of data — from bytes to petabytes — with HTTP/HTTPS access and REST APIs. It's the foundation of countless AWS architectures, from data lakes to static content distribution.

S3 organizes data into buckets (containers) and objects (files with metadata). Each object can be up to 5 TB and is identified by a unique key within its bucket. Versioning and cross-region replication are available as opt-in, per-bucket features.

S3's distributed architecture enables virtually unlimited scalability without manual intervention. Data is automatically replicated across multiple availability zones within a region, ensuring durability and availability even during hardware failures or natural disasters.
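
A minimal sketch of the bucket-and-key model using Python's boto3 (the bucket name here is hypothetical and must already exist):

import boto3

s3 = boto3.client("s3")

# Store an object: the key uniquely identifies it within the bucket,
# and "folders" are just key prefixes
s3.put_object(
    Bucket="my-example-bucket",  # hypothetical bucket
    Key="reports/2024/summary.json",
    Body=b'{"status": "ok"}',
    ContentType="application/json",
)

# Retrieve the same object by bucket + key
response = s3.get_object(Bucket="my-example-bucket", Key="reports/2024/summary.json")
print(response["Body"].read().decode())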

Storage classes and cost optimization

S3 offers multiple storage classes optimized for different access patterns:

Class                       Availability  Typical use                   Savings vs Standard
Standard                    99.99%        Frequent access               Base
Intelligent-Tiering         99.9%         Variable access               Up to 68% (automatic)
Standard-IA                 99.9%         Infrequent access             Up to 40%, plus retrieval fees
One Zone-IA                 99.5%         Recreatable data              20% less than Standard-IA
Glacier Instant Retrieval   99.9%         Archives, instant access      Up to 68% vs Standard-IA
Glacier Flexible Retrieval  99.99%        Archives, 1–12 hours          Up to 90% vs Standard
Glacier Deep Archive        99.99%        Archives, 12+ hours           Up to 95% vs Standard

Intelligent-Tiering automatically monitors access patterns and moves objects between frequent and infrequent access tiers. It charges a small monitoring fee but can generate significant savings on workloads with unpredictable access patterns.
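
Objects can also be written straight into Intelligent-Tiering so tiering applies from day one. A boto3 sketch (bucket, key, and file name are hypothetical):

import boto3

s3 = boto3.client("s3")

# Upload directly into Intelligent-Tiering; S3 then moves the object
# between access tiers based on the access patterns it observes
with open("clickstream-2024.parquet", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="datasets/clickstream-2024.parquet",
        Body=f,
        StorageClass="INTELLIGENT_TIERING",
    )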

Lifecycle policies

Lifecycle policies automate object transitions between storage classes and deletion:

{
  "Rules": [
    {
      "ID": "DataArchiving",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}

This policy moves logs to Standard-IA after 30 days, to Glacier after 90 days, to Deep Archive after one year, and deletes them after 7 years.
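
The same policy can be applied programmatically. A boto3 sketch, with a hypothetical bucket name:

import boto3

s3 = boto3.client("s3")

# Attach the lifecycle rule above to the bucket. Note this call replaces
# any existing lifecycle configuration, so include every rule in one call.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "DataArchiving",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)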

Event notifications and integration patterns

S3 can send notifications when specific events occur:

{
  "LambdaFunctionConfigurations": [
    {
      "Id": "ProcessImageUpload",
      "LambdaFunctionArn": "arn:aws:lambda:region:account:function:ProcessImage",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "images/"
            },
            {
              "Name": "suffix",
              "Value": ".jpg"
            }
          ]
        }
      }
    }
  ]
}
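
On the consuming side, the Lambda function receives batched event records. A minimal handler sketch (the processing step is a placeholder):

import urllib.parse

def handler(event, context):
    # A single invocation can carry several records
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event payloads
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"New upload: s3://{bucket}/{key}")
        # ... fetch the object and run the image processing here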

Common patterns include:

  • File processing: trigger AWS Lambda to process images, videos, or documents
  • Data pipelines: initiate ETL workflows when new data arrives
  • Backup automation: replicate critical objects to another region
  • Compliance: automatically move sensitive data to encrypted storage

S3 Select and query-in-place

S3 Select enables running simple SQL queries directly on CSV, JSON, and Parquet objects without downloading the entire file:

SELECT s.name, s.age FROM s3object s 
WHERE s.age > 25 AND s.department = 'Engineering'
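
A boto3 sketch of running that query against a CSV object (bucket and key are hypothetical; CSV fields arrive as strings, hence the CAST):

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="hr/employees.csv",
    ExpressionType="SQL",
    Expression=(
        "SELECT s.name, s.age FROM s3object s "
        "WHERE CAST(s.age AS INT) > 25 AND s.department = 'Engineering'"
    ),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"JSON": {}},
)

# The response is an event stream; Records events carry the result bytes
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())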

This significantly reduces data transfer costs and improves performance for exploratory analytics. It's especially useful for:

  • Filtering large datasets before processing
  • Extracting specific subsets for analysis
  • Validating data quality without moving complete files
  • Implementing low-cost queries in serverless architectures

Security and best practices

Security checklist

Access control:

  • ✅ Enable Block Public Access by default (see the sketch after this list)
  • ✅ Use bucket policies with least privilege principle
  • ✅ Implement Access Points for granular access
  • ✅ Configure specific AWS IAM roles per application
  • ✅ Audit permissions regularly with Access Analyzer
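
For the Block Public Access item, a one-call boto3 sketch (bucket name hypothetical):

import boto3

s3 = boto3.client("s3")

# Block every form of public access at the bucket level
s3.put_public_access_block(
    Bucket="my-example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)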

Encryption:

  • ✅ Enable server-side encryption by default (SSE-S3 or SSE-KMS; see the sketch after this list)
  • ✅ Use HTTPS for all transfers
  • ✅ Consider client-side encryption for highly sensitive data
  • ✅ Rotate KMS keys regularly
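
Default server-side encryption is set per bucket. A boto3 sketch, with a hypothetical bucket and KMS key ARN:

import boto3

s3 = boto3.client("s3")

# Encrypt all new objects with SSE-KMS by default
s3.put_bucket_encryption(
    Bucket="my-example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:region:account:key/example",  # hypothetical key ARN
                },
                "BucketKeyEnabled": True,  # S3 Bucket Keys reduce KMS request costs
            }
        ]
    },
)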

Monitoring and auditing:

  • ✅ Enable CloudTrail to audit access
  • ✅ Configure CloudWatch metrics and alarms
  • ✅ Use S3 Access Logs for detailed analysis (see the sketch after this list)
  • ✅ Implement alerting strategies for anomalous access
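
For the S3 Access Logs item, a boto3 sketch with hypothetical bucket names; note the target bucket must grant the S3 logging service permission to write:

import boto3

s3 = boto3.client("s3")

# Deliver server access logs to a dedicated logging bucket
s3.put_bucket_logging(
    Bucket="my-example-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-example-logs",
            "TargetPrefix": "access-logs/my-example-bucket/",
        }
    },
)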

Backup and recovery:

  • ✅ Enable versioning for critical objects (see the sketch after this list)
  • ✅ Configure cross-region replication for critical data
  • ✅ Implement MFA Delete for additional protection
  • ✅ Test recovery procedures regularly
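
Versioning is a single call per bucket. A boto3 sketch (hypothetical bucket):

import boto3

s3 = boto3.client("s3")

# Once enabled, overwrites and deletes keep prior versions recoverable
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)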

Integration with modern architectures

S3 integrates natively with AWS services to create robust architectures:

  • Data Lakes: store structured and unstructured data for analytics with Athena, EMR, or Redshift Spectrum
  • CI/CD: store build artifacts, container images, and static assets
  • Content Distribution: origin for CloudFront CDN with automatic invalidation
  • Backup Strategy: destination for database backups, EBS snapshots, and application files
  • Static Hosting: serve static web applications with custom routing (see the sketch below)
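
For the static hosting pattern, a boto3 sketch enabling website mode on a hypothetical bucket:

import boto3

s3 = boto3.client("s3")

# Serve index.html at the root and error.html for missing keys
s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)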

Why it matters

S3 is AWS's most fundamental service — not just for its 11 nines of durability, but for its role as the backbone of virtually every cloud architecture. For a staff engineer, mastering S3 means understanding how to optimize costs through storage classes, implement defense-in-depth security, and design data pipelines that scale.

The difference between basic and expert S3 usage can translate into significant storage savings — for example, Glacier Deep Archive costs up to 95% less than Standard. Misconfigured or missing lifecycle policies are a common cause of AWS cost overruns. Intelligent-Tiering, S3 Select, and event notifications are the tools that separate amateur setups from enterprise-grade architectures.

References

  • Amazon S3 User Guide — AWS, 2024. Complete official documentation.
  • S3 Storage Classes Performance — AWS, 2024. Detailed storage class comparison.
  • S3 Security Best Practices — AWS, 2024. Official security guide.
  • Optimizing S3 for Performance — AWS, 2024. Performance optimization patterns.
  • New – Automatic Cost Optimization for Amazon S3 via Intelligent Tiering — Jeff Barr, AWS News Blog, 2018. Introduction of S3 Intelligent-Tiering.
  • What is a Data Lake? — AWS, 2024. Data lake architectures with S3.

Related content

  • Serverless

    Cloud computing model where the provider manages infrastructure automatically, allowing code execution without provisioning or managing servers, paying only for actual usage.

  • Infrastructure as Code

    Practice of defining and managing infrastructure through versioned configuration files instead of manual processes. Foundation of modern operations automation.

  • AWS IAM

    AWS identity and access management service controlling who can do what in your account, with granular policies based on the principle of least privilege.

  • From Prototype to Production: A Serverless Second Brain on AWS

    Architecture design for scaling a personal second brain to a production system with AWS serverless — from the current prototype to specialized use cases in legal, research, and community building.

  • Serverless Second Brain

    Production-ready serverless backend for a personal knowledge graph — DynamoDB, Lambda, Bedrock, MCP, Step Functions. The implementation of the architecture described in the 'From Prototype to Production' essay.
