Jonatan Mata (jonmatum.com)
© 2026 Jonatan Mata. All rights reserved.
Concepts

AWS Step Functions

AWS serverless orchestration service that coordinates multiple services into visual workflows using Amazon States Language (ASL), with built-in error handling, retries, and parallel execution.

evergreen #aws #step-functions #orchestration #serverless #workflow #state-machine #asl

What it is

AWS Step Functions is a serverless orchestration service that coordinates multiple AWS services into visual workflows using Amazon States Language (ASL). It defines flows as declarative state machines with steps, conditions, parallelism, and built-in error handling.

Unlike orchestrating services with custom code in AWS Lambda, Step Functions separates business logic from workflow coordination. Task states can invoke Lambda functions, other AWS services, or external HTTP APIs, while the service handles retries, timeouts, and state transitions as configured in the definition.

The service uses JSON to define state machines that are both executable and visual documentation of the process. This facilitates debugging, auditing, and maintaining complex workflows in microservices architectures and event-driven systems.

Workflow types

Step Functions offers two workflow types with different characteristics and pricing:

| Feature | Standard | Express |
|---|---|---|
| Maximum duration | 1 year | 5 minutes |
| Pricing model | Per state transition | Per request, duration, and memory used |
| Execution history | Complete and persistent | Limited, optional (via CloudWatch Logs) |
| Execution guarantees | Exactly once | At least once (asynchronous), at most once (synchronous) |
| Use cases | Long, durable workflows | High volume, low latency |
| Typical cost | Higher for high volume | Lower for frequent executions |
| Execution start rate | Over 2,000 per second | Over 100,000 per second |

Standard workflows are ideal for business processes requiring complete auditing, such as approvals, ETL pipelines, or complex agentic workflows. Express workflows optimize for streaming cases, real-time data validation, or microservices requiring fast orchestration.
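The choice between the two types can be sketched as a small decision helper. This is a minimal illustration that only looks at the duration limit and the execution guarantee from the table above; the function name and its parameters are hypothetical, and a real decision would also weigh cost and history requirements.

```python
# Express workflows are limited to 5 minutes (from the comparison table).
EXPRESS_MAX_SECONDS = 5 * 60


def choose_workflow_type(expected_duration_seconds: float, needs_exactly_once: bool) -> str:
    """Return "STANDARD" or "EXPRESS" based on duration and execution guarantee only."""
    if expected_duration_seconds > EXPRESS_MAX_SECONDS:
        # Anything longer than the Express limit must be Standard.
        return "STANDARD"
    if needs_exactly_once:
        # Only Standard workflows offer exactly-once execution.
        return "STANDARD"
    return "EXPRESS"


# A 30-second validation step that tolerates duplicates fits Express.
print(choose_workflow_type(30, needs_exactly_once=False))
# A one-hour approval flow exceeds the Express limit.
print(choose_workflow_type(3600, needs_exactly_once=False))
```

The returned strings match the values accepted for the state machine type when creating a state machine, so the result can be passed along unchanged.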

Fundamental states

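Amazon States Language defines eight state types for building workflows: Task, Choice, Parallel, Map, Wait, Pass, Succeed, and Fail. The following definition uses six of them: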

{
  "Comment": "Order processing example",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
      "Next": "CheckInventory",
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "Next": "OrderFailed"
        }
      ]
    },
    "CheckInventory": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.inventory.available",
          "BooleanEquals": true,
          "Next": "ProcessPayment"
        }
      ],
      "Default": "OutOfStock"
    },
    "ProcessPayment": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "ChargeCard",
          "States": {
            "ChargeCard": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ChargeCard",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendConfirmation",
          "States": {
            "SendConfirmation": {
              "Type": "Task",
              "Resource": "arn:aws:states:::sns:publish",
              "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-confirmations",
                "Message.$": "$.confirmationMessage"
              },
              "End": true
            }
          }
        }
      ],
      "ResultPath": "$.paymentResults",
      "Next": "ProcessItems"
    },
    "ProcessItems": {
      "Type": "Map",
      "ItemsPath": "$.order.items",
      "MaxConcurrency": 5,
      "Iterator": {
        "StartAt": "ProcessItem",
        "States": {
          "ProcessItem": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessItem",
            "End": true
          }
        }
      },
      "Next": "OrderComplete"
    },
    "OutOfStock": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:inventory-alerts",
        "Message": "Item out of stock"
      },
      "Next": "OrderFailed"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Cause": "Order processing failed"
    }
  }
}

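This example demonstrates Task (invoke a Lambda function or call a service such as SNS directly), Choice (conditional branching), Parallel (concurrent branches), Map (array iteration), and Succeed/Fail (explicit termination). Wait (delay) and Pass (pass or inject data without doing work) are also available but not shown here.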
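Definitions like the one above are easy to break by renaming a state and forgetting one of the transitions that point to it. As a rough sketch, the check below loads a trimmed copy of the order workflow as a Python dict and lists transitions whose target state does not exist. The helper name find_dangling_targets is hypothetical, and the check only covers top-level states; it does not descend into Parallel branches or Map iterators.

```python
def find_dangling_targets(definition: dict) -> list[str]:
    """List transitions in the top-level States that point to a missing state."""
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(f"StartAt -> {definition['StartAt']}")
    for name, state in states.items():
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:  # Choice fallback
            targets.append(state["Default"])
        for rule in state.get("Choices", []):
            targets.append(rule["Next"])
        for catcher in state.get("Catch", []):
            targets.append(catcher["Next"])
        for target in targets:
            if target not in states:
                problems.append(f"{name} -> {target}")
    return problems


# Trimmed version of the order workflow; OutOfStock is deliberately left out.
order_flow = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
            "Catch": [{"ErrorEquals": ["States.TaskFailed"], "Next": "OrderFailed"}],
            "Next": "CheckInventory",
        },
        "CheckInventory": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.inventory.available", "BooleanEquals": True, "Next": "OrderComplete"}
            ],
            "Default": "OutOfStock",
        },
        "OrderComplete": {"Type": "Succeed"},
        "OrderFailed": {"Type": "Fail", "Cause": "Order processing failed"},
    },
}

print(find_dangling_targets(order_flow))  # ['CheckInventory -> OutOfStock']
```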

Error handling and retries

Step Functions includes robust patterns for handling failures:

Retry configures automatic retries with exponential backoff:

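  • ErrorEquals: error names to match, such as Lambda.ServiceException or the wildcard States.ALL
  • IntervalSeconds: seconds to wait before the first retry (default 1)
  • MaxAttempts: maximum number of retries (default 3; 0 disables retries)
  • BackoffRate: multiplier applied to the interval after each retry (default 2.0)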
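The way these fields combine can be worked out with a few lines of arithmetic. The sketch below computes the wait before each retry for a given retrier; the function name retry_delays is hypothetical, and the model ignores the optional MaxDelaySeconds and JitterStrategy fields.

```python
def retry_delays(interval_seconds: float = 1, max_attempts: int = 3, backoff_rate: float = 2.0) -> list[float]:
    """Seconds waited before each retry: the interval grows by backoff_rate every attempt."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# The ValidateOrder retrier from the example: IntervalSeconds 2, MaxAttempts 3, BackoffRate 2.0.
print(retry_delays(interval_seconds=2, max_attempts=3, backoff_rate=2.0))  # [2.0, 4.0, 8.0]
```

With those settings a persistently failing task is retried after 2, 4, and 8 seconds, so about 14 seconds of waiting pass before the error is handed on to any Catch rules.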

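Catch handles errors that remain after retries are exhausted or that no retrier matches:

  • Redirects execution to a fallback state, such as compensation or cleanup
  • Preserves error information (the Error and Cause fields) for debugging, which ResultPath can attach to the original input
  • Enables implementing the saga pattern for distributed transactions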
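Catch rules are evaluated in the order they are listed, and the first match decides where execution goes next. The sketch below is a simplified model of that lookup, not the service's implementation: it handles exact error names and the States.ALL wildcard only, and leaves out the special matching rules of names such as States.TaskFailed. The error name PaymentDeclined and the state RefundOrder are hypothetical.

```python
def first_matching_catcher(catchers: list[dict], error_name: str):
    """Return the Next state of the first catcher matching error_name, or None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None  # no catcher matched: the execution fails


catchers = [
    # Specific business error first, routed to a compensation state.
    {"ErrorEquals": ["PaymentDeclined"], "Next": "RefundOrder"},
    # Catch-all last; ResultPath keeps the original input and adds the error under $.error.
    {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "OrderFailed"},
]

print(first_matching_catcher(catchers, "PaymentDeclined"))  # RefundOrder
print(first_matching_catcher(catchers, "States.Timeout"))   # OrderFailed
```

Listing the specific catcher before the wildcard matters: with the order reversed, every error would be routed to OrderFailed.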

AWS service integration

Step Functions integrates with over 200 AWS services without intermediate Lambda code, through AWS SDK service integrations plus a smaller set of optimized integrations:

  • Compute: Lambda, ECS, Fargate, Batch
  • Storage: S3, DynamoDB, RDS
  • Messaging: SNS, SQS, EventBridge
  • AI/ML: Bedrock, SageMaker, Comprehend
  • Analytics: Athena, Glue, EMR

This direct integration reduces latency, cost, and complexity compared to orchestrating services through Lambda wrapper functions.
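As an illustration of a direct integration, the sketch below builds a Task state that writes an order to DynamoDB through the putItem integration, with no Lambda function in between. The helper name, the table name orders, the attribute names, and the input paths are assumptions made for this example.

```python
import json


def dynamodb_put_order_state(table_name: str, next_state: str) -> dict:
    """Build a Task state that calls DynamoDB PutItem directly from the workflow."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::dynamodb:putItem",
        "Parameters": {
            "TableName": table_name,
            "Item": {
                # The ".$" suffix tells Step Functions to read the value from the state input.
                "orderId": {"S.$": "$.order.id"},
                "status": {"S": "RECEIVED"},
            },
        },
        # Keep the original input and store the PutItem response next to it.
        "ResultPath": "$.saveResult",
        "Next": next_state,
    }


state = dynamodb_put_order_state("orders", "CheckInventory")
print(json.dumps(state, indent=2))
```

The printed JSON can be placed under a state name in the States object of a definition, in the same way as the SNS publish states in the order example.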

Design patterns

Saga Pattern: For distributed transactions, each step is paired with a compensation action. Step Functions does not roll back automatically; Catch rules route a failed step to the compensation states, which undo the completed steps in reverse order.
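One way to wire a saga is to generate the states from a list of steps. The sketch below is an assumption-laden illustration rather than a standard recipe: the helper build_saga, the Do/Undo naming scheme, and the Lambda function names are hypothetical, and it assumes a failed step leaves nothing behind that needs undoing.

```python
import json

LAMBDA = "arn:aws:lambda:us-east-1:123456789012:function:"  # hypothetical functions


def build_saga(steps: list[tuple]) -> dict:
    """steps: (name, action_arn, compensation_arn) in execution order."""
    last = len(steps) - 1
    states = {
        "SagaSucceeded": {"Type": "Succeed"},
        "SagaFailed": {"Type": "Fail", "Cause": "Saga rolled back"},
    }
    for i, (name, action_arn, compensation_arn) in enumerate(steps):
        # On failure, start undoing from the step that completed just before this one.
        undo_previous = f"Undo{steps[i - 1][0]}" if i > 0 else "SagaFailed"
        states[f"Do{name}"] = {
            "Type": "Task",
            "Resource": action_arn,
            "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": undo_previous}],
            "Next": f"Do{steps[i + 1][0]}" if i < last else "SagaSucceeded",
        }
        if i < last:
            # The last step has no later step that could fail, so it needs no Undo state.
            states[f"Undo{name}"] = {
                "Type": "Task",
                "Resource": compensation_arn,
                "ResultPath": None,  # serialized as null: discard the result, keep the input
                "Next": undo_previous,
            }
    return {"StartAt": f"Do{steps[0][0]}", "States": states}


saga = build_saga([
    ("ReserveStock", LAMBDA + "ReserveStock", LAMBDA + "ReleaseStock"),
    ("ChargeCard", LAMBDA + "ChargeCard", LAMBDA + "RefundCard"),
    ("ShipOrder", LAMBDA + "ShipOrder", None),
])
print(json.dumps(saga, indent=2))
```

If ShipOrder fails in this example, execution moves to UndoChargeCard, then UndoReserveStock, and ends in SagaFailed, which is the reverse-order compensation the pattern calls for.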

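Human-in-the-loop: Workflows can pause awaiting human approval using callback tokens (the .waitForTaskToken integration pattern). The task stays paused until the token is returned with a success or failure result. Useful for expense approvals, content reviews, or decisions requiring human judgment.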
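A callback step might look like the following sketch, which builds a Task state that sends an approval request to an SQS queue and then waits. The helper name, the queue URL, the message fields, and the one-day timeout are assumptions for illustration; the token itself comes from the context object path $$.Task.Token.

```python
import json


def approval_state(queue_url: str, next_state: str, timeout_seconds: int = 86400) -> dict:
    """Build a Task state that pauses until the task token is returned."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix makes the state wait for a callback.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "request.$": "$.expense",
                # The reviewer's tooling must send this token back to resume the workflow.
                "taskToken.$": "$$.Task.Token",
            },
        },
        # Without a timeout, a request nobody answers would keep the execution waiting.
        "TimeoutSeconds": timeout_seconds,
        "Next": next_state,
    }


state = approval_state(
    "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",  # hypothetical queue
    "RecordDecision",
)
print(json.dumps(state, indent=2))
```

Whatever consumes the queue resumes the execution by calling the SendTaskSuccess or SendTaskFailure API with the token from the message.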

Fan-out/Fan-in: The Map state processes arrays in parallel with concurrency control. Ideal for processing data batches, validating multiple inputs, or executing independent tasks.
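The behavior of a Map state with MaxConcurrency can be approximated locally with a thread pool. This is only an analogy for reasoning about the pattern, not how the service runs iterations; the function run_map and the sample items are hypothetical, and the sketch assumes max_concurrency is at least 1.

```python
from concurrent.futures import ThreadPoolExecutor


def run_map(items: list, process_item, max_concurrency: int) -> list:
    """Fan out over items with a concurrency cap, then fan in results in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in the order of the inputs, like a Map state's output array.
        return list(pool.map(process_item, items))


order_items = [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}, {"sku": "C", "qty": 5}]
results = run_map(order_items, lambda item: {**item, "processed": True}, max_concurrency=5)
print(results)
```

As in the Map state, the cap bounds how many items are in flight at once, which protects downstream services from being flooded by a large array.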

Circuit Breaker: Combining Choice and Wait states with a failure counter kept in the execution state or an external store, you can implement circuit breakers that pause workflows when downstream services fail repeatedly.
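A minimal sketch of the Choice plus Wait combination follows. It assumes that earlier states maintain a counter at $.failureCount in the execution input; the helper name, the state names CheckCircuit and CoolDown, and the threshold and cooldown values are hypothetical.

```python
import json


def circuit_breaker_states(threshold: int, cooldown_seconds: int, call_state: str) -> dict:
    """Build a Choice state that diverts to a Wait state once failures reach the threshold."""
    return {
        "CheckCircuit": {
            "Type": "Choice",
            "Choices": [
                {
                    # Circuit open: too many recent failures, so pause before calling again.
                    "Variable": "$.failureCount",
                    "NumericGreaterThanEquals": threshold,
                    "Next": "CoolDown",
                }
            ],
            # Circuit closed: call the downstream service right away.
            "Default": call_state,
        },
        "CoolDown": {
            "Type": "Wait",
            "Seconds": cooldown_seconds,
            # After the pause, allow one trial call to the downstream service.
            "Next": call_state,
        },
    }


breaker = circuit_breaker_states(threshold=3, cooldown_seconds=60, call_state="CallPaymentApi")
print(json.dumps(breaker, indent=2))
```

The two generated states are meant to be merged into a larger States object in front of the state that calls the downstream service. Sharing the counter across executions would need an external store such as DynamoDB, which this sketch leaves out.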

Why it matters

Step Functions transforms complex workflows from imperative code to auditable declarative definitions. Instead of handling coordination, retries, and error states in custom code, you define the flow once and the service handles reliable execution.

For teams building distributed systems, this means less orchestration code to maintain, better visibility into system state, and the ability to change a workflow's definition without redeploying the services it coordinates. The separation between business logic and coordination makes complex processes easier to test, debug, and evolve.

In serverless and microservices architectures, Step Functions acts as the "glue" that coordinates independent services into cohesive business processes, with the reliability and observability that production systems require.
