
AWS Bedrock

AWS serverless service providing access to foundation models from multiple providers (Anthropic, Meta, Mistral, Amazon) via a unified API, without managing ML infrastructure.

evergreen · #aws #bedrock #llm #ai #foundation-models #serverless

What it is

Amazon Bedrock is a serverless service providing access to foundation models from multiple providers through a unified API. No infrastructure to manage — just call the API and pay for the tokens consumed.

The service abstracts the complexity of deploying and scaling AI models on GPU infrastructure, allowing engineering teams to integrate generative AI capabilities without machine learning operations expertise. Bedrock automatically handles scaling, availability, and model updates.

Unlike self-hosted solutions, Bedrock operates under AWS's shared responsibility model, where Amazon manages the underlying infrastructure, model maintenance, and physical security, while users maintain control over their data and access configurations.


Providers and models

Bedrock offers models from over 15 providers. Prices vary by region and change frequently — always check the official pricing page for current figures.

Provider   | Key models                         | Strengths
Anthropic  | Claude Sonnet 4, Claude Haiku 3.5  | Code, complex analysis, agents
Amazon     | Nova Pro, Nova Lite, Nova Micro    | Cost-performance, multimodal
Meta       | Llama 4, Llama 3.3 70B             | Open-weight, fine-tuning, multilingual
Mistral    | Mistral Large 3, Devstral 2        | Reasoning, code, efficiency
DeepSeek   | DeepSeek v3.2                      | Low-cost reasoning
Cohere     | Rerank 3.5                         | Search and re-ranking
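
The available catalogue can also be queried programmatically through the bedrock control-plane client (distinct from the bedrock-runtime client used for inference). A minimal sketch; the provider filter is just an example:

import boto3

bedrock = boto3.client('bedrock')  # control-plane client, not 'bedrock-runtime'

# List available foundation models, optionally filtered by provider
catalog = bedrock.list_foundation_models(byProvider='Anthropic')
for model in catalog['modelSummaries']:
    print(model['modelId'], model.get('inferenceTypesSupported', []))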

Converse API

The Converse API is the recommended interface for interacting with models in Bedrock. It provides a unified format that works with all models, eliminating the need for provider-specific message formatting:

import boto3
 
bedrock_runtime = boto3.client('bedrock-runtime')
 
# Unified invocation — works with any model
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'What are the advantages of event-driven architectures?'}]
        }
    ],
    inferenceConfig={
        'maxTokens': 1000,
        'temperature': 0.7
    }
)
 
# Structured response
output = response['output']['message']['content'][0]['text']
usage = response['usage']  # inputTokens, outputTokens
print(f"Tokens: {usage['inputTokens']} in, {usage['outputTokens']} out")

For real-time streaming, use converse_stream:

response = bedrock_runtime.converse_stream(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {'role': 'user', 'content': [{'text': 'Explain event sourcing'}]}
    ],
    inferenceConfig={'maxTokens': 500}
)
 
for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='')

Bedrock Agents

Bedrock Agents let you build autonomous systems that use tools (Lambda functions) and query knowledge bases. The agent itself is created first, and its tools are then attached as action groups:

bedrock_agent = boto3.client('bedrock-agent')
 
# Create the agent (the execution role ARN is a placeholder)
agent_response = bedrock_agent.create_agent(
    agentName='support-agent',
    agentResourceRoleArn='arn:aws:iam::<account-id>:role/BedrockAgentRole',
    foundationModel='anthropic.claude-sonnet-4-20250514-v1:0',
    instruction='''You are a technical support agent.
    Use available tools to:
    1. Query order status
    2. Search technical documentation
    3. Create support tickets'''
)
 
# Tools are attached as action groups against the agent's DRAFT version
bedrock_agent.create_agent_action_group(
    agentId=agent_response['agent']['agentId'],
    agentVersion='DRAFT',
    actionGroupName='order-tools',
    actionGroupExecutor={
        'lambda': 'arn:aws:lambda:us-east-1:<account-id>:function:order-lookup'
    },
    apiSchema={
        's3': {
            's3BucketName': 'agent-schemas',
            's3ObjectKey': 'order-api-schema.json'
        }
    }
)

Since March 2025, Bedrock supports multi-agent collaboration, where a supervisor agent coordinates specialized agents for complex workflows.
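
Once the agent is prepared and an alias is published, it is invoked through the bedrock-agent-runtime client. A minimal sketch, with the agent and alias IDs as placeholders:

import uuid

agent_runtime = boto3.client('bedrock-agent-runtime')

response = agent_runtime.invoke_agent(
    agentId='<agent-id>',             # placeholder
    agentAliasId='<agent-alias-id>',  # placeholder
    sessionId=str(uuid.uuid4()),      # reuse the same sessionId to keep multi-turn context
    inputText='What is the status of order 12345?'
)

# The completion arrives as an event stream of text chunks
for event in response['completion']:
    if 'chunk' in event:
        print(event['chunk']['bytes'].decode('utf-8'), end='')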

Knowledge Bases and RAG

Knowledge Bases implement managed RAG, automatically syncing with data sources in S3, Confluence, SharePoint, or web crawlers:

kb_response = bedrock_agent.create_knowledge_base(
    name='technical-docs-kb',
    description='Company technical documentation',
    roleArn='arn:aws:iam::<account-id>:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:<account-id>:collection/kb-collection',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)
 
# Configure data source with chunking
data_source = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_response['knowledgeBase']['knowledgeBaseId'],
    name='s3-docs',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['technical-docs/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)
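
After the data source is ingested, the knowledge base can be queried end to end with RetrieveAndGenerate, which combines retrieval and answer generation in one managed call. A minimal sketch; the question and the generation model ARN are illustrative:

agent_runtime = boto3.client('bedrock-agent-runtime')

rag_response = agent_runtime.retrieve_and_generate(
    input={'text': 'How do we configure retries in the ingestion pipeline?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_response['knowledgeBase']['knowledgeBaseId'],
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0'
        }
    }
)

print(rag_response['output']['text'])

# Citations point back to the retrieved source chunks
for citation in rag_response['citations']:
    for ref in citation['retrievedReferences']:
        print(ref['location'])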

Guardrails and security

Bedrock Guardrails provides content filtering and safety controls that can be applied to any model:

bedrock = boto3.client('bedrock')
 
guardrail = bedrock.create_guardrail(
    name='enterprise-guardrail',
    description='Filters for enterprise content',
    # Messages returned when the guardrail blocks input or output (required fields)
    blockedInputMessaging='This request cannot be processed.',
    blockedOutputsMessaging='The response was blocked by content policy.',
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Avoid specific financial advice',
                'examples': ['Should I invest in stocks?'],
                'type': 'DENY'
            }
        ]
    },
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'MEDIUM'}
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'BLOCK'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'}
        ]
    }
)
 
# Apply guardrails with Converse API
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[{'role': 'user', 'content': [{'text': 'User question'}]}],
    guardrailConfig={
        'guardrailIdentifier': guardrail['guardrailId'],
        'guardrailVersion': 'DRAFT'  # use a published version number in production
    },
    inferenceConfig={'maxTokens': 1000}
)
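
Guardrails can also be evaluated standalone, without invoking a model, through the ApplyGuardrail API, which is useful for filtering inputs or outputs of models hosted elsewhere. A minimal sketch:

check = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail['guardrailId'],
    guardrailVersion='DRAFT',  # or a published version number
    source='INPUT',            # evaluate user input; use 'OUTPUT' for model responses
    content=[{'text': {'text': 'Should I invest in stocks?'}}]
)

print(check['action'])  # 'GUARDRAIL_INTERVENED' or 'NONE'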

When to use Bedrock vs alternatives

Criteria        | Bedrock                           | SageMaker                        | Direct API (Anthropic/OpenAI)
Infrastructure  | Serverless, zero-ops              | Requires endpoint configuration  | Serverless
Models          | Multi-provider                    | Any model (HuggingFace, custom)  | Provider-only
Fine-tuning     | Limited to supported models       | Full, any framework              | Varies by provider
Security        | IAM, VPC, PrivateLink, Guardrails | IAM, VPC, private endpoints      | API keys, limited
Latency         | Low (same AWS region)             | Configurable                     | Variable (internet)
Vendor lock-in  | Medium (unified API, but AWS)     | High (AWS infra)                 | Low (standard API)
Ideal case      | AWS teams needing multi-model     | Custom ML, own models            | Rapid prototyping, single-provider

Cost optimization strategies

Since Bedrock bills per token, keeping costs under control comes down to a few specific strategies:

  • Model selection by task: use small models (Nova Micro, Haiku) for classification and simple tasks; reserve premium models for complex analysis
  • Batch inference: asynchronous processing at 50% discount over on-demand pricing
  • Prompt caching: reuse long contexts across invocations to reduce input tokens
  • Intelligent Prompt Routing: automatic routing between models in the same family based on complexity ($1 per 1,000 requests, potential savings up to 30%)
  • CloudWatch monitoring: set alarms on InputTokenCount and OutputTokenCount to detect usage spikes (a minimal alarm sketch follows this list)
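
A minimal sketch of the CloudWatch alarm from the last point; the threshold, model ID, and SNS topic are illustrative placeholders:

import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='bedrock-input-tokens-spike',
    Namespace='AWS/Bedrock',
    MetricName='InputTokenCount',
    Dimensions=[{'Name': 'ModelId', 'Value': 'us.anthropic.claude-sonnet-4-20250514-v1:0'}],
    Statistic='Sum',
    Period=3600,               # hourly totals
    EvaluationPeriods=1,
    Threshold=5000000,         # example: 5M input tokens per hour
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:<account-id>:bedrock-cost-alerts']  # placeholder SNS topic
)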

Why it matters

For staff+ engineering teams, Bedrock solves the AI adoption problem without technical debt. It eliminates the operational complexity of managing GPU infrastructure while maintaining control over data and security configurations through IAM, VPC, and PrivateLink.

The per-token pricing model enables elastic scaling without capacity commitments, crucial for unpredictable workloads. The unified Converse API allows switching between models without code refactoring, enabling continuous cost-performance optimization as models evolve.

Native integration with the AWS ecosystem — CloudWatch for observability, CloudTrail for auditing, IAM for granular access control — reduces attack surface and simplifies compliance in enterprise environments.

References

  • Amazon Bedrock User Guide — AWS, 2024. Complete service documentation.
  • Using the Converse API — AWS, 2024. Unified API guide for model invocation.
  • Bedrock Agents Developer Guide — AWS, 2024. Guide for creating AI agents with tools.
  • Bedrock Knowledge Bases — AWS, 2024. Managed RAG implementation.
  • Bedrock Guardrails — AWS, 2024. Security and PII filter configuration.
  • Amazon Bedrock Pricing — AWS, 2024. Per-model and per-region pricing.
  • Multi-agent collaboration in Amazon Bedrock — AWS, 2025. Multi-agent collaboration for complex workflows.
  • Anthropic Claude on Bedrock Best Practices — Anthropic, 2024. Claude-specific optimizations on Bedrock.

Related content

  • Large Language Models

    Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language with emergent capabilities like reasoning, translation, and code generation.

  • AI Agents

    Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.

  • Serverless

    Cloud computing model where the provider manages infrastructure automatically, allowing code execution without provisioning or managing servers, paying only for actual usage.

  • Cost Optimization

    Practices and strategies to minimize cloud spending without sacrificing performance, including right-sizing, reservations, spot instances, and eliminating idle resources.

  • AI Safety

    Field dedicated to ensuring artificial intelligence systems behave safely, aligned with human values, and predictably, minimizing risks of harm.

  • Retrieval-Augmented Generation

    Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.

  • From Prototype to Production: A Serverless Second Brain on AWS

    Architecture design for scaling a personal second brain to a production system with AWS serverless — from the current prototype to specialized use cases in legal, research, and community building.

  • Building a Second Brain in Public

    Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — in days, not weeks, and what that teaches about the gap between theory and working systems.

  • Serverless Second Brain

    Production-ready serverless backend for a personal knowledge graph — DynamoDB, Lambda, Bedrock, MCP, Step Functions. The implementation of the architecture described in the 'From Prototype to Production' essay.
