Amazon Bedrock is a serverless service providing access to foundation models from multiple providers through a unified API. There is no infrastructure to manage: you call the API and pay for the tokens you consume.
The service abstracts the complexity of deploying and scaling AI models on GPU infrastructure, allowing engineering teams to integrate generative AI capabilities without machine learning operations expertise. Bedrock automatically handles scaling, availability, and model updates.
Unlike self-hosted solutions, Bedrock operates under AWS's shared responsibility model, where Amazon manages the underlying infrastructure, model maintenance, and physical security, while users maintain control over their data and access configurations.
Bedrock offers models from over 15 providers. Prices vary by region and change frequently — always check the official pricing page for current figures.
| Provider | Key models | Strengths |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Haiku 3.5 | Code, complex analysis, agents |
| Amazon | Nova Pro, Nova Lite, Nova Micro | Cost-performance, multimodal |
| Meta | Llama 4, Llama 3.3 70B | Open-weight, fine-tuning, multilingual |
| Mistral | Mistral Large 3, Devstral 2 | Reasoning, code, efficiency |
| DeepSeek | DeepSeek v3.2 | Low-cost reasoning |
| Cohere | Rerank 3.5 | Search and re-ranking |
The Converse API is the recommended interface for interacting with models in Bedrock. It provides a unified request and response format that works across all models, eliminating provider-specific message formatting:

```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

# Unified invocation — works with any model
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'What are the advantages of event-driven architectures?'}]
        }
    ],
    inferenceConfig={
        'maxTokens': 1000,
        'temperature': 0.7
    }
)

# Structured response
output = response['output']['message']['content'][0]['text']
usage = response['usage']  # inputTokens, outputTokens
print(f"Tokens: {usage['inputTokens']} in, {usage['outputTokens']} out")
```

For real-time streaming, use `converse_stream`:
```python
response = bedrock_runtime.converse_stream(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {'role': 'user', 'content': [{'text': 'Explain event sourcing'}]}
    ],
    inferenceConfig={'maxTokens': 500}
)

# Print tokens as they arrive
for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='')
```

AI agents in Bedrock enable building autonomous systems that use tools (Lambda functions) and query knowledge bases:
```python
bedrock_agent = boto3.client('bedrock-agent')

# Create the agent; a service role ARN is required
agent = bedrock_agent.create_agent(
    agentName='support-agent',
    foundationModel='anthropic.claude-sonnet-4-20250514-v1:0',
    agentResourceRoleArn='arn:aws:iam::<account-id>:role/BedrockAgentRole',
    instruction='''You are a technical support agent.
Use available tools to:
1. Query order status
2. Search technical documentation
3. Create support tickets''',
)

# Tools are attached separately, as action groups on the DRAFT version
bedrock_agent.create_agent_action_group(
    agentId=agent['agent']['agentId'],
    agentVersion='DRAFT',
    actionGroupName='order-tools',
    actionGroupExecutor={
        'lambda': 'arn:aws:lambda:us-east-1:<account-id>:function:order-lookup'
    },
    apiSchema={
        's3': {
            's3BucketName': 'agent-schemas',
            's3ObjectKey': 'order-api-schema.json'
        }
    },
)
```

Since March 2025, Bedrock supports multi-agent collaboration, where a supervisor agent coordinates specialized agents for complex workflows.
Knowledge Bases implement managed RAG, automatically syncing with data sources in S3, Confluence, SharePoint, or web crawlers:
```python
kb_response = bedrock_agent.create_knowledge_base(
    name='technical-docs-kb',
    description='Company technical documentation',
    roleArn='arn:aws:iam::<account-id>:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:<account-id>:collection/kb-collection',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)

# Configure data source with chunking
data_source = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_response['knowledgeBase']['knowledgeBaseId'],
    name='s3-docs',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['technical-docs/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)
```

Bedrock Guardrails provides content filtering and AI safety controls applicable to any model:
```python
bedrock = boto3.client('bedrock')

guardrail = bedrock.create_guardrail(
    name='enterprise-guardrail',
    description='Filters for enterprise content',
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Avoid specific financial advice',
                'examples': ['Should I invest in stocks?'],
                'type': 'DENY'
            }
        ]
    },
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'MEDIUM'}
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'BLOCK'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'}
        ]
    },
    # Both fallback messages are required by the API
    blockedInputMessaging='Sorry, I cannot help with that request.',
    blockedOutputsMessaging='Sorry, I cannot provide that response.'
)

# Apply the guardrail with the Converse API
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[{'role': 'user', 'content': [{'text': 'User question'}]}],
    guardrailConfig={
        'guardrailIdentifier': guardrail['guardrailId'],
        'guardrailVersion': 'DRAFT'  # or a published version number
    },
    inferenceConfig={'maxTokens': 1000}
)
```

| Criteria | Bedrock | SageMaker | Direct API (Anthropic/OpenAI) |
|---|---|---|---|
| Infrastructure | Serverless, zero-ops | Requires endpoint configuration | Serverless |
| Models | Multi-provider | Any model (HuggingFace, custom) | Provider-only |
| Fine-tuning | Limited to supported models | Full, any framework | Varies by provider |
| Security | IAM, VPC, PrivateLink, Guardrails | IAM, VPC, private endpoints | API keys, limited |
| Latency | Low (same AWS region) | Configurable | Variable (internet) |
| Vendor lock-in | Medium (unified API, but AWS) | High (AWS infra) | Low (standard API) |
| Ideal case | AWS teams needing multi-model | Custom ML, own models | Rapid prototyping, single-provider |
Cost optimization in Bedrock requires specific strategies:

- Monitor the CloudWatch metrics `InputTokenCount` and `OutputTokenCount` to detect usage spikes

For staff+ engineering teams, Bedrock addresses AI adoption without accumulating infrastructure technical debt. It eliminates the operational complexity of managing GPU infrastructure while maintaining control over data and security configurations through IAM, VPC, and PrivateLink.
The per-token pricing model enables elastic scaling without capacity commitments, crucial for unpredictable workloads. The unified Converse API allows switching between models without code refactoring, enabling continuous cost-performance optimization as models evolve.
Native integration with the AWS ecosystem — CloudWatch for observability, CloudTrail for auditing, IAM for granular access control — reduces attack surface and simplifies compliance in enterprise environments.
**Foundation models (LLMs)**: Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language, with emergent capabilities such as reasoning, translation, and code generation.

**AI agents**: Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.

**Serverless**: Cloud computing model in which the provider manages infrastructure automatically, allowing code to run without provisioning or managing servers, paying only for actual usage.

**Cost optimization (FinOps)**: Practices and strategies to minimize cloud spending without sacrificing performance, including right-sizing, reservations, spot instances, and eliminating idle resources.

**AI safety**: Field dedicated to ensuring that artificial intelligence systems behave safely, predictably, and in alignment with human values, minimizing risks of harm.

**Retrieval-Augmented Generation (RAG)**: Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.