Amazon Bedrock is a serverless service providing access to foundation models from multiple providers through a unified API. There is no infrastructure to manage: you call the API and pay for the tokens you consume.
The service abstracts the complexity of deploying and scaling AI models on GPU infrastructure, allowing engineering teams to integrate generative AI capabilities without machine learning operations expertise. Bedrock automatically handles scaling, availability, and model updates.
Unlike self-hosted solutions, Bedrock operates under AWS's shared responsibility model, where Amazon manages the underlying infrastructure, model maintenance, and physical security, while users maintain control over their data and access configurations.
Bedrock offers models from over 15 providers. Prices vary by region and change frequently — always check the official pricing page for current figures.
| Provider | Key models | Strengths |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Haiku 3.5 | Code, complex analysis, agents |
| Amazon | Nova Pro, Nova Lite, Nova Micro | Cost-performance, multimodal |
| Meta | Llama 4, Llama 3.3 70B | Open-weight, fine-tuning, multilingual |
| Mistral | Mistral Large 3, Devstral 2 | Reasoning, code, efficiency |
| DeepSeek | DeepSeek v3.2 | Low-cost reasoning |
| Cohere | Rerank 3.5 | Search and re-ranking |
The Converse API is the recommended interface for interacting with models in Bedrock. It provides a unified request and response format that works across all models, eliminating provider-specific message formatting:

```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

# Unified invocation — works with any model
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'What are the advantages of event-driven architectures?'}]
        }
    ],
    inferenceConfig={
        'maxTokens': 1000,
        'temperature': 0.7
    }
)

# Structured response
output = response['output']['message']['content'][0]['text']
usage = response['usage']  # inputTokens, outputTokens
print(f"Tokens: {usage['inputTokens']} in, {usage['outputTokens']} out")
```

For real-time streaming, use `converse_stream`:
```python
response = bedrock_runtime.converse_stream(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[
        {'role': 'user', 'content': [{'text': 'Explain event sourcing'}]}
    ],
    inferenceConfig={'maxTokens': 500}
)

# Print tokens as they arrive
for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='')
```

AI agents in Bedrock enable building autonomous systems that use tools (Lambda functions) and query knowledge bases:
```python
bedrock_agent = boto3.client('bedrock-agent')

# Create the agent; a service role ARN is required
agent = bedrock_agent.create_agent(
    agentName='support-agent',
    foundationModel='anthropic.claude-sonnet-4-20250514-v1:0',
    agentResourceRoleArn='arn:aws:iam::<account-id>:role/BedrockAgentRole',
    instruction='''You are a technical support agent.
Use available tools to:
1. Query order status
2. Search technical documentation
3. Create support tickets''',
)

# Tools are attached separately, as action groups on the DRAFT version
bedrock_agent.create_agent_action_group(
    agentId=agent['agent']['agentId'],
    agentVersion='DRAFT',
    actionGroupName='order-tools',
    actionGroupExecutor={
        'lambda': 'arn:aws:lambda:us-east-1:<account-id>:function:order-lookup'
    },
    apiSchema={
        's3': {
            's3BucketName': 'agent-schemas',
            's3ObjectKey': 'order-api-schema.json'
        }
    },
)
```

Since March 2025, Bedrock supports multi-agent collaboration, where a supervisor agent coordinates specialized agents for complex workflows.
Knowledge Bases implement managed RAG, automatically syncing with data sources in S3, Confluence, SharePoint, or web crawlers:
```python
kb_response = bedrock_agent.create_knowledge_base(
    name='technical-docs-kb',
    description='Company technical documentation',
    roleArn='arn:aws:iam::<account-id>:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:<account-id>:collection/kb-collection',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)

# Configure data source with chunking
data_source = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_response['knowledgeBase']['knowledgeBaseId'],
    name='s3-docs',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['technical-docs/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)
```

Bedrock Guardrails provides content filtering and AI safety controls applicable to any model:
```python
bedrock = boto3.client('bedrock')

guardrail = bedrock.create_guardrail(
    name='enterprise-guardrail',
    description='Filters for enterprise content',
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Avoid specific financial advice',
                'examples': ['Should I invest in stocks?'],
                'type': 'DENY'
            }
        ]
    },
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'SEXUAL', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'MEDIUM'}
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'BLOCK'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'}
        ]
    },
    # Both fallback messages are required by the API
    blockedInputMessaging='Sorry, I cannot help with that request.',
    blockedOutputsMessaging='Sorry, I cannot provide that response.'
)

# Apply the guardrail with the Converse API
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-sonnet-4-20250514-v1:0',
    messages=[{'role': 'user', 'content': [{'text': 'User question'}]}],
    guardrailConfig={
        'guardrailIdentifier': guardrail['guardrailId'],
        'guardrailVersion': 'DRAFT'  # or a published version number
    },
    inferenceConfig={'maxTokens': 1000}
)
```

| Criteria | Bedrock | SageMaker | Direct API (Anthropic/OpenAI) |
|---|---|---|---|
| Infrastructure | Serverless, zero-ops | Requires endpoint configuration | Serverless |
| Models | Multi-provider | Any model (HuggingFace, custom) | Provider-only |
| Fine-tuning | Limited to supported models | Full, any framework | Varies by provider |
| Security | IAM, VPC, PrivateLink, Guardrails | IAM, VPC, private endpoints | API keys, limited |
| Latency | Low (same AWS region) | Configurable | Variable (internet) |
| Vendor lock-in | Medium (unified API, but AWS) | High (AWS infra) | Low (standard API) |
| Ideal case | AWS teams needing multi-model | Custom ML, own models | Rapid prototyping, single-provider |
Cost optimization in Bedrock requires specific strategies:

- Monitor the CloudWatch metrics `InputTokenCount` and `OutputTokenCount` to detect usage spikes

For staff+ engineering teams, Bedrock addresses AI adoption without accumulating infrastructure technical debt. It eliminates the operational complexity of managing GPU infrastructure while maintaining control over data and security configurations through IAM, VPC, and PrivateLink.
The per-token pricing model enables elastic scaling without capacity commitments, crucial for unpredictable workloads. The unified Converse API allows switching between models without code refactoring, enabling continuous cost-performance optimization as models evolve.
Native integration with the AWS ecosystem — CloudWatch for observability, CloudTrail for auditing, IAM for granular access control — reduces attack surface and simplifies compliance in enterprise environments.
**Foundation models (LLMs)**: Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language, with emergent capabilities such as reasoning, translation, and code generation.

**AI agents**: Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.

**Serverless**: Cloud computing model in which the provider manages infrastructure automatically, allowing code to run without provisioning or managing servers, paying only for actual usage.

**Cost optimization (FinOps)**: Practices and strategies to minimize cloud spending without sacrificing performance, including right-sizing, reservations, spot instances, and eliminating idle resources.

**AI safety**: Field dedicated to ensuring that artificial intelligence systems behave safely, predictably, and in alignment with human values, minimizing risks of harm.

**Retrieval-Augmented Generation (RAG)**: Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.