Three-agent system that automates the bilingual MDX content lifecycle: deterministic QA auditing, surgical fixes, and full upgrades — all orchestrated with Strands Agents, Claude Sonnet 4 on Amazon Bedrock, and GitHub Actions with a diamond workflow pattern.
The system's three agents automate the full content lifecycle for a bilingual knowledge base (Spanish/English). They run in GitHub Actions, use Strands Agents as the orchestration framework, and Claude Sonnet 4 on Amazon Bedrock as the language model.
The system implements a continuous feedback loop: a QA agent audits content and opens issues → a fix agent applies surgical corrections → a content agent generates full upgrades → a human reviews and approves the PRs.
The QA agent (agents/qa_agent.py) runs structural checks without an LLM and optionally a deep review with Claude. It does not modify files — it only opens issues.
Structural checks (no LLM, no cost):
- Required sections are present (e.g., ¿Por qué importa?)
- Mermaid diagrams carry accessibility labels (accTitle, accDescr)
- Internal links use the /concepts/slug format
- Each concept has its English counterpart (.en.mdx files)

Deep review (--deep) additionally sends the content to Claude on Bedrock for a qualitative pass.
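One of these structural checks, the Mermaid accessibility rule, needs no LLM at all. A hypothetical sketch (the function name and exact behavior are illustrative, not the project's actual code):

```python
import re

MERMAID_OPEN = "`" * 3 + "mermaid"  # the opening fence marker of a Mermaid block

def check_mermaid_accessibility(mdx: str) -> list[str]:
    """Flag Mermaid diagrams missing accTitle/accDescr labels.

    Sketch only; the real QA agent runs more checks than this.
    """
    pattern = re.escape(MERMAID_OPEN) + r"\n(.*?)" + re.escape("`" * 3)
    findings = []
    for i, block in enumerate(re.findall(pattern, mdx, flags=re.DOTALL), start=1):
        if "accTitle:" not in block or "accDescr:" not in block:
            findings.append(f"mermaid: diagram {i} lacks accTitle/accDescr")
    return findings
```

Because the check is pure string inspection, it costs nothing to run on every concept in CI.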
```bash
# Local execution
python -m agents.qa_agent --dry-run --status evergreen  # audit without creating issues
python -m agents.qa_agent --deep --slug serverless      # LLM review of one concept
python -m agents.qa_agent --discover                    # JSON matrix for CI
python -m agents.qa_agent --single git                  # audit one + create issue
```

The fix agent (agents/qa_fix_agent.py) processes QA issues with minimal changes. It does not rewrite content — it only fixes what the issue describes.
Fix strategies by finding type:
| Finding | Strategy |
|---|---|
| `refs` — missing references | Find primary source, add to ES + EN, verify URL |
| `ref_tiers` — low diversity | Identify missing tier, add reference from that tier |
| `xrefs` — few cross-refs | Read content, find related concepts, add to frontmatter |
| `broken_xref` — broken ref | Remove non-existent slug or replace with valid one |
| `heading` — English heading | Translate to Spanish, keeping the heading level |
| `ext_link` — external link | Replace external URL with /concepts/slug in ES + EN |
| `missing_section` — missing section | Add section with substantive content |
| `mermaid` — no accessibility | Add accTitle: and accDescr: to diagram |
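The finding-to-strategy mapping above can be wired into the fix agent's prompt as a simple dispatch table. A sketch (the strings are abbreviated from the table, and the function shape is an assumption, not the project's actual code):

```python
# Illustrative dispatch table: each QA finding type maps to an instruction
# that constrains the model to a minimal, surgical change.
FIX_STRATEGIES = {
    "refs": "Find a primary source, add it to ES + EN, verify the URL.",
    "ref_tiers": "Identify the missing tier and add a reference from it.",
    "xrefs": "Read the content and add related concepts to the frontmatter.",
    "broken_xref": "Remove the non-existent slug or replace it with a valid one.",
    "heading": "Translate the heading to Spanish, keeping its level.",
    "ext_link": "Replace the external URL with /concepts/slug in ES + EN.",
    "missing_section": "Add the section with substantive content.",
    "mermaid": "Add accTitle: and accDescr: to the diagram.",
}

def build_fix_prompt(finding_type: str, detail: str) -> str:
    """Compose the minimal-change instruction for one finding."""
    strategy = FIX_STRATEGIES.get(finding_type, "Fix only what the issue describes.")
    return f"Finding: {finding_type} ({detail}). Strategy: {strategy}"
```

Keeping the strategies as data rather than prose in one giant prompt makes it easy to audit exactly what each finding type allows the agent to touch.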
```bash
python -m agents.qa_fix_agent --issue 175  # fix a single QA issue
python -m agents.qa_fix_agent --batch 5    # fix 5 issues
python -m agents.qa_fix_agent --dry-run    # test without LLM
```

The content agent (agents/content_agent.py) generates full rewrites to bring content from seed/growing to evergreen quality. It processes both upgrade: and qa: issues.
```bash
python -m agents.content_agent --issue 143  # process one issue
python -m agents.content_agent --batch 3    # process 3 issues
python -m agents.content_agent --dry-run    # test without LLM
```

All three agents share four tools defined with the Strands @tool decorator:
```python
import os
import pathlib

import httpx
from strands import Agent, tool
from strands.models import BedrockModel


@tool
def verify_url(url: str) -> str:
    """Verify a URL returns HTTP 200."""
    r = httpx.head(url, follow_redirects=True, timeout=10)
    return f"{url} → HTTP {r.status_code}"


@tool
def read_file(path: str) -> str:
    """Read a file from the repository."""
    with open(os.path.join(os.environ["REPO_ROOT"], path)) as f:
        return f.read()


@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file."""
    with open(os.path.join(os.environ["REPO_ROOT"], path), "w") as f:
        f.write(content)
    return f"Written: {path}"


@tool
def list_concept_files() -> str:
    """List existing concepts for cross-references."""
    # Returns the available slugs for the frontmatter concepts: array.
    # Sketch implementation: scan for .mdx files under the repo root.
    root = pathlib.Path(os.environ["REPO_ROOT"])
    slugs = sorted({p.stem.removesuffix(".en") for p in root.rglob("*.mdx")})
    return "\n".join(slugs)
```

All three workflows use the same execution pattern — a "diamond" that discovers work, distributes it in parallel, and consolidates results:
1. **Plan** — discovers what to process (open issues or concepts with findings) and generates a JSON matrix.
2. **Matrix** — each item runs in an isolated job. If one fails, the others continue (`fail-fast: false`).
3. **Summary** — downloads artifacts from all jobs and writes a consolidated report to the GitHub job summary.
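The Plan step can be sketched as a small script that emits the matrix GitHub Actions consumes via fromJson. The `include` key is the standard Actions matrix shape; the item fields and function name are assumptions:

```python
import json

def build_matrix(open_issues: list[dict], limit: int = 5) -> str:
    """Emit a GitHub Actions matrix as JSON for fromJson().

    Sketch: the real plan step would read open issues from the GitHub API.
    """
    items = [
        {"issue": i["number"], "slug": i["slug"]}
        for i in open_issues[:limit]
    ]
    return json.dumps({"include": items})

# In the workflow, the plan job would expose this as an output, e.g.:
#   echo "matrix=$(python -m agents.plan)" >> "$GITHUB_OUTPUT"
```

Capping the batch with `limit` keeps a single run's LLM spend bounded even when many issues are open.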
Workflows that use an LLM (content agent, QA fix agent) serialize matrix jobs (max-parallel: 1) to avoid Bedrock throttling. The structural QA agent — which does not use an LLM — keeps high parallelism (max-parallel: 5). The QA agent in deep mode serializes to 1.
```yaml
# content-agent.yml — serialized to avoid throttling
strategy:
  fail-fast: false
  max-parallel: 1
  matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
```

```yaml
# content-qa.yml — dynamic based on mode
strategy:
  fail-fast: false
  max-parallel: ${{ inputs.deep == true && 1 || 5 }}
  matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
```

As a fallback, agents include a short retry with backoff (1 attempt, 10 seconds) that fails fast to avoid burning unnecessary CI minutes.
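That fallback (one retry after 10 seconds, then fail fast) can be sketched as a tiny wrapper; the decorator name is an assumption, not the project's actual code:

```python
import time

def with_single_retry(fn, *, delay: float = 10.0):
    """Call fn; on failure wait `delay` seconds, retry once, then fail fast."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            time.sleep(delay)  # single backoff before the one retry
            return fn(*args, **kwargs)  # second failure propagates to CI
    return wrapper
```

Anything more aggressive (exponential backoff, many attempts) would just burn CI minutes on a throttled Bedrock endpoint that `max-parallel: 1` already avoids.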
| Workflow | Plan | Work | Summary |
|---|---|---|---|
| Content Agent | 3 min | 15 min | 2 min |
| QA Fix Agent | 3 min | 10 min | 2 min |
| QA Audit | 3 min | 5 min | 2 min |
Authentication uses OIDC — GitHub Actions obtains an ephemeral JWT and exchanges it for temporary AWS credentials:
```hcl
resource "aws_iam_role" "content_agent" {
  name = "jonmatum-content-agent"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:jonmatum/jonmatum.com:*"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "bedrock_invoke" {
  role = aws_iam_role.content_agent.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]
      Resource = [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-*",
        "arn:aws:bedrock:*:*:inference-profile/us.anthropic.claude-sonnet-4-*"
      ]
    }]
  })
}
```

No stored API keys, no credential rotation, no leak risk. The role only allows InvokeModel on Bedrock.
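On the workflow side, the OIDC exchange needs only two pieces: the `id-token: write` permission so the job can request the JWT, and a credentials step that assumes the Terraform-defined role. A sketch (the account ID and region are placeholders, not the project's actual values):

```yaml
permissions:
  id-token: write   # allow the job to request the OIDC JWT
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/jonmatum-content-agent
      aws-region: us-east-1
```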
Data from system runs:
| Metric | Value |
|---|---|
| Concepts audited per QA cycle | 40 evergreen |
| Typical findings per audit | 5-8 concepts |
| Time per structural audit | Under 2 min (no LLM) |
| Time per deep review (LLM) | ~1 min per concept |
| Time per full upgrade | ~5 min per concept |
| Time per surgical fix | ~2 min per concept |
| Cost per upgrade (Sonnet 4) | ~$0.10-0.15 |
| Cost per surgical fix | ~$0.03-0.05 |
| Cost per deep review | ~$0.05 |
| Successful validation rate | ~85% (15% fail lint and are discarded) |
Key lessons:

- The agents run `pnpm validate` + `pnpm lint:content` after writing, because the model sometimes generates frontmatter with missing fields or content that fails the linter.
- `max-parallel: 1` on LLM workflows eliminates Bedrock rate limit issues without spending CI minutes on retries.
- `<50%` or `<10` in MDX content is parsed as JSX. Agents must use prose: "less than 50%", "over 10".
- `@tool` with docstrings auto-generates the schema for the model — no manual JSON Schema needed.

This system demonstrates that a set of specialized agents with simple tools (file read/write, HTTP verification) can maintain a knowledge base autonomously. The pattern is replicable: any repository with structured content and documented quality rules can implement the same loop — audit, fix, upgrade, review. The human shifts from writer to editor: reviewing PRs instead of writing content from scratch.
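The MDX pitfall (a raw `<` before a number being parsed as JSX) can also be guarded mechanically before writing. A hypothetical pre-write check, not the project's actual code:

```python
import re

# MDX treats a raw "<" as the start of a JSX expression, so "<50%" breaks the build.
RAW_LT = re.compile(r"<\s*\d")

def find_raw_comparisons(mdx: str) -> list[int]:
    """Return 1-based line numbers where '<digit' would trip the MDX parser."""
    return [
        n for n, line in enumerate(mdx.splitlines(), start=1)
        if RAW_LT.search(line)
    ]
```

A check like this is deliberately over-eager (it would also flag code samples); as a pre-write lint it just tells the agent where to rephrase in prose.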
Related concepts:

- **Strands Agents** — Open source SDK from AWS for building AI agents with a model-driven approach. Functional agents in a few lines of code, with multi-model support, custom tools, MCP, multi-agent, and built-in observability.
- **AI agents** — Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.
- **GitHub Actions** — GitHub's native CI/CD platform. Declarative YAML workflows that automate build, test, deploy, and any development lifecycle task, directly from the repository.
- **CI/CD** — Continuous Integration and Continuous Delivery/Deployment: practices that automate code integration, testing, and delivery to production. Foundation of modern software engineering.
- **Infrastructure as Code** — Practice of defining and managing infrastructure through versioned configuration files instead of manual processes. Foundation of modern operations automation.
- **Terraform** — HashiCorp's Infrastructure as Code tool that enables defining, provisioning, and managing multi-cloud infrastructure through declarative HCL files.
- **Multi-agent systems** — Architectures where multiple specialized AI agents collaborate, compete, or coordinate to solve complex problems that exceed a single agent's capability.
- **Agentic patterns** — Design patterns where AI agents execute complex multi-step tasks autonomously, combining reasoning, tool use, and iterative decision-making.