Findings from manual review of PR #187
The content agent generated PR #187 to upgrade ai-coding-assistants from seed to evergreen. Manual review found three factual errors that the automated pipeline did not catch.
The agent cited "The Programmer's Brain in the Era of AI" with URL research.google/pubs/pub52966/. That URL exists (HTTP 200), but points to a medical NLP paper titled "Structured Understanding of Assessment and Plans in Clinical Documentation" (Yaya-Stupp et al., medRxiv 2022). The title, year, and topic were fabricated.
Why it wasn't caught: the QA agent verifies that URLs return HTTP 200, but does not verify that the page content matches the cited title. A 200 does not mean the reference is correct.
The "Why it matters" section claimed "productivity increases of 20-40%" without citing a specific study. The GitHub paper (Peng et al., 2023) measured 55.8% on a specific task. Google's internal study measured ~21%. No cited source supports the "20-40%" range.
Why it wasn't caught: the QA agent with --deep looks for "unsourced claims" but does not cross-check numbers in the text against actual figures in the cited references.
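Full semantic cross-checking is hard, but a crude mechanical pass could at least extract percentage figures from a claim and flag any figure that appears in none of the cited sources. A minimal sketch (function names and the matching heuristic are assumptions, not existing pipeline code):

```python
import re

# Matches "55.8%", "21%", and ranges like "20-40%".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)(?:\s*[-\u2013\u2014]\s*(\d+(?:\.\d+)?))?\s*%")

def extract_figures(text):
    """Pull percentage figures out of text as a set of floats."""
    figures = set()
    for match in PERCENT_RE.finditer(text):
        figures.add(float(match.group(1)))
        if match.group(2):  # second endpoint of a range like "20-40%"
            figures.add(float(match.group(2)))
    return figures

def unsupported_figures(claim_text, source_texts):
    """Figures in the claim that appear verbatim in none of the cited sources."""
    supported = set()
    for src in source_texts:
        supported |= extract_figures(src)
    return extract_figures(claim_text) - supported
```

This would have flagged the "20-40%" claim, since neither 20 nor 40 appears in the cited sources' figures (55.8 and 21). It is only a literal-number check; paraphrased or unit-converted figures would still slip through.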
The comparison table listed Kiro at $25/month; the actual price is $20/month (Pro) per kiro.dev/pricing. Prices change frequently, so the agent likely relied on stale training data.
Why it wasn't caught: there is no pricing verification in the pipeline. Prices are volatile data that the LLM cannot verify without real-time web access.
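If the pipeline gains web access, a fetched pricing page can at least be checked for the claimed dollar amount. A minimal sketch (the functions operate on already-fetched page text; names and the extraction heuristic are assumptions, since the structure of kiro.dev/pricing is unknown):

```python
import re

def extract_prices(page_text):
    """Find dollar amounts like '$20/month' in fetched page text."""
    return {float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", page_text)}

def price_is_listed(page_text, claimed_price):
    """True if the claimed price appears anywhere on the fetched page."""
    return claimed_price in extract_prices(page_text)
```

A check like `price_is_listed(fetched_html, 25.0)` would have failed for the Kiro row, surfacing the stale $25 figure for human review.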
Recommended pipeline improvements:

| Improvement | Effort | Impact |
|---|---|---|
| Verify reference title appears on the page | Medium — requires fetch + text search | High — eliminates reference hallucinations |
| Cross-check text figures against cited references | High — requires semantic understanding | High — eliminates fabricated statistics |
| Add prompt warning about prices and volatile data | Low — prompt change | Medium — reduces stale data errors |
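One low-effort version of the third improvement's prompt warning (wording is illustrative, not the pipeline's actual prompt):

```text
Pricing, version numbers, and release dates are volatile. Do not state a
specific price or version from memory. Either cite a primary source fetched
during this run, or direct the reader to the vendor's pricing page instead.
```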
For the first improvement, fetch each cited URL, extract the page <title>, and compare it against the cited title. If they don't match, flag the reference as suspicious.
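The title check from the table's first row can be sketched with the standard library. A minimal version (the word-overlap heuristic and 0.6 threshold are assumptions, not the pipeline's actual logic):

```python
import re
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def title_matches(html, cited_title, threshold=0.6):
    """Heuristic: enough words from the cited title appear in the page <title>."""
    parser = TitleExtractor()
    parser.feed(html)
    page_words = set(re.findall(r"\w+", parser.title.lower()))
    cited_words = re.findall(r"\w+", cited_title.lower())
    if not cited_words:
        return False
    hits = sum(1 for w in cited_words if w in page_words)
    return hits / len(cited_words) >= threshold
```

Against the pub52966 page, whose real title is about clinical documentation, the fabricated "The Programmer's Brain in the Era of AI" would fail this check even though the URL returns HTTP 200.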