Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — over a weekend, at near-zero cost, and what that teaches about the gap between theory and working systems.
Over a weekend in March 2026, I built a second brain. Not the kind that lives in a note-taking app and collects dust — one with a knowledge graph of 154 nodes and 369 edges, a bilingual Spanish-English pipeline, and endpoints that any AI agent can query.
The result is jonmatum.com: 131 concepts, 20 experiments, 3 notes, ~131,000 words in two languages, an interactive D3 graph, and an agent-friendly surface with /llms.txt, /api/knowledge, /api/graph, and /api/mcp. Monthly cost: $0.
The inspiration came from Nate B Jones' series on AI-powered second brains. His thesis: the second brain is no longer a passive storage system — it's an active system that works while you sleep. I watched the video, read the posts, and instead of taking notes about how to build one, I just built it.
Jones defines 8 building blocks for a functional second brain. Here's how they map to what I built:
| Block | Jones | jonmatum.com | Status |
|---|---|---|---|
| 1. Dropbox | Slack channel, zero friction | MDX files in content/ | Partial — high friction (create file, frontmatter, commit) |
| 2. Sorter | AI agent auto-classifies | Manual type: field in frontmatter | Partial — manual |
| 3. Form | Strict schema per type | Frontmatter validated by pnpm validate | Complete |
| 4. Filing Cabinet | Notion databases | content/{concepts,notes,experiments,essays}/ | Complete |
| 5. Receipt | Inbox Log documents every action | Git history (171 commits) | Complete — git is the audit trail |
| 6. Bouncer | Confidence score 0-1 | Validation + lint pipeline | Partial — build-time only |
| 7. Tap on the Shoulder | Daily/weekly digests | — | Missing |
| 8. Fix Button | Chat commands to reclassify | Manual file edits | Partial — manual |
Blocks 3-5 are solid. Block 7 — proactive surfacing — is completely missing, and according to Jones it's the differentiator between a storage system and a real second brain.
Jones insists on separating memory, compute, and interface. Without having read that principle before starting, the monorepo architecture already implements it:
- content/ — plain text, portable, no vendor lock-in
- packages/knowledge/ — parser, graph builder, llms.txt generator, embeddings
- apps/web/ — the human door

This separation allowed changing the interface without touching content, and vice versa. When semantic search failed in production, I could revert to keyword search without losing a single graph node.
Jones describes a "shared surface with two doors": the agent enters through one, the human through the other. Both access the same data.
At jonmatum.com:
- /llms.txt (634 lines)
- /llms-full.txt (11,568 lines)
- /api/knowledge
- /api/graph
- /api/concepts/[id]
- /api/mcp

What's missing: the agent door is read-only. Jones proposes both doors read and write. Today, an agent can query my entire knowledge graph but can't add a node. That's the biggest gap with Open Brain.
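For context on what the agent door serves: the llms.txt proposal is plain Markdown — an H1 with the site name, a blockquote summary, then H2 sections containing link lists. The excerpt below is illustrative only; the section names and entries are invented, not copied from the real file:

```markdown
# jonmatum.com

> Bilingual (ES/EN) knowledge base: concepts, experiments, and notes
> on software engineering and AI.

## Concepts

- [Knowledge graph](https://jonmatum.com/en/concepts/knowledge-graph): entities and relationships as a queryable network
- [Infrastructure as code](https://jonmatum.com/en/concepts/iac): versioned configuration instead of manual operations
```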
One of the principles that emerged from Jones' community: "architecture matters more than tools." One member swapped every recommended tool and got the same results.
jonmatum.com validates this. The stack is deliberately generic: MDX (plain text), JSON (generated data), Next.js (interface). There's no dependency on Notion, Obsidian, or any PKM SaaS. If tomorrow I want to migrate the interface to Astro or the storage to Postgres, the MDX content remains the same.
Every concept has a status field: seed | growing | evergreen. A seed is a 150-word stub with valid frontmatter. It doesn't need to be complete to exist in the graph — it just needs to be correct.
This unlocked speed: instead of writing 131 perfect concepts, I wrote 131 seeds and let them grow. The graph benefits from coverage as much as from depth.
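The "correct, not complete" rule reduces to a build-time type check. A minimal sketch, assuming the status values from above (the real pnpm validate pipeline surely checks far more fields):

```typescript
// Minimal status check; the frontmatter shape is assumed from the
// article's description, not copied from the site's real validator.
type ConceptStatus = "seed" | "growing" | "evergreen";

const VALID_STATUSES: readonly string[] = ["seed", "growing", "evergreen"];

// Type guard: narrows a raw frontmatter string to ConceptStatus.
function isValidStatus(value: string): value is ConceptStatus {
  return VALID_STATUSES.includes(value);
}

// A seed stub passes as long as its frontmatter is correct.
console.log(isValidStatus("seed"));  // true
console.log(isValidStatus("draft")); // false
```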
I attempted semantic search with @huggingface/transformers (Xenova/all-MiniLM-L6-v2) running via WebAssembly in the browser. It failed in production — among the causes: tolist(), .data, and output[0] behave differently between Node.js and the browser WASM runtime. I reverted to instant keyword search. Pre-computed embeddings are still generated at build time for future use. The viable solution is a server-side API route — ~$0.10/month at current scale.
This is a concrete example of the implementation gap Jones describes: technology that should work in theory but breaks in practice.
Jones' block 1 — the Dropbox — requires zero friction. Capturing a thought in jonmatum.com requires: creating an MDX file, writing frontmatter with 11 mandatory fields, writing content in Spanish, creating the .en.mdx file with the translation, and committing. That's not zero friction — it's a 15-30 minute process per concept.
The speed I achieved (154 items in 5 days) was possible because I used AI agents to assist generation. Without that assistance, the pace would be unsustainable. Low-friction capture is the next problem to solve.
McKinsey reports that 88% of organizations use AI in at least one business function, but only 1% consider their deployments mature. Jones frames this as the implementation gap: the world splits between the "hype machine" and the "feature comparison industrial complex." What's missing is the bridge.
Building jonmatum.com over a weekend taught me where that bridge exists — and where it doesn't:
The bridge exists when architecture is simple and tools are mature. Turborepo, Next.js, MDX, Tailwind — none of this was an obstacle. Infrastructure was built in hours, not days.
The bridge doesn't exist when technology promises more than it delivers in production. Client-side semantic search is the perfect example: works in demos, fails in deployment. The gap isn't knowledge — it's practical integration.
The bridge also doesn't exist for content. 131,000 words in two languages don't write themselves. AI assisted, but editorial judgment — what to include, how to connect it, what level of depth — remains human. The second brain automates the system, not the thinking.
My native language is Spanish. Writing in Spanish is natural — ideas flow without the friction of mentally translating. But I wanted to share knowledge with my team and community, and English is the universal language of technology. At the same time, I wanted my friends and family who don't speak English to access the content.
The solution: Spanish as the primary language (the default site), English as translation (available at /en/...). Every concept has .mdx (Spanish) and .en.mdx (English). Frontmatter includes title_es/title_en and summary_es/summary_en.
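Put together, a concept's frontmatter might look like this. The field names title_es/title_en, summary_es/summary_en, and status come from the article; the values and any other structure are invented for illustration:

```yaml
# concepts/knowledge-graph.mdx — Spanish is the site default;
# the English body lives in knowledge-graph.en.mdx
title_es: "Grafo de conocimiento"
title_en: "Knowledge graph"
summary_es: "Red consultable de entidades y relaciones."
summary_en: "A queryable network of entities and relationships."
status: seed
```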
The cost: 73,963 words in Spanish + 57,409 in English = ~78% more content than a monolingual system. But the value is disproportionate:
- /llms.txt in English — the language LLMs perform best in

I found no other public second brains that are bilingual. The PKM ecosystem is overwhelmingly monolingual (English). This is an opportunity, not a problem.
Is it sustainable at scale? With AI-assisted translation + human review, yes. Without AI, no. At 500+ items, manual translation would be a bottleneck. But that's a problem to solve when it arrives — not a reason not to start.
The system is at 75%. It has MDX capture, type-based organization, a knowledge graph with 154 nodes and 369 edges, bilingual summaries, keyword search, agent-friendly endpoints (/llms.txt, /api/*), D3 graph visualization, full i18n, and Vercel deployment. What's missing:
Current search is keyword-based — it works, but doesn't understand meaning. The plan: an API route /api/search that loads pre-computed embeddings (they already exist in embeddings.json), computes the query embedding with OpenAI text-embedding-3-small, and runs cosine similarity. At 154 nodes I don't need a vector database — everything fits in memory. Estimated cost: ~$0.10/month.
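The whole retrieval step is small enough to sketch. This is a sketch under the article's assumptions — the embeddings.json node shape is invented, and the query vector would come from text-embedding-3-small in the real route:

```typescript
// In-memory semantic search: at ~154 nodes, no vector DB is needed.
interface EmbeddedNode {
  id: string;
  embedding: number[]; // pre-computed at build time (embeddings.json)
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every node against the query embedding and keep the top k.
function topK(
  nodes: EmbeddedNode[],
  query: number[],
  k: number
): { id: string; score: number }[] {
  return nodes
    .map((n) => ({ id: n.id, score: cosineSimilarity(n.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With 154 nodes this is a linear scan over a few hundred kilobytes — well under any serverless time limit, which is why the "no vector database yet" call is reasonable.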
Being able to ask the second brain: "what do I know about infrastructure as code?" and get an answer with context from my own concepts. The architecture: a Strands agent that uses embeddings for retrieval-augmented generation and Claude Sonnet via Bedrock for generation. API route /api/chat with streaming. Estimated cost: ~$1-3/month depending on usage.
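Independent of Strands or Bedrock, the glue between retrieval and generation is just prompt assembly: take the top-scoring concepts by embedding similarity and fold them into the model's context. Everything below — the concept shape and the prompt wording — is illustrative, not the planned implementation:

```typescript
// Hypothetical RAG prompt assembly for an /api/chat route.
interface RetrievedConcept {
  title: string;
  summary: string;
}

function buildRagPrompt(question: string, context: RetrievedConcept[]): string {
  // Number the retrieved concepts so the model can cite them.
  const contextBlock = context
    .map((c, i) => `[${i + 1}] ${c.title}: ${c.summary}`)
    .join("\n");
  return [
    "Answer using only the context below from my knowledge base.",
    "Context:",
    contextBlock,
    `Question: ${question}`,
  ].join("\n\n");
}
```

The agent framework then streams the model's answer back; the retrieval side stays identical whichever model or SDK sits behind it.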
The graph already has the edges — if concept A references concept B, the edge exists. What's missing is inverting it in the UI: showing on each page "these concepts reference you." It's a getBacklinks(slug) helper and a component. Cost: $0.
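The helper really is that small once edges exist as (source, target) pairs — a sketch with an assumed edge shape; the real graph JSON may name these fields differently:

```typescript
// Invert the edge list: which concepts reference this slug?
interface Edge {
  source: string; // slug of the referencing concept
  target: string; // slug of the referenced concept
}

function getBacklinks(edges: Edge[], slug: string): string[] {
  return edges.filter((e) => e.target === slug).map((e) => e.source);
}
```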
Today the only way to capture is creating an MDX file manually. Next step: an API route /api/capture that accepts text + source URL, stores it as MDX in captures/, and optionally generates embeddings. This opens the door to a bookmarklet, browser extension, or Readwise integration.
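The core of that route can start as a pure function that turns captured text plus a source URL into an MDX string; the frontmatter fields below are invented for illustration, not the site's real schema:

```typescript
// Hypothetical capture-to-MDX serializer for an /api/capture route.
// The route handler would write the returned string into captures/.
function buildCaptureMdx(text: string, sourceUrl: string, capturedAt: Date): string {
  const frontmatter = [
    "---",
    "type: capture",
    `source: "${sourceUrl}"`,
    `captured: "${capturedAt.toISOString()}"`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${text}\n`;
}
```

Keeping serialization pure makes the same function reusable from a bookmarklet endpoint, a browser extension, or a Readwise webhook.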
Jones' block 7 — the one that's completely missing. The data to implement it already exists: created/updated dates on every item, the graph with edges to find related content, and status: seed as a priority signal. The simplest implementation: a homepage widget that uses localStorage to track which concepts you visited and when, and suggests forgotten seeds or concepts you haven't reviewed in weeks.
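That widget reduces to a scoring function over (status, last-visited) pairs, with localStorage supplying the timestamps. The field names and weighting here are assumptions, not a design the article commits to:

```typescript
// Score concepts for resurfacing: never-visited items rank highest,
// seeds get a boost, and staleness does the rest.
interface SurfacingCandidate {
  slug: string;
  status: "seed" | "growing" | "evergreen";
  lastVisited: number | null; // epoch ms from localStorage; null = never
}

function surfacingScore(c: SurfacingCandidate, now: number): number {
  const staleness =
    c.lastVisited === null ? Number.MAX_SAFE_INTEGER : now - c.lastVisited;
  const seedBoost = c.status === "seed" ? 2 : 1; // arbitrary illustrative weight
  return staleness * seedBoost;
}

// Top-k suggestions for the homepage widget.
function suggest(candidates: SurfacingCandidate[], now: number, k: number): string[] {
  return [...candidates]
    .sort((a, b) => surfacingScore(b, now) - surfacingScore(a, now))
    .slice(0, k)
    .map((c) => c.slug);
}
```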
The most interesting step. jonmatum.com already exposes /api/mcp — the endpoint exists. A full MCP server would expose tools like search_knowledge, get_concept, list_related that any AI client could invoke. The protocol was donated to the Linux Foundation in 2026, has over 1,000 servers in production, and is supported by Claude, ChatGPT, Cursor, and virtually every relevant AI client. This would turn the agent door from read-only to read-write — closing the gap with Open Brain.
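Stripped of the protocol framing, an MCP server is a dispatcher from tool names to handlers; a real implementation would wrap this in MCP's JSON-RPC transport via an SDK. The handler signatures and placeholder bodies below are invented to show the shape of the three tools named above:

```typescript
// Hypothetical tool dispatcher for the /api/mcp surface.
// A real MCP server adds JSON-RPC framing, tool schemas, and transport.
type ToolHandler = (args: Record<string, string>) => unknown;

const tools: Record<string, ToolHandler> = {
  search_knowledge: ({ query }) => `results for: ${query}`, // placeholder
  get_concept: ({ id }) => ({ id, found: true }),           // placeholder
  list_related: ({ id }) => [`${id}-neighbor`],             // placeholder
};

function callTool(name: string, args: Record<string, string>): unknown {
  const handler = tools[name];
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}
```

Adding a write tool (say, a capture handler) to the same dispatcher is what would move the agent door from read-only to read-write.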
| Capability | Monthly cost |
|---|---|
| Semantic search | ~$0.10 |
| AI chat | ~$1-3 |
| Backlinks, capture, surfacing, MCP | $0 |
| Total | ~$1-4/month |
Everything fits within Vercel's Pro plan limits. The evolution path from static MDX to Postgres + MCP exists when scale demands it. For now, the cost is ~$0 and the architecture is portable. The second brain doesn't need to be perfect — it needs to exist.
- **Knowledge graph** — Data structures representing knowledge as networks of entities and relationships, enabling reasoning, connection discovery, and semantic queries over complex domains.
- **Model Context Protocol (MCP)** — Open protocol created by Anthropic that standardizes how AI applications connect with external tools, data, and services through a universal interface.
- **Artificial intelligence** — Field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, from reasoning and perception to language generation.
- **AI agents** — Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.
- **llms.txt** — Proposed standard for publishing a Markdown file at a website's root that enables language models to efficiently understand and use the site's content at inference time.
- **Semantic search** — Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
- **Embeddings** — Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
- **Retrieval-augmented generation (RAG)** — Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.
- **Strands Agents** — Open source SDK from AWS for building AI agents with a model-driven approach. Functional agents in a few lines of code, with multi-model support, custom tools, MCP, multi-agent, and built-in observability.
- **Amazon Bedrock** — AWS managed service providing access to foundation models from multiple providers (Anthropic, Meta, Mistral) via API, without managing ML infrastructure.
- **Next.js** — React framework for full-stack web applications with Server Components, file-based routing, SSR/SSG, and built-in performance optimizations.
- **Monorepo** — Code organization strategy where multiple projects coexist in a single repository, sharing dependencies, configuration, and build tooling.
- **Tailwind CSS** — Utility-first CSS framework enabling design building directly in markup using atomic classes, eliminating the need to write custom CSS.
- **Vector databases** — Storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.
- **Infrastructure as code** — Practice of defining and managing infrastructure through versioned configuration files instead of manual processes. Foundation of modern operations automation.