
© 2026 Jonatan Mata · alpha · v0.1.0

Essays

Building a Second Brain in Public

Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — over a weekend, at near-zero cost, and what that teaches about the gap between theory and working systems.

growing · #second-brain · #pkm · #building-in-public · #knowledge-graph · #bilingual · #ai

What I built

Over a weekend in March 2026, I built a second brain. Not the kind that lives in a note-taking app and collects dust — one with a knowledge graph of 154 nodes and 369 edges, a bilingual Spanish-English pipeline, and endpoints that any AI agent can query.

The result is jonmatum.com: 131 concepts, 20 experiments, 3 notes, ~131,000 words in two languages, an interactive D3 graph, and an agent-friendly surface with /llms.txt, /api/knowledge, /api/graph, and /api/mcp. Monthly cost: $0.

The inspiration came from Nate B Jones' series on AI-powered second brains. His thesis: the second brain is no longer a passive storage system — it's an active system that works while you sleep. I watched the video, read the posts, and instead of taking notes about how to build one, I just built it.

The 8 blocks in practice

Jones defines 8 building blocks for a functional second brain. Here's how they map to what I built:

| Block | Jones | jonmatum.com | Status |
| 1. Dropbox | Slack channel, zero friction | MDX files in content/ | Partial — high friction (create file, frontmatter, commit) |
| 2. Sorter | AI agent auto-classifies | Manual type: field in frontmatter | Partial — manual |
| 3. Form | Strict schema per type | Frontmatter validated by pnpm validate | Complete |
| 4. Filing Cabinet | Notion databases | content/{concepts,notes,experiments,essays}/ | Complete |
| 5. Receipt | Inbox Log documents every action | Git history (171 commits) | Complete — git is the audit trail |
| 6. Bouncer | Confidence score 0-1 | Validation + lint pipeline | Partial — build-time only |
| 7. Tap on the Shoulder | Daily/weekly digests | — | Missing |
| 8. Fix Button | Chat commands to reclassify | Manual file edits | Partial — manual |

Blocks 3-5 are solid. Block 7 — proactive surfacing — is completely missing, and according to Jones it is the differentiator between a storage system and a real second brain.

What worked

Separation of concerns

Jones insists on separating memory, compute, and interface. Without having read that principle before starting, the monorepo architecture already implements it:

  • Memory: MDX files in content/ — plain text, portable, no vendor lock-in
  • Compute: packages/knowledge/ — parser, graph builder, llms.txt generator, embeddings
  • Interface: Next.js in apps/web/ — the human door

This separation allowed changing the interface without touching content, and vice versa. When semantic search failed in production, I could revert to keyword search without losing a single graph node.

The two-door principle

Jones describes a "shared surface with two doors": the agent enters through one, the human through the other. Both access the same data.

At jonmatum.com:

  • Human door: the website with type-based navigation, interactive graph, search, i18n
  • Agent door: /llms.txt (634 lines), /llms-full.txt (11,568 lines), /api/knowledge, /api/graph, /api/concepts/[id], /api/mcp

What's missing: the agent door is read-only. Jones proposes both doors read and write. Today, an agent can query my entire knowledge graph but can't add a node. That's the biggest gap with Open Brain.
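The symmetry between the two doors fits in a few lines. A sketch of a helper that resolves both URLs for one concept slug — the agent route shape comes from the endpoint list above, while the human route shape and the helper name are my assumptions:

```typescript
// Sketch: the "two doors" for a single concept slug.
// The /api/concepts/[id] shape is from the endpoint list above;
// the human route shape is assumed for illustration.
type Doors = { human: string; agent: string };

function doorsFor(slug: string, base = "https://jonmatum.com"): Doors {
  return {
    human: `${base}/concepts/${slug}`,     // human door: rendered page
    agent: `${base}/api/concepts/${slug}`, // agent door: JSON for the same node
  };
}
```

Both URLs resolve the same underlying MDX node — one structure, two representations.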

Architecture over tools

One of the principles that emerged from Jones' community: "architecture matters more than tools." One member swapped every recommended tool and got the same results.

jonmatum.com validates this. The stack is deliberately generic: MDX (plain text), JSON (generated data), Next.js (interface). There's no dependency on Notion, Obsidian, or any PKM SaaS. If tomorrow I want to migrate the interface to Astro or the storage to Postgres, the MDX content remains the same.

Design for restart, not perfection

Every concept has a status field: seed | growing | evergreen. A seed is a 150-word stub with valid frontmatter. It doesn't need to be complete to exist in the graph — it just needs to be correct.

This unlocked speed: instead of writing 131 perfect concepts, I wrote 131 seeds and let them grow. The graph benefits from coverage as much as from depth.
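"Correct, not complete" is cheap to enforce because it's a structural check. A minimal sketch of what such a check could look like — only the status values and content types come from this essay; the field set and function are illustrative, and the real schema lives behind pnpm validate:

```typescript
// Illustrative frontmatter check: a seed must be structurally valid,
// not complete. Status values and types are from the essay; the rest is assumed.
type Status = "seed" | "growing" | "evergreen";

interface Frontmatter {
  title: string;
  type: "concept" | "note" | "experiment" | "essay";
  status: Status;
}

function isValidFrontmatter(fm: Partial<Frontmatter>): fm is Frontmatter {
  const statuses: Status[] = ["seed", "growing", "evergreen"];
  const types = ["concept", "note", "experiment", "essay"];
  return (
    typeof fm.title === "string" && fm.title.length > 0 &&
    types.includes(fm.type ?? "") &&
    statuses.includes(fm.status as Status)
  );
}
```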

What failed

Client-side semantic search

I attempted semantic search with @huggingface/transformers (Xenova/all-MiniLM-L6-v2) running via WebAssembly in the browser. It failed for three reasons:

  1. ~30MB download on first search — unacceptable for a personal site
  2. Tensor API inconsistencies in Transformers.js v3 — tolist(), .data, and output[0] behave differently between Node.js and browser WASM
  3. Vercel deployment issues that didn't reproduce locally

I reverted to instant keyword search. Pre-computed embeddings are still generated at build time for future use. The viable solution is a server-side API route — ~$0.10/month at current scale.
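The fallback is simple enough to sketch: rank nodes by how many query terms their text contains. This is an illustrative scorer in the spirit of that keyword search, not the site's actual implementation:

```typescript
// Illustrative keyword scorer: rank nodes by query-term overlap.
// Instant, no model download, no WASM — the trade-off is no semantics.
interface Doc { id: string; text: string }

function keywordSearch(query: string, docs: Doc[], limit = 5): Doc[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return docs
    .map((d) => {
      const hay = d.text.toLowerCase();
      // score = number of query terms present in the document text
      const score = terms.filter((t) => hay.includes(t)).length;
      return { d, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((s) => s.d);
}
```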

This is a concrete example of the implementation gap Jones describes: technology that should work in theory but breaks in practice.

Capture friction

Jones' block 1 — the Dropbox — requires zero friction. Capturing a thought in jonmatum.com requires: creating an MDX file, writing frontmatter with 11 mandatory fields, writing content in Spanish, creating the .en.mdx file with the translation, and committing. That's not zero friction — it's a 15-30 minute process per concept.

The speed I achieved (154 items in 5 days) was possible because I used AI agents to assist generation. Without that assistance, the pace would be unsustainable. Low-friction capture is the next problem to solve.

The implementation gap

McKinsey reports that 88% of organizations use AI in at least one business function, but only 1% consider their deployments mature. Jones frames this as the implementation gap: the world splits between the "hype machine" and the "feature comparison industrial complex." What's missing is the bridge.

Building jonmatum.com over a weekend taught me where that bridge exists — and where it doesn't:

The bridge exists when architecture is simple and tools are mature. Turborepo, Next.js, MDX, Tailwind — none of this was an obstacle. Infrastructure was built in hours, not days.

The bridge doesn't exist when technology promises more than it delivers in production. Client-side semantic search is the perfect example: works in demos, fails in deployment. The gap isn't knowledge — it's practical integration.

The bridge also doesn't exist for content. 131,000 words in two languages don't write themselves. AI assisted, but editorial judgment — what to include, how to connect it, what level of depth — remains human. The second brain automates the system, not the thinking.

Why bilingual

My native language is Spanish. Writing in Spanish is natural — ideas flow without the friction of mentally translating. But I wanted to share knowledge with my team and community, and English is the universal language of technology. At the same time, I wanted my friends and family who don't speak English to access the content.

The solution: Spanish as the primary language (the default site), English as translation (available at /en/...). Every concept has .mdx (Spanish) and .en.mdx (English). Frontmatter includes title_es/title_en and summary_es/summary_en.

The cost: 73,963 words in Spanish + 57,409 in English = ~78% more content than a monolingual system. But the value is disproportionate:

  • AI agents consume /llms.txt in English — the language LLMs perform best in
  • Spanish-speaking humans read the site in Spanish — no barriers
  • The knowledge graph is the same for both languages — one structure, two interfaces

I found no other public second brains that are bilingual. The PKM ecosystem is overwhelmingly monolingual (English). This is an opportunity, not a problem.

Is it sustainable at scale? With AI-assisted translation + human review, yes. Without AI, no. At 500+ items, manual translation would be a bottleneck. But that's a problem to solve when it arrives — not a reason not to start.

What comes next

The system is at 75%. It has MDX capture, type-based organization, a knowledge graph with 154 nodes and 369 edges, bilingual summaries, keyword search, agent-friendly endpoints (/llms.txt, /api/*), D3 graph visualization, full i18n, and Vercel deployment. What's missing:

Server-side semantic search

Current search is keyword-based — it works, but doesn't understand meaning. The plan: an API route /api/search that loads pre-computed embeddings (already exist in embeddings.json), computes the query embedding with OpenAI text-embedding-3-small, and runs cosine similarity. At 155 nodes I don't need a vector database — everything fits in memory. Estimated cost: ~$0.10/month.
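At this scale the whole route is a loop over an in-memory array. A sketch of its core, assuming embeddings.json maps node ids to vectors — the OpenAI call that embeds the query is elided, since any query embedding of matching dimension works:

```typescript
// Sketch of in-memory semantic search: cosine similarity between a
// query embedding and ~155 pre-computed node embeddings. No vector DB.
interface Embedded { id: string; vector: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function semanticSearch(queryVec: number[], index: Embedded[], k = 5) {
  return index
    .map((e) => ({ id: e.id, score: cosine(queryVec, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

With 155 vectors this runs in well under a millisecond per query; the only per-request cost is embedding the query itself.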

Conversational interface with RAG

Being able to ask the second brain: "what do I know about infrastructure as code?" and get an answer with context from my own concepts. The architecture: a Strands agent that uses embeddings for retrieval-augmented generation and Claude Sonnet via Bedrock for generation. API route /api/chat with streaming. Estimated cost: ~$1-3/month depending on usage.
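The retrieval half of that pipeline reuses the semantic search just described; what remains is prompt assembly. A hedged sketch of that glue step — the actual Strands agent and Bedrock call are out of scope here, and all names are illustrative:

```typescript
// Sketch: assemble a RAG prompt from retrieved concepts. The model call
// itself (Strands + Bedrock) is omitted; this is only the retrieval glue.
interface Retrieved { id: string; title: string; excerpt: string }

function buildRagPrompt(question: string, hits: Retrieved[]): string {
  const context = hits
    .map((h, i) => `[${i + 1}] ${h.title}\n${h.excerpt}`)
    .join("\n\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```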

Backlinks UI

The graph already has the edges — if concept A references concept B, the edge exists. What's missing is inverting it in the UI: showing on each page "these concepts reference you." It's a getBacklinks(slug) helper and a component. Cost: $0.
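Given the edge list the graph already produces, the helper named above reduces to an inverted lookup. A sketch, assuming edges are {source, target} slug pairs:

```typescript
// Sketch of getBacklinks: invert the graph's edge list so each page can
// show "these concepts reference you". The edge shape is assumed.
interface Edge { source: string; target: string }

function getBacklinks(slug: string, edges: Edge[]): string[] {
  return edges.filter((e) => e.target === slug).map((e) => e.source);
}
```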

External capture

Today the only way to capture is creating an MDX file manually. Next step: an API route /api/capture that accepts text + source URL, stores it as MDX in captures/, and optionally generates embeddings. This opens the door to a bookmarklet, browser extension, or Readwise integration.
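The storage half of that route is string assembly: serialize the capture as an MDX document with frontmatter. A sketch of that step — the frontmatter fields here are illustrative, not the site's real schema:

```typescript
// Sketch: serialize a capture (text + source URL) as an MDX file body.
// Frontmatter fields are illustrative; the real schema has more fields.
function buildCaptureMdx(text: string, sourceUrl: string, now = new Date()): string {
  return [
    "---",
    "type: capture",
    `source: ${sourceUrl}`,
    `created: ${now.toISOString()}`,
    "---",
    "",
    text,
    "",
  ].join("\n");
}
```

An /api/capture route would write this string into captures/ and optionally queue it for embedding — at which point a bookmarklet or extension is just a POST away.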

Proactive surfacing

Jones' block 7 — the one that's completely missing. The data to implement it already exists: created/updated dates on every item, the graph with edges to find related content, and status: seed as a priority signal. The simplest implementation: a homepage widget that uses localStorage to track which concepts you visited and when, and suggests forgotten seeds or concepts you haven't reviewed in weeks.
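The suggestion logic itself is a filter over visit timestamps. A sketch assuming a visits map keyed by slug — in the real widget localStorage would persist it, and the staleness threshold is an arbitrary choice:

```typescript
// Sketch: suggest forgotten content — seeds, plus anything not visited
// in N days. In the browser, `visits` would be read from localStorage.
interface Item { slug: string; status: "seed" | "growing" | "evergreen" }

function suggest(
  items: Item[],
  visits: Record<string, number>, // slug -> last-visit timestamp (ms)
  now: number,
  staleDays = 14,
): string[] {
  const staleMs = staleDays * 24 * 60 * 60 * 1000;
  return items
    .filter((i) => i.status === "seed" || (visits[i.slug] ?? 0) < now - staleMs)
    .map((i) => i.slug);
}
```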

The MCP server

The most interesting step. jonmatum.com already exposes /api/mcp — the endpoint exists. A full MCP server would expose tools like search_knowledge, get_concept, list_related that any AI client could invoke. The protocol was donated to the Linux Foundation in 2026, has over 1,000 servers in production, and is supported by Claude, ChatGPT, Cursor, and virtually every relevant AI client. This would turn the agent door from read-only to read-write — closing the gap with Open Brain.
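The three tools named above are thin wrappers over data the site already generates. A sketch of their handler logic over a tiny in-memory graph — the sample data is invented for illustration, and registering these handlers with an MCP SDK server is the remaining (omitted) step:

```typescript
// Sketch: handler logic for the three MCP tools named above, over an
// in-memory sample graph. Wiring into an actual MCP server is omitted.
interface Concept { id: string; title: string; body: string }
interface Edge { source: string; target: string }

const concepts: Concept[] = [
  { id: "mcp", title: "Model Context Protocol (MCP)", body: "Open protocol for AI tool access." },
  { id: "ai-agents", title: "AI Agents", body: "Autonomous systems with tool use." },
];
const edges: Edge[] = [{ source: "ai-agents", target: "mcp" }];

// search_knowledge: naive substring match over title + body
const searchKnowledge = (query: string) =>
  concepts.filter((c) => (c.title + " " + c.body).toLowerCase().includes(query.toLowerCase()));

// get_concept: direct lookup by id
const getConcept = (id: string) => concepts.find((c) => c.id === id) ?? null;

// list_related: neighbors in the edge list, in either direction
const listRelated = (id: string) =>
  edges
    .filter((e) => e.source === id || e.target === id)
    .map((e) => (e.source === id ? e.target : e.source));
```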

Total cost

| Capability | Monthly cost |
| Semantic search | ~$0.10 |
| AI chat | ~$1-3 |
| Backlinks, capture, surfacing, MCP | $0 |
| Total | ~$1-4/month |

Everything fits within Vercel's Pro plan limits. The evolution path from static MDX to Postgres + MCP exists when scale demands it. For now, the cost is ~$0 and the architecture is portable. The second brain doesn't need to be perfect — it needs to exist.

References

  • Why 2026 Is the Year to Build a Second Brain — Nate B Jones, 2026. The original video with the 8 building blocks framework.
  • Open Brain: the $0.10/month fix — Nate B Jones, March 2026. Postgres + MCP architecture.
  • Learn in Public — Shawn Wang (swyx), 2018. The foundational essay on learning and building in public.
  • 25 Years of Personal Knowledge Management — Sébastien Dubois, 2022. A 25-year chronicle of PKM evolution.
  • Building a Second Brain — Tiago Forte, 2022. The book that popularized the concept and the PARA method.
  • Zettelkasten Method — Christian Tietze. Reference for Luhmann's interconnected notes method.
  • Model Context Protocol — Specification — Anthropic/Linux Foundation. Protocol specification.
  • How Leaders Can Realize Value from GenAI — Forbes, 2026. Summary of McKinsey State of AI 2025 report (88% adoption, 1% maturity).

Related content

  • Knowledge Graphs

    Data structures representing knowledge as networks of entities and relationships, enabling reasoning, connection discovery, and semantic queries over complex domains.

  • Model Context Protocol (MCP)

    Open protocol created by Anthropic that standardizes how AI applications connect with external tools, data, and services through a universal interface.

  • Artificial Intelligence

    Field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, from reasoning and perception to language generation.

  • AI Agents

    Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.

  • llms.txt

    Proposed standard for publishing a Markdown file at a website's root that enables language models to efficiently understand and use the site's content at inference time.

  • Semantic Search

    Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.

  • Embeddings

    Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.

  • Retrieval-Augmented Generation

    Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.

  • Strands Agents

    Open source SDK from AWS for building AI agents with a model-driven approach. Functional agents in a few lines of code, with multi-model support, custom tools, MCP, multi-agent, and built-in observability.

  • AWS Bedrock

    AWS managed service providing access to foundation models from multiple providers (Anthropic, Meta, Mistral) via API, without managing ML infrastructure.

  • Next.js

    React framework for full-stack web applications with Server Components, file-based routing, SSR/SSG, and built-in performance optimizations.

  • Monorepos

    Code organization strategy where multiple projects coexist in a single repository, sharing dependencies, configuration, and build tooling.

  • Tailwind CSS

    Utility-first CSS framework enabling design building directly in markup using atomic classes, eliminating the need to write custom CSS.

  • Vector Databases

    Storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.

  • Infrastructure as Code

    Practice of defining and managing infrastructure through versioned configuration files instead of manual processes. Foundation of modern operations automation.
