Jonatan Mata (jonmatum.com)
© 2026 Jonatan Mata. All rights reserved. v2.1.1
Concepts

llms.txt

Proposed standard for publishing a Markdown file at a website's root that enables language models to efficiently understand and use the site's content at inference time.

growing · #llms-txt #ai #web-standards #seo #agents #markdown #inference

What it is

llms.txt is a standard proposed by Jeremy Howard (fast.ai) in September 2024 for placing a Markdown file at a website's /llms.txt path. Its purpose is to offer language models a concise, structured, and readable version of the site's most important content — without the noise of HTML, navigation, ads, or JavaScript.

It's analogous to robots.txt and sitemap.xml, but with a different goal:

  • robots.txt tells crawlers what access is acceptable
  • sitemap.xml lists all indexable pages for search engines
  • llms.txt provides a curated summary and links to detailed content for language models

Why it matters

Language models face a fundamental limitation when interacting with websites: context windows are too small to process an entire site, and converting complex HTML to plain text is imprecise and noisy.

llms.txt solves this by providing:

  1. Immediate context — a site summary that fits in a context window
  2. Structured navigation — links to detailed Markdown files organized by section
  3. Curated information — only relevant content, no duplication or noise
  4. Human- and machine-readable format — Markdown is plain text that current LLMs parse reliably

How it's used

At inference time

The primary use case is during inference — when a user asks a language model for information. For example:

  • A developer includes a library's documentation in their AI-assisted IDE
  • A chatbot with search capability queries a site to answer questions
  • An AI agent needs to understand a service's structure to interact with it
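The inference-time flow above can be sketched as a pair of small helpers: one resolves the conventional path, the other trims the fetched file to a context budget. The 8,000-character budget is an illustrative assumption; the llms.txt proposal does not define one.

```typescript
// Build the conventional llms.txt URL for a site origin.
function llmsTxtUrl(origin: string): string {
  // By convention, llms.txt lives at the site root.
  return new URL("/llms.txt", origin).toString();
}

// Trim fetched content so it fits the model's context budget.
// The default budget is an assumption for the sketch, not part of the spec.
function trimToBudget(text: string, maxChars: number = 8_000): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```

A client would fetch `llmsTxtUrl(origin)`, fall back to HTML scraping on a 404, and pass the trimmed text to the model as context.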

File format

The file follows a specific Markdown structure:

```markdown
# Project name

> Brief description with key information

Additional details about the project.

## Section

- [Link title](https://url): Optional notes about the file

## Optional

- [Link title](https://url): Secondary content that can be skipped
```

The "Optional" section has special meaning: links there can be skipped if a shorter context is needed.
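A minimal parser for this structure might look like the following sketch. The regexes and the `skipOptional` flag are illustrative choices, not part of the specification; a real parser might be stricter about edge cases.

```typescript
interface LlmsLink {
  title: string;
  url: string;
  note?: string;
}

// Extract the link entries from an llms.txt document.
// When skipOptional is true, links under the "## Optional" heading are
// dropped, matching the section's special meaning in the proposal.
function parseLinks(markdown: string, skipOptional: boolean = false): LlmsLink[] {
  const links: LlmsLink[] = [];
  let inOptional = false;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^## (.+)/);
    if (heading) {
      inOptional = heading[1].trim() === "Optional";
      continue;
    }
    if (inOptional && skipOptional) continue;
    const m = line.match(/^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?/);
    if (m) links.push({ title: m[1], url: m[2], note: m[3] });
  }
  return links;
}
```

Calling `parseLinks(doc, true)` yields only the primary links, which is what a client with a tight context budget would use.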

Common variants

Many sites publish expanded variants:

  • /llms.txt — the base file with summary and links
  • /llms-full.txt — expanded version with the full content of each link embedded
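A client choosing between the two variants can simply compare its available context budget against a threshold. The 100,000-character cutoff below is an arbitrary assumption for the sketch; neither value is defined by the proposal.

```typescript
// Prefer the full variant only when the context budget can hold it.
// The threshold is an illustrative assumption, not part of the proposal.
function pickVariant(contextBudgetChars: number): "/llms-full.txt" | "/llms.txt" {
  return contextBudgetChars >= 100_000 ? "/llms-full.txt" : "/llms.txt";
}
```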

Implementation on this site

This site publishes two files auto-generated by the knowledge pipeline:

  • /llms.txt — index with title, type, and English summary for each knowledge node
  • /llms-full.txt — full content of each article in plain format

Both are regenerated with every pnpm generate run and served as static files from public/.
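A generation step like the one described could be sketched as below. The `KnowledgeNode` shape, the site name, and the URLs are hypothetical stand-ins, not this site's actual pipeline code.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical shape of a knowledge node; the real pipeline's fields may differ.
interface KnowledgeNode {
  title: string;
  type: string;
  summary: string;
  slug: string;
}

// Render the llms.txt index: title, type, and summary per node,
// following the structure from the "File format" section.
function renderIndex(siteName: string, nodes: KnowledgeNode[]): string {
  const links = nodes
    .map(n => `- [${n.title}](https://example.com/${n.slug}): ${n.type}. ${n.summary}`)
    .join("\n");
  return `# ${siteName}\n\n## Concepts\n\n${links}\n`;
}

// In a generate script run by `pnpm generate`, the result would be written
// to public/ so it is served as a static file:
// writeFileSync("public/llms.txt", renderIndex("My Site", nodes));
```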

Relationship with other standards

| Standard | Audience | Purpose |
| --- | --- | --- |
| robots.txt | Crawlers | Access control |
| sitemap.xml | Search engines | Page index |
| llms.txt | Language models | Curated site summary |
| MCP | AI agents | Tools and context protocol |

llms.txt and MCP are complementary: llms.txt provides static readable content, while MCP enables dynamic interactions with tools and services.

Adoption

Since its proposal in 2024, llms.txt has been adopted by technical documentation projects, e-commerce sites, educational institutions, and personal websites. The specification is deliberately simple — a Markdown file with minimal conventions — making it easy to adopt without specialized tooling.

References

  • The /llms.txt file — Jeremy Howard, 2024. Original standard specification.
  • llms.txt in Different Domains — llmstxt.org. Official llms.txt standard site.
  • FastHTML llms.txt — llmstxt.org. Official site with specification and implementation examples.
  • What is llms.txt? A practical guide — Hall, 2025. Practical implementation guide.

Related content

  • Artificial Intelligence

    Field of computer science dedicated to creating systems capable of performing tasks that normally require human intelligence, from reasoning and perception to language generation.

  • Model Context Protocol (MCP)

    Open protocol created by Anthropic that standardizes how AI applications connect with external tools, data, and services through a universal interface.

  • Building a Second Brain in Public

    Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — in days, not weeks, and what that teaches about the gap between theory and working systems.

  • Retrieval-Augmented Generation

    Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.

  • Documentation as Code

    Practice of treating documentation with the same tools and processes as code: versioned in Git, reviewed in PRs, and automatically generated when possible.
