Concepts

Context Windows

The maximum number of tokens an LLM can process in a single interaction, determining how much information it can consider simultaneously to generate responses.

seed #context-window #tokens #llm #memory #attention #scaling

What it is

The context window is the maximum number of tokens (whole words and subword pieces) that an LLM can process in a single interaction. It covers both the input (prompt, retrieved context, conversation history) and the generated output. In effect, it is the model's "working memory."
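Because the window covers input and output together, the prompt size directly limits how long a response can be. A minimal sketch of that budget arithmetic (the window size here is illustrative, not tied to any specific model):

```python
# Both the prompt and the generated output must fit inside the context
# window, so tokens spent on input are unavailable for output.
CONTEXT_WINDOW = 8_000  # tokens (hypothetical model limit)

def max_output_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation after the prompt is counted."""
    return max(0, window - prompt_tokens)

print(max_output_tokens(6_500))  # 1500 tokens remain for the response
print(max_output_tokens(9_000))  # 0 -- the prompt alone overflows the window
```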

Size evolution

Year   Model        Window
2022   GPT-3.5      4K tokens
2023   GPT-4        8K–32K tokens
2023   Claude 2     100K tokens
2024   Claude 3     200K tokens
2024   Gemini 1.5   1M–2M tokens
2025   GPT-4.1      1M tokens

For reference: 1K tokens ≈ 750 English words, ≈ 600 Spanish words.
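Those ratios give a quick back-of-envelope token estimator. This is only a heuristic, not a tokenizer; real token counts depend on the model's vocabulary:

```python
# Rough heuristic from the ratios above: ~750 English words or ~600
# Spanish words per 1K tokens.
RATIOS = {"en": 1000 / 750, "es": 1000 / 600}  # tokens per word

def estimate_tokens(word_count: int, lang: str = "en") -> int:
    return round(word_count * RATIOS[lang])

print(estimate_tokens(750))        # 1000
print(estimate_tokens(600, "es"))  # 1000
```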

Why it matters

The context window is the fundamental constraint that defines what an LLM can and cannot do in a single interaction. In RAG, it determines how many retrieved documents fit in the prompt. In conversations, it limits the history the model remembers. In agents, it affects how many reasoning iterations fit in a session. Designing systems that work within these limits — with chunking, summarization, and context management — is an essential architectural skill for any LLM-based application.

The "Lost in the Middle" problem

Models do not attend to all positions in the context equally. Research ("Lost in the Middle", Liu et al., 2023) shows that information placed at the beginning or end of the context is recalled more reliably than information buried in the middle. This has practical implications for how to order information in long prompts.
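One common mitigation is to reorder retrieved documents so the highest-ranked ones sit at the edges of the prompt and the weakest land in the middle. A sketch, assuming the input list is already sorted best-first:

```python
# Interleave documents so top-ranked items end up at the start and end
# of the prompt, pushing the least relevant toward the middle.
def order_for_edges(docs):
    front, back = [], []
    for i, doc in enumerate(docs):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]  # best docs at both edges

print(order_for_edges(["d1", "d2", "d3", "d4", "d5"]))
# ['d1', 'd3', 'd5', 'd4', 'd2'] -- d1 and d2 at the edges, d5 in the middle
```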

Strategies for long contexts

  • Smart chunking: split documents and process in parts
  • Progressive summarization: summarize earlier sections to free space
  • Prioritization: place critical information at the beginning or end of context
  • Selective RAG: retrieve only the most relevant fragments instead of complete documents
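The first strategy, chunking, can be sketched in a few lines. Real systems usually chunk by tokens or semantic boundaries rather than words, but the shape is the same; the chunk size and overlap below are arbitrary examples:

```python
# Word-based chunking with overlap between consecutive chunks, so that
# sentences spanning a chunk boundary appear in both neighbors.
def chunk_words(text: str, size: int = 200, overlap: int = 20):
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(500))
print(len(chunk_words(doc)))  # 3 chunks, each up to 200 words
```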

Cost and performance

More context = more compute. Attention in Transformers scales quadratically with sequence length (O(n²)), though techniques like sparse attention, sliding window, and ring attention reduce this in practice.
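The practical consequence of O(n²) scaling: doubling the context roughly quadruples the attention compute. A quick illustration with made-up numbers:

```python
# Attention cost relative to a baseline context length, assuming pure
# quadratic scaling (ignores optimizations like sparse attention).
def relative_attention_cost(n_tokens: int, base: int = 4_000) -> float:
    return (n_tokens / base) ** 2

print(relative_attention_cost(8_000))    # 4.0   -- 2x context, 4x cost
print(relative_attention_cost(128_000))  # 1024.0 -- 32x context, ~1000x cost
```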
