Concepts

Large Language Models

Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language with emergent capabilities like reasoning, translation, and code generation.

seed#llm#transformer#gpt#claude#foundation-models#deep-learning#nlp

What it is

A large language model (LLM) is a neural network with billions of parameters, trained on massive amounts of text to predict the next token (roughly, the next word or word fragment) in a sequence. This seemingly simple objective, predicting what comes next, produces surprising emergent capabilities when scaled sufficiently.

Modern LLMs don't just complete text: they follow complex instructions, reason step by step, write code, translate between languages, and maintain coherent long-context conversations.
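
The prediction loop described above can be sketched with a toy stand-in for the model. A hand-built bigram table plays the role of the network's learned distribution P(next token | context); everything here is illustrative, not a real LLM:

```python
# Toy illustration of next-token prediction (not a real LLM):
# a hand-built bigram table stands in for the learned distribution.
bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
}

def generate(prompt: list[str], max_new_tokens: int = 3) -> list[str]:
    """Greedily append the most probable next token, one step at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = bigram_probs.get(tokens[-1])
        if dist is None:  # no known continuation: stop generating
            break
        tokens.append(max(dist, key=dist.get))
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

A real LLM replaces the lookup table with a Transformer conditioned on the full context, and usually samples from the distribution instead of always taking the argmax.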

How they work

The Transformer architecture

Introduced in the paper "Attention Is All You Need" (2017), the Transformer architecture replaced recurrent networks with an attention mechanism that allows the model to consider all words in a sequence simultaneously, capturing long-range relationships.

Key components:

  • Tokenization: text is split into tokens (subwords) that the model processes numerically
  • Embeddings: each token is converted into a dense vector capturing its semantic meaning
  • Attention layers: multiple layers that learn which parts of context are relevant for each prediction
  • Context window: the maximum number of tokens the model can process in a single inference
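
The attention layers listed above all build on one core operation, scaled dot-product attention: each query compares itself against every key and takes a weighted average of the values. A minimal sketch, assuming NumPy is available (shapes and names are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys and returns a weighted
    average of the values (one output row per input token)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # blend values by attention weight

# Three tokens, a 4-dimensional head: self-attention (Q = K = V).
Q = K = V = np.random.default_rng(0).normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

In a real Transformer this runs in parallel across many heads and layers, with learned projection matrices producing Q, K, and V from the token embeddings.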

Two-phase training

  1. Pre-training: the model learns general language patterns by processing trillions of text tokens. This phase is extremely compute-intensive.
  2. Fine-tuning: the model is specialized to follow instructions, align with human preferences (e.g. via RLHF), or adapt to specific domains.
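
The pre-training objective in step 1 reduces to cross-entropy on the next token: the model is penalized by the negative log-probability it assigned to the token that actually came next. A minimal sketch with toy probabilities:

```python
import math

def next_token_loss(predicted_probs: dict, target_token: str) -> float:
    """Cross-entropy at one position: -log P(actual next token).
    Pre-training minimizes the average of this over trillions of tokens."""
    return -math.log(predicted_probs[target_token])

# The model assigned 0.7 to the token that actually came next: low loss.
probs = {"sat": 0.7, "ran": 0.2, "ate": 0.1}
print(round(next_token_loss(probs, "sat"), 3))  # 0.357
# A confident wrong guess is punished much harder:
print(round(next_token_loss(probs, "ate"), 3))  # 2.303
```

Fine-tuning keeps the same architecture but changes the training signal, e.g. supervised instruction-response pairs or a reward model in RLHF.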

Emergent capabilities

As models scale, capabilities emerge that weren't explicitly programmed:

  • Chain-of-Thought reasoning: ability to decompose complex problems into intermediate steps
  • In-Context Learning: learning from examples provided in the prompt without updating weights
  • Tool use: invoking APIs, executing code, or querying databases when configured to do so
  • Instruction following: interpreting and executing complex natural language instructions
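
In-context learning from the list above is driven purely by how the prompt is assembled. A hedged sketch of few-shot prompt construction (the helper name and template are illustrative, not any particular library's API):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: the model infers the task from
    the in-context examples, with no weight updates involved."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],  # demonstrations of the task
    "7 + 6",                           # new instance for the model to solve
)
print(prompt)
```

Sent to an LLM, a prompt like this typically elicits the pattern's continuation; chain-of-thought prompting works the same way, with worked reasoning steps included in each demonstration.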

Relevant models

Model | Organization | Characteristics
GPT-4o | OpenAI | Multimodal, advanced reasoning
Claude | Anthropic | Long context (200K tokens), safety
Gemini | Google | Native multimodal, search integration
Llama | Meta | Open-source, active community
Mistral | Mistral AI | Efficient, competitive open models
Command R | Cohere | Optimized for RAG and enterprise

Limitations

  • Hallucinations: models generate plausible but incorrect information with high confidence
  • Static knowledge: what the model knows stops at its training cutoff date
  • Inference cost: larger models require specialized hardware
  • Finite context window: although growing, still a limitation for very long documents
  • Bias: outputs reflect biases present in the training data
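
The finite context window is typically handled by truncation. A simple sketch of the common "keep the most recent tokens" strategy (the window size and helper are illustrative; real systems count tokens with the model's actual tokenizer):

```python
def truncate_to_window(tokens: list[str], window: int = 8) -> list[str]:
    """Drop tokens that exceed the context window, keeping the most
    recent ones (the end of the conversation), as chat systems often do."""
    if len(tokens) <= window:
        return tokens
    return tokens[-window:]

history = [f"t{i}" for i in range(12)]
print(truncate_to_window(history))
# ['t4', 't5', 't6', 't7', 't8', 't9', 't10', 't11']
```

More elaborate strategies summarize or retrieve the dropped history instead of discarding it outright.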

Why it matters

LLMs are the foundational technology behind the current artificial intelligence revolution. They're the engine powering AI agents, prompt engineering techniques, and semantic search systems. Understanding how they work — and their limitations — is essential for building effective AI applications.
