Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
An embedding is a numerical representation of data (text, images, audio) as a fixed-dimension dense vector. The fundamental property is that semantically similar data produces nearby vectors in the space, while semantically dissimilar data ends up far apart.
For example, the embeddings for "dog" and "puppy" will be close together, while "dog" and "economics" will be far apart. This allows machines to operate on "meaning" mathematically.
An embedding model (like text-embedding-3-small from OpenAI or all-MiniLM-L6-v2 from Sentence Transformers) takes input text and produces a fixed-dimension vector — typically between 384 and 3,072 dimensions.
The model learns these representations during training, optimizing so that texts with similar meaning produce nearby vectors.
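For the API-based models, a minimal sketch of producing an embedding (assuming the OpenAI Python SDK v1+ and an `OPENAI_API_KEY` in the environment; the input string is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The dog runs through the park",
)

vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions for text-embedding-3-small
```

Every input, regardless of its length, maps to a vector of the same dimensionality, which is what makes embeddings directly comparable.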
To compare embeddings, similarity and distance metrics are used; the most common in practice is cosine similarity.
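Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes, and ranges from -1 (opposite) to 1 (identical direction):

$$
\operatorname{sim}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
$$

The example below computes it explicitly with NumPy on Sentence Transformers embeddings: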
```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "The dog runs through the park",
    "A puppy plays in the garden",
    "Inflation affects the global economy",
]

embeddings = model.encode(texts)

# Cosine similarity between the first two texts (semantically close)
sim_01 = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
# sim_01 ≈ 0.68 (high similarity)

# Similarity between the first and third texts (semantically distant)
sim_02 = np.dot(embeddings[0], embeddings[2]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[2])
)
# sim_02 ≈ 0.05 (low similarity)
```

| Model | Dimensions | Max context | Typical use |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 256 tokens | Fast prototyping, low cost |
| text-embedding-3-small (OpenAI) | 1,536 | 8,191 tokens | Production with API |
| text-embedding-3-large (OpenAI) | 3,072 | 8,191 tokens | Maximum quality |
| amazon.titan-embed-text-v2 | 1,024 | 8,192 tokens | AWS Bedrock |
| voyage-3 (Voyage AI) | 1,024 | 32,000 tokens | Long context, code |
The choice is a trade-off between quality, cost, and latency. For most RAG applications, a 1,024-dimension model offers a good balance.
| Application | How it uses embeddings | Similarity metric |
|---|---|---|
| Semantic search | Compares query embedding with document embeddings | Cosine similarity |
| RAG | Retrieves relevant chunks to give context to the LLM | Cosine similarity + reranking |
| Classification and clustering | Assigns labels or groups documents by proximity in vector space | Euclidean distance or cosine |
| Duplicate detection | Identifies content with high similarity | Similarity threshold (> 0.9) |
| Recommendations | Suggests content close to user profile | k-nearest neighbors |
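As a minimal sketch of the semantic search row above (reusing all-MiniLM-L6-v2 from the earlier example; the corpus and query are illustrative, and a real system would store the document vectors in a vector database):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative corpus; in production these vectors would live in a vector database.
documents = [
    "How to reset your account password",
    "Quarterly financial report for 2024",
    "Steps to recover a forgotten login credential",
]
query = "I can't remember my password"

# Normalized vectors let a plain dot product act as cosine similarity.
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

scores = doc_embeddings @ query_embedding

# Rank documents from most to least similar to the query.
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```

The same scores drive duplicate detection: two texts whose similarity exceeds a chosen threshold (the table above suggests > 0.9) are treated as near-duplicates.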
Embeddings are the foundation of semantic search, RAG systems, and content classification. Without them, AI applications are limited to exact text matching. Understanding their properties — dimensionality, cosine distance, language limitations — is essential for building effective information retrieval systems.
Related concepts:

- Neural networks: computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.
- Semantic search: information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
- Large language models (LLMs): massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language, with emergent capabilities such as reasoning, translation, and code generation.
- Retrieval-Augmented Generation (RAG): architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.
- Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints, in days rather than weeks, and what that teaches about the gap between theory and working systems.
- Vector databases: storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.
- Tokenization: process of splitting text into discrete units (tokens) that language models can process numerically, fundamental to how LLMs understand and generate text.