Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
An embedding is a numerical representation of data (text, images, audio) as a fixed-dimension dense vector. The fundamental property is that semantically similar data produces nearby vectors in the space, while semantically dissimilar data ends up far apart.
For example, the embeddings for "dog" and "puppy" will be close together, while "dog" and "economics" will be far apart. This allows machines to operate on "meaning" mathematically.
An embedding model (like text-embedding-3-small from OpenAI or all-MiniLM-L6-v2 from Sentence Transformers) takes input text and produces a fixed-dimension vector — typically between 384 and 3,072 dimensions.
The model learns these representations during training, optimizing so that texts with similar meaning produce nearby vectors.
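For the API-based models, a minimal sketch of producing an embedding (assuming the OpenAI Python SDK v1+ and an `OPENAI_API_KEY` in the environment; the input string is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The dog runs through the park",
)

vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions for text-embedding-3-small
```

Every input, regardless of its length, maps to a vector of the same dimensionality, which is what makes embeddings directly comparable.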
To compare embeddings, similarity and distance metrics are used; the most common in practice is cosine similarity.
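Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes, and ranges from -1 (opposite) to 1 (identical direction):

$$
\operatorname{sim}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
$$

The example below computes it explicitly with NumPy on Sentence Transformers embeddings: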
```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "The dog runs through the park",
    "A puppy plays in the garden",
    "Inflation affects the global economy",
]

embeddings = model.encode(texts)

# Cosine similarity between the first two texts (semantically close)
sim_01 = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
# sim_01 ≈ 0.68 (high similarity)

# Similarity between the first and third texts (semantically distant)
sim_02 = np.dot(embeddings[0], embeddings[2]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[2])
)
# sim_02 ≈ 0.05 (low similarity)
```

| Model | Dimensions | Max context | Typical use |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 256 tokens | Fast prototyping, low cost |
| text-embedding-3-small (OpenAI) | 1,536 | 8,191 tokens | Production with API |
| text-embedding-3-large (OpenAI) | 3,072 | 8,191 tokens | Maximum quality |
| amazon.titan-embed-text-v2 | 1,024 | 8,192 tokens | AWS Bedrock |
| voyage-3 (Voyage AI) | 1,024 | 32,000 tokens | Long context, code |
The choice is a trade-off between quality, cost, and latency. For most RAG applications, a 1,024-dimension model offers a good balance.
| Application | How it uses embeddings | Similarity metric |
|---|---|---|
| Semantic search | Compares query embedding with document embeddings | Cosine similarity |
| RAG | Retrieves relevant chunks to give context to the LLM | Cosine similarity + reranking |
| Classification and clustering | Assigns labels or groups documents by proximity in vector space | Euclidean distance or cosine |
| Duplicate detection | Identifies content with high similarity | Similarity threshold (> 0.9) |
| Recommendations | Suggests content close to user profile | k-nearest neighbors |
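As a minimal sketch of the semantic search row above (reusing all-MiniLM-L6-v2 from the earlier example; the corpus and query are illustrative, and a real system would store the document vectors in a vector database):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative corpus; in production these vectors would live in a vector database.
documents = [
    "How to reset your account password",
    "Quarterly financial report for 2024",
    "Steps to recover a forgotten login credential",
]
query = "I can't remember my password"

# Normalized vectors let a plain dot product act as cosine similarity.
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

scores = doc_embeddings @ query_embedding

# Rank documents from most to least similar to the query.
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```

The same scores drive duplicate detection: two texts whose similarity exceeds a chosen threshold (the table above suggests > 0.9) are treated as near-duplicates.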
Embeddings are the foundation of semantic search, RAG systems, and content classification. Without them, AI applications are limited to exact text matching. Understanding their properties — dimensionality, cosine distance, language limitations — is essential for building effective information retrieval systems.
Related concepts:

- Neural networks: computational models inspired by brain structure that learn patterns from data, forming the foundation of modern artificial intelligence systems.
- Semantic search: information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
- Large language models (LLMs): massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language, with emergent capabilities such as reasoning, translation, and code generation.
- Retrieval-Augmented Generation (RAG): architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.
- Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints, in days rather than weeks, and what that teaches about the gap between theory and working systems.
- Vector databases: storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.
- Tokenization: process of splitting text into discrete units (tokens) that language models can process numerically, fundamental to how LLMs understand and generate text.