Storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.
A vector database is a storage system optimized for storing, indexing, and searching high-dimensional vectors — typically embeddings generated by AI models. Unlike traditional databases that search by exact match, vector databases find the vectors most similar to a query.
They're the fundamental infrastructure for semantic search and RAG at scale.
Finding the nearest vector among millions naively requires computing the distance to every single one: O(n) per query. Vector databases use approximate nearest neighbor (ANN) indexes that cut this to roughly O(log n), sacrificing perfect precision for practical speed.
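For intuition, exact search is just a linear scan over the whole corpus. A minimal sketch in Python, with random vectors standing in for real embeddings:

```python
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def exact_nearest(corpus, query, top_k=5):
    """Brute-force nearest neighbors: one distance per stored vector, O(n) per query."""
    dists = [(cosine_distance(v, query), i) for i, v in enumerate(corpus)]
    return [i for _, i in sorted(dists)[:top_k]]

random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(64)] for _ in range(10_000)]
query = [random.gauss(0, 1) for _ in range(64)]
print(exact_nearest(corpus, query))  # indices of the 5 closest vectors
```

This is the cost an ANN index avoids: instead of touching all n vectors, it walks a graph or tree structure that visits only a small candidate set.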
HNSW (Hierarchical Navigable Small World), the most widely used ANN index, has three key parameters that control the tradeoff between speed and recall:
| Parameter | Effect when increased | Typical value |
|---|---|---|
| `m` (connections per node) | Better recall, more memory | 16–64 |
| `ef_construction` (candidates during build) | Better index quality, slower construction | 100–200 |
| `ef_search` (candidates during search) | Better recall, slower search | 50–200 |
The practical rule: start with `m = 16`, `ef_construction = 128`, `ef_search = 64` and adjust by measuring recall@10 against exact search.
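Measuring recall@k is a simple set overlap between the exact results and the index's results. A minimal sketch, with hypothetical ID lists standing in for real query output:

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate index returned."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

# Hypothetical results: ground truth from exact search vs. an ANN index's answer
exact  = [3, 17, 42, 8, 99, 5, 61, 23, 70, 11]
approx = [3, 17, 42, 8, 99, 5, 61, 23, 12, 54]  # missed two of the true top 10

print(recall_at_k(exact, approx))  # 0.8
```

If recall@10 stays below your target, raise `ef_search` first (a query-time knob); rebuilding with a higher `m` is the more expensive fix.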
Dedicated vector databases, compared:

| Database | Characteristics |
|---|---|
| Pinecone | Serverless, managed, easy to start |
| Weaviate | Open-source, GraphQL, integrated ML modules |
| Qdrant | Open-source, Rust, advanced filters |
| Milvus | Open-source, scalable, CNCF project |
| Chroma | Open-source, embeddable, ideal for prototypes |
Several general-purpose databases also support vector search through extensions:

| Database | Extension |
|---|---|
| PostgreSQL | pgvector |
| Elasticsearch | Dense vector field |
| Redis | Redis Stack (RediSearch) |
| MongoDB | Atlas Vector Search |
pgvector is the most pragmatic option when you already have PostgreSQL — it avoids adding another service to your infrastructure.
```sql
-- Enable extension and create table
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1536)
);

-- Create HNSW index for fast cosine-distance search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Semantic search: top 5 most similar documents
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
```

Purely vector search fails when the user searches for an exact term (a product name, an error code). Hybrid search combines semantic similarity with full-text search:
```sql
-- Hybrid search: combine vector + full-text with Reciprocal Rank Fusion
WITH semantic AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank_s
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT 20
),
keyword AS (
    SELECT id,
           ROW_NUMBER() OVER (
               ORDER BY ts_rank(to_tsvector(content), plainto_tsquery($2)) DESC
           ) AS rank_k
    FROM documents
    WHERE to_tsvector(content) @@ plainto_tsquery($2)
    ORDER BY rank_k
    LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank_s), 0) + COALESCE(1.0 / (60 + k.rank_k), 0) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 5;
```

Note the explicit `ORDER BY` before each `LIMIT 20`: without it, PostgreSQL may return an arbitrary 20 rows rather than the top-ranked ones. This pattern uses Reciprocal Rank Fusion (RRF) to combine the rankings from both searches. The constant 60 is the conventional smoothing value: it keeps any single top-ranked result from dominating the combined score.
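The fusion itself is easy to verify outside the database. A minimal RRF sketch in Python, with hypothetical rank lists in place of real query results:

```python
def rrf_fuse(rankings, k=60, top_n=5):
    """Reciprocal Rank Fusion: score(d) = sum over each ranking of 1 / (k + rank(d))."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical results: semantic and keyword search rank documents differently
semantic_ids = ["a", "b", "c", "d"]
keyword_ids  = ["c", "a", "e"]

print(rrf_fuse([semantic_ids, keyword_ids]))  # → ['a', 'c', 'b', 'e', 'd']
```

Documents that appear in both lists ("a" and "c") accumulate two reciprocal-rank terms and rise to the top, which is exactly the behavior the SQL query produces.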
Vector databases are the infrastructure that makes semantic search and RAG possible at scale. For teams already using PostgreSQL, pgvector eliminates the need for an additional service and covers most cases up to millions of vectors. When volume exceeds that or complex filters with low latency are needed, dedicated databases like Qdrant or Pinecone justify the additional operational complexity.
Related concepts:

- **Embeddings** — dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.
- **Semantic search** — information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
- **RAG** — architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.