Vector Databases

Storage systems specialized in indexing and searching high-dimensional vectors efficiently, enabling semantic search and RAG applications at scale.

evergreen · #vector-database · #embeddings · #similarity-search · #rag · #pinecone · #pgvector

What it is

A vector database is a storage system optimized for storing, indexing, and searching high-dimensional vectors — typically embeddings generated by AI models. Unlike traditional databases that search by exact match, vector databases find the vectors most similar to a query.

They're the fundamental infrastructure for semantic search and RAG at scale.

Why not use a traditional database?

Finding the nearest vector among millions requires comparing distances with each one — O(n) per query. Vector databases use approximate indexes (ANN — Approximate Nearest Neighbors) that reduce this to O(log n) or better, sacrificing perfect precision for practical speed.
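
The difference is easy to see in the query plan. A minimal sketch with pgvector (the PostgreSQL extension used throughout this note), on a hypothetical items table with 3-dimensional vectors so the literals stay readable:

-- Toy table; real embeddings have hundreds or thousands of dimensions
CREATE TABLE items (id SERIAL PRIMARY KEY, embedding vector(3));

-- No index: EXPLAIN shows a sequential scan, i.e. the distance is
-- computed against every row (O(n) per query)
EXPLAIN SELECT id FROM items ORDER BY embedding <=> '[1,0,0]' LIMIT 5;

-- With an HNSW index (and enough rows that the planner prefers it),
-- the same query becomes an index scan over the graph
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
EXPLAIN SELECT id FROM items ORDER BY embedding <=> '[1,0,0]' LIMIT 5;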

Indexing algorithms

  • HNSW (Hierarchical Navigable Small World): a layered graph traversed greedily toward the query. The most popular option for its speed/recall balance
  • IVF (Inverted File Index): partitions space into clusters and searches only the nearest ones (see the pgvector sketch after this list)
  • PQ (Product Quantization): compresses vectors to reduce memory, useful for massive datasets
  • Flat: exact search without index — precise but slow, only for small datasets
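
In pgvector, which the examples below use, the algorithm is chosen at index creation time, so the HNSW/IVF tradeoff can be sketched in plain SQL. The values lists = 100 and probes = 10 are illustrative, not recommendations:

-- IVF: cluster the vectors into 100 partitions ("lists") at build time
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- At query time, search only the 10 nearest clusters.
-- More probes: better recall, slower queries.
SET ivfflat.probes = 10;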

HNSW tuning

HNSW has two key parameters that control the tradeoff between speed and recall:

Parameter | Effect when increased | Typical value
m (connections per node) | Better recall, more memory | 16–64
ef_construction (candidates during build) | Better index quality, slower construction | 100–200
ef_search (candidates during search) | Better recall, slower search | 50–200

The practical rule: start with m=16, ef_construction=128, ef_search=64 and adjust by measuring recall@10 against exact search.
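
In pgvector, m and ef_construction are fixed when the index is created (see the CREATE INDEX statement in the example below), while ef_search is a session setting, which makes the measure-and-adjust loop cheap. A sketch of the two measurements, assuming the documents table from that example; the recall@10 comparison itself happens in application code:

-- Approximate top 10 at the current ef_search
SET hnsw.ef_search = 64;
SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT 10;

-- Exact top 10 as ground truth: disabling index scans for one
-- transaction forces a full sequential scan
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;

-- recall@10 = |approximate results ∩ exact results| / 10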

Vector database options

Dedicated databases

Database | Characteristics
Pinecone | Serverless, managed, easy to start
Weaviate | Open-source, GraphQL, integrated ML modules
Qdrant | Open-source, Rust, advanced filters
Milvus | Open-source, scalable, CNCF project
Chroma | Open-source, embeddable, ideal for prototypes

Extensions for existing databases

Database | Extension
PostgreSQL | pgvector
Elasticsearch | dense_vector field type
Redis | Redis Stack (RediSearch)
MongoDB | Atlas Vector Search

Example: semantic search with pgvector

pgvector is the most pragmatic option when you already have PostgreSQL — it avoids adding another service to your infrastructure.

-- Enable extension and create table
CREATE EXTENSION IF NOT EXISTS vector;
 
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding vector(1536)
);
 
-- Create HNSW index for fast search
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);
 
-- Semantic search: top 5 most similar documents
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
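
Populating the table is a plain INSERT: the embedding is computed outside the database, by whatever embedding model you use, and bound as a parameter. A minimal, hypothetical example:

-- The application computes the 1536-dimensional embedding for the
-- content and binds it as $1
INSERT INTO documents (content, metadata, embedding)
VALUES ('PostgreSQL is an open-source relational database',
        '{"source": "docs"}',
        $1::vector);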

Hybrid search

Pure vector search fails when the user searches for an exact term (a product name, an error code). Hybrid search combines semantic similarity with traditional full-text search:

-- Hybrid search: combine vector + full-text rankings with RRF
WITH semantic AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank_s
  FROM documents
  ORDER BY rank_s  -- without ORDER BY, LIMIT keeps 20 arbitrary rows
  LIMIT 20
),
keyword AS (
  SELECT id,
         ROW_NUMBER() OVER (
           ORDER BY ts_rank(to_tsvector('english', content),
                            plainto_tsquery('english', $2)) DESC
         ) AS rank_k
  FROM documents
  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
  ORDER BY rank_k
  LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank_s), 0)
         + COALESCE(1.0 / (60 + k.rank_k), 0) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 5;

This pattern uses Reciprocal Rank Fusion (RRF) to combine the two rankings. The constant 60 comes from the original RRF paper; it damps the contribution of top ranks so that neither ranking dominates the fused score.
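
One operational note: the keyword arm only scales if the tsvector expression is backed by an index, and PostgreSQL requires an explicit text-search configuration for that expression to be indexable ('english' here, matching the query above):

-- GIN expression index for the full-text arm. Queries must use the
-- exact same expression, to_tsvector('english', content), to hit it.
CREATE INDEX ON documents USING gin (to_tsvector('english', content));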

Design considerations

  • Dimensionality: more dimensions = more storage and compute. 1536 (OpenAI) vs 384 (MiniLM) makes a difference at scale
  • Metadata filtering: combining vector search with filters (date, category, user) — pre-filtering reduces the search space but may miss relevant results; post-filtering is more precise but slower (see the sketch after this list)
  • Updates: some indexes are expensive to update — consider batch strategies
  • Consistency: many vector databases prioritize availability over strict consistency
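
With pgvector, metadata filtering is an ordinary WHERE clause layered on the vector ORDER BY. A sketch using the metadata column from the table above; the 'category' key is hypothetical:

-- Filtered semantic search. With an HNSW index, the filter is applied
-- to candidates coming out of the index scan, so very selective
-- filters can return fewer than LIMIT rows (recent pgvector versions
-- mitigate this with iterative index scans).
SELECT id, content
FROM documents
WHERE metadata->>'category' = 'billing'
ORDER BY embedding <=> $1::vector
LIMIT 5;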

Why it matters

Vector databases are the infrastructure that makes semantic search and RAG possible at scale. For teams already using PostgreSQL, pgvector eliminates the need for an additional service and covers most cases up to millions of vectors. When volume exceeds that or complex filters with low latency are needed, dedicated databases like Qdrant or Pinecone justify the additional operational complexity.

References

  • Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs — Malkov & Yashunin, 2016. The original HNSW paper.
  • pgvector: Open-source vector similarity search for Postgres — PostgreSQL extension for vector search.
  • Qdrant Documentation — Qdrant, 2024. High-performance vector database.
  • What is a Vector Database? — Pinecone, 2024. Introductory guide with indexing algorithm comparisons.
  • ANN Benchmarks — Comparative benchmarks for nearest neighbor search algorithms.

Related content

  • Embeddings

    Dense vector representations that capture the semantic meaning of text, images, or other data in a numerical space where proximity reflects conceptual similarity.

  • Semantic Search

    Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.

  • Retrieval-Augmented Generation

    Architectural pattern that combines information retrieval from external sources with LLM text generation, reducing hallucinations and keeping knowledge current without retraining the model.

  • Building a Second Brain in Public

    Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — in days, not weeks, and what that teaches about the gap between theory and working systems.
