Semantic Search
Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
What it is
Semantic search is an information retrieval technique that goes beyond literal word matching. Instead of checking whether a document contains the exact query terms, it converts both the query and documents into numerical vectors — called embeddings — and measures similarity between them in a high-dimensional vector space.
If a user searches for "how to deploy to the cloud," keyword search would only find documents containing those exact words. Semantic search would also surface documents about "AWS deployment," "serverless infrastructure," or "CI/CD with containers" — because it understands the meaning is similar.
How it works
The process has three phases:
1. Embedding generation
A language model transforms text into fixed-dimension vectors. Each vector captures the semantic meaning of the text in a space where similar texts end up close together.
Common models for this include:
- all-MiniLM-L6-v2 — 384 dimensions, lightweight, good general quality
- text-embedding-3-small (OpenAI) — 1536 dimensions, commercial API
- Cohere Embed v3 — multilingual, optimized for retrieval
2. Indexing
Document embeddings are stored in a structure that enables efficient similarity search. Options range from a simple in-memory array to specialized vector databases.
3. Querying
The user's query is converted into an embedding using the same model, and cosine similarity is calculated against all indexed documents. Results are ranked by similarity.
query → embedding → cosine similarity → ranked results
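The similarity step of that pipeline is a standard formula and can be sketched in a few lines of TypeScript (this is illustrative, not code from this site's repository):

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Returns a value in [-1, 1]; values near 1 mean the texts are
// semantically close, values near 0 mean they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Note that if the embedding model already normalizes its output to unit length (as many sentence-embedding models can), cosine similarity reduces to a plain dot product.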
Implementation on this site
The first attempt (client-side)
The first semantic search implementation on this site used Transformers.js with the Xenova/all-MiniLM-L6-v2 model running directly in the browser via WebAssembly. Embeddings were pre-computed at build time and passed as props to the search page.
Why it was removed. Four issues made it unviable in production:
- ~30 MB download on first search — the ONNX model downloaded silently while the user saw "Searching..." with no results. For a personal site, this is unacceptable.
- Tensor API inconsistencies — Transformers.js v3 returns tensors in different formats between Node.js and the browser WASM runtime. tolist(), .data, and output[0] all behaved differently depending on context, causing silent failures.
- Vercel deployment issues — the WASM runtime had compatibility problems in production that didn't reproduce locally.
- Complexity vs. value — with few articles, semantic search offers no significant advantage over keyword matching.
The decision and technical details are documented in issue #9 of the repository.
Current state
The current search is keyword-based: instant matching against titles, summaries, and tags. Zero dependencies, works synchronously, no loading state needed.
Pre-computed embeddings are still generated at build time (scripts/generate-embeddings.ts) and stored in public/embeddings.json — ready for future use.
Production approaches
To implement semantic search robustly, there are several options depending on scale:
Server-side with local model
Run the embedding model in a Node.js API route (not in the browser). The model loads once at server startup and processes queries in milliseconds. Viable for sites with moderate traffic.
Vector database
For large collections, a specialized vector database handles efficient indexing and search:
- Pinecone — managed service, auto-scaling
- Weaviate — open source, supports hybrid search
- pgvector — PostgreSQL extension, ideal if already using Postgres
- Qdrant — open source, high performance
Embeddings API + in-memory search
For static sites with a few dozen documents, pre-computing embeddings at build time and running cosine similarity on the client is viable — as long as the model itself isn't downloaded in the browser. The pre-computed embeddings amount to kilobytes, not megabytes.
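A minimal sketch of that in-memory search, assuming the pre-computed file holds an array of objects with a slug and an embedding vector (the exact shape used by this site's embeddings.json may differ):

```typescript
// Hypothetical shape of a pre-computed embeddings file:
// each document contributes an identifier and its build-time vector.
interface EmbeddedDoc {
  slug: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all documents against a query embedding and keep the top K.
// With a few dozen documents this brute-force scan runs in microseconds,
// so no vector index is needed.
function search(
  queryEmbedding: number[],
  docs: EmbeddedDoc[],
  topK = 5,
): { slug: string; score: number }[] {
  return docs
    .map((d) => ({ slug: d.slug, score: cosine(queryEmbedding, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The query embedding still has to come from somewhere — a server endpoint or a commercial embeddings API — which is exactly why this approach only works when the model stays off the client.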
Hybrid search
Combining keyword and semantic search typically yields better results than either alone. A common approach:
- Run both searches in parallel
- Normalize scores from each
- Combine with a configurable weight (e.g., 0.7 semantic + 0.3 keyword)
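The combination step above can be sketched in TypeScript. Min-max normalization and Map-based score tables are illustrative choices here, not prescribed by any particular library:

```typescript
// Min-max normalize a score table to [0, 1] so that semantic and
// keyword scores become comparable before weighting.
function normalize(scores: Map<string, number>): Map<string, number> {
  const values = [...scores.values()];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid division by zero when all scores tie
  return new Map([...scores].map(([id, s]) => [id, (s - min) / range]));
}

// Weighted combination of two normalized score tables.
// A document missing from one table contributes 0 for that component.
function hybridScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  semanticWeight = 0.7,
): Map<string, number> {
  const sem = normalize(semantic);
  const kw = normalize(keyword);
  const ids = new Set([...sem.keys(), ...kw.keys()]);
  const combined = new Map<string, number>();
  for (const id of ids) {
    const s = sem.get(id) ?? 0;
    const k = kw.get(id) ?? 0;
    combined.set(id, semanticWeight * s + (1 - semanticWeight) * k);
  }
  return combined;
}
```

Sorting the combined map by score gives the final ranking; the weight can be tuned per corpus, and some systems use rank-based fusion (e.g. reciprocal rank fusion) instead of score normalization.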
Benefits
- Synonym and paraphrase understanding — finds relevant results even when they don't share exact words with the query
- Multilingual search — with multilingual models, a Spanish query can find English documents and vice versa
- Typo tolerance — embeddings capture meaning, not exact spelling
- Related content discovery — vector similarity enables suggesting related articles without manual configuration
- Scalability — works equally well with 10 documents or 10 million (with appropriate infrastructure)
When it's worth it
Semantic search adds real value when:
- The corpus has more than ~20 documents
- Users search with natural language, not exact terms
- Content is multilingual
- Related content discovery is needed
For small collections with predictable vocabulary, keyword search is simpler, faster, and effective enough.
References
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks — Reimers and Gurevych, 2019. The paper that popularized sentence embeddings for semantic search.
- Transformers.js — Hugging Face. Library for running Transformer models in JavaScript/WASM.
- all-MiniLM-L6-v2 — Sentence Transformers. Lightweight 384-dimension model for sentence embeddings.
- pgvector — Andrew Kane. PostgreSQL extension for vector search.
- Issue #9: Re-evaluate semantic search implementation — jonmatum.com. Documentation of the decision to remove client-side semantic search.