Concepts

Semantic Search

Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.

growing · #search #embeddings #vector-search #nlp #transformers #information-retrieval

What it is

Semantic search is an information retrieval technique that goes beyond literal word matching. Instead of checking whether a document contains the exact query terms, it converts both the query and documents into numerical vectors — called embeddings — and measures similarity between them in a high-dimensional vector space.

If a user searches for "how to deploy to the cloud," keyword search would only find documents containing those exact words. Semantic search would also surface documents about "AWS deployment," "serverless infrastructure," or "CI/CD with containers" — because it understands the meaning is similar.

How it works

The process has three phases:

1. Embedding generation

A language model transforms text into fixed-dimension vectors. Each vector captures the semantic meaning of the text in a space where similar texts end up close together.

Common models for this include:

  • all-MiniLM-L6-v2 — 384 dimensions, lightweight, good general quality
  • text-embedding-3-small (OpenAI) — 1536 dimensions, commercial API
  • Cohere Embed v3 — multilingual, optimized for retrieval
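Models like all-MiniLM-L6-v2 actually emit one vector per input token; the single sentence embedding is obtained by mean pooling (averaging the token vectors) and L2-normalizing the result, so that a plain dot product between two embeddings equals their cosine similarity. A minimal sketch of that pooling step (the function is illustrative, not from any library):

```typescript
// Collapse a [tokens x dims] matrix of token embeddings into one
// sentence vector: average across tokens, then L2-normalize so that
// dot product between two pooled vectors equals cosine similarity.
function meanPool(tokenEmbeddings: number[][]): number[] {
  const dims = tokenEmbeddings[0].length;
  const pooled = new Array<number>(dims).fill(0);
  for (const token of tokenEmbeddings) {
    for (let d = 0; d < dims; d++) pooled[d] += token[d];
  }
  for (let d = 0; d < dims; d++) pooled[d] /= tokenEmbeddings.length;

  const norm = Math.hypot(...pooled);
  return pooled.map((v) => v / norm);
}
```

Libraries such as Transformers.js expose this as an option (pooling and normalization flags on the feature-extraction pipeline) rather than making you write it by hand.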

2. Indexing

Document embeddings are stored in a structure that enables efficient similarity search. Options range from a simple in-memory array to specialized vector databases.

3. Querying

The user's query is converted into an embedding using the same model, and cosine similarity is calculated against all indexed documents. Results are ranked by similarity.

query → embedding → cosine similarity → ranked results
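For a small index, that pipeline reduces to one loop over the documents. A minimal sketch in TypeScript (the `Doc` shape is illustrative):

```typescript
interface Doc {
  id: string;
  embedding: number[]; // pre-computed with the same model as the query
}

// Cosine similarity: dot product divided by the product of magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed document against the query embedding and
// return them best-first.
function rank(queryEmbedding: number[], docs: Doc[]): Doc[] {
  return docs
    .map((doc) => ({
      doc,
      score: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .map(({ doc }) => doc);
}
```

This brute-force scan is O(n) per query, which is fine for dozens or hundreds of documents; the vector databases discussed below exist to avoid it at larger scales.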

Implementation on this site

The first attempt (client-side)

The first semantic search implementation on this site used Transformers.js with the Xenova/all-MiniLM-L6-v2 model running directly in the browser via WebAssembly. Embeddings were pre-computed at build time and passed as props to the search page.

Why it was removed. Four issues made it unviable in production:

  1. ~30 MB download on first search — the ONNX model downloaded silently while the user saw "Searching..." with no results. For a personal site, this is unacceptable.
  2. Tensor API inconsistencies — Transformers.js v3 returns tensors with different formats between Node.js and the browser WASM runtime. tolist(), .data, and output[0] all behaved differently depending on context, causing silent failures.
  3. Vercel deployment issues — the WASM runtime had compatibility problems in production that didn't reproduce locally.
  4. Complexity vs. value — with few articles, semantic search offers no significant advantage over keyword matching.

The decision and technical details are documented in issue #9 of the repository.

Current state

The current search is keyword-based: instant matching against titles, summaries, and tags. Zero dependencies, works synchronously, no loading state needed.
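A search of that shape fits in a few lines; a sketch with illustrative field names, not the site's actual code:

```typescript
interface Article {
  title: string;
  summary: string;
  tags: string[];
}

// Case-insensitive substring match against title, summary, and tags.
// Fully synchronous: no model, no network, no loading state.
function keywordSearch(query: string, articles: Article[]): Article[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return articles.filter(
    (a) =>
      a.title.toLowerCase().includes(q) ||
      a.summary.toLowerCase().includes(q) ||
      a.tags.some((t) => t.toLowerCase().includes(q))
  );
}
```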

Pre-computed embeddings are still generated at build time (scripts/generate-embeddings.ts) and stored in public/embeddings.json — ready for future use.

Production approaches

To implement semantic search robustly, there are several options depending on scale:

Server-side with local model

Run the embedding model in a Node.js API route (not in the browser). The model loads once at server startup and processes queries in milliseconds. Viable for sites with moderate traffic.
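The key detail is loading the model once per process, not once per request. One way to get that, sketched generically (the loader here is a stand-in for real model initialization, e.g. a Transformers.js pipeline):

```typescript
// Cache the in-flight promise rather than the resolved value, so
// concurrent requests arriving during startup all await the same
// load instead of triggering parallel loads.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}

// Hypothetical usage inside an API route:
// const getExtractor = once(() => pipeline("feature-extraction", "..."));
// const embedding = await getExtractor().then((e) => e(query));
```

Caching the promise instead of the value is what makes the first burst of traffic safe: every caller shares one model load.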

Vector database

For large collections, a specialized vector database handles efficient indexing and search:

  • Pinecone — managed service, auto-scaling
  • Weaviate — open source, supports hybrid search
  • pgvector — PostgreSQL extension, ideal if already using Postgres
  • Qdrant — open source, high performance

Embeddings API + in-memory search

For static sites with a few dozen documents, pre-computing embeddings at build time and doing cosine search on the client is viable — as long as the model isn't downloaded in the browser. Pre-computed embeddings weigh kilobytes, not megabytes.

Hybrid search

Combining keyword search and semantic search typically yields better results than either alone. The usual approach:

  1. Run both searches in parallel
  2. Normalize scores from each
  3. Combine with a configurable weight (e.g., 0.7 semantic + 0.3 keyword)
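Steps 2 and 3 above can be sketched with min-max normalization (the 0.7/0.3 weights are just the example values, not a recommendation):

```typescript
// Rescale a score list to [0, 1] so the two searches are comparable.
function normalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0);
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted blend of semantic and keyword scores, aligned by index
// (both arrays score the same documents in the same order).
function hybridScores(
  semantic: number[],
  keyword: number[],
  semanticWeight = 0.7
): number[] {
  const s = normalize(semantic);
  const k = normalize(keyword);
  return s.map((v, i) => semanticWeight * v + (1 - semanticWeight) * k[i]);
}
```

Other fusion schemes exist (reciprocal rank fusion being a common one), but a weighted sum of normalized scores is the simplest starting point.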

Benefits

  • Synonym and paraphrase understanding — finds relevant results even when they don't share exact words with the query
  • Multilingual search — with multilingual models, a Spanish query can find English documents and vice versa
  • Typo tolerance — embeddings capture meaning, not exact spelling
  • Related content discovery — vector similarity enables suggesting related articles without manual configuration
  • Scalability — works equally well with 10 documents or 10 million (with appropriate infrastructure)

When it's worth it

Semantic search adds real value when:

  • The corpus has more than ~20 documents
  • Users search with natural language, not exact terms
  • Content is multilingual
  • Related content discovery is needed

For small collections with predictable vocabulary, keyword search is simpler, faster, and effective enough.
