Semantic Search
Information retrieval technique that uses vector embeddings to find results by meaning, not just exact keyword matching.
What it is
Semantic search is an information retrieval technique that goes beyond literal word matching. Instead of checking whether a document contains the exact query terms, it converts both the query and documents into numerical vectors — called embeddings — and measures similarity between them in a high-dimensional vector space.
If a user searches for "how to deploy to the cloud," keyword search would only find documents containing those exact words. Semantic search would also surface documents about "AWS deployment," "serverless infrastructure," or "CI/CD with containers" — because it understands the meaning is similar.
How it works
The process has three phases:
1. Embedding generation
A language model transforms text into fixed-dimension vectors. Each vector captures the semantic meaning of the text in a space where similar texts end up close together.
Common models for this include:
- all-MiniLM-L6-v2 — 384 dimensions, lightweight, good general quality
- text-embedding-3-small (OpenAI) — 1536 dimensions, commercial API
- Cohere Embed v3 — multilingual, optimized for retrieval
2. Indexing
Document embeddings are stored in a structure that enables efficient similarity search. Options range from a simple in-memory array to specialized vector databases.
3. Querying
The user's query is converted into an embedding using the same model, and cosine similarity is calculated against all indexed documents. Results are ranked by similarity.
query → embedding → cosine similarity → ranked results
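The similarity step of that pipeline is a standard formula and can be sketched in a few lines of TypeScript (this is illustrative, not code from this site's repository):

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Returns a value in [-1, 1]; values near 1 mean the texts are
// semantically close, values near 0 mean they are unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Note that if the embedding model already normalizes its output to unit length (as many sentence-embedding models can), cosine similarity reduces to a plain dot product.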
Implementation on this site
The first attempt (client-side)
The first semantic search implementation on this site used Transformers.js with the Xenova/all-MiniLM-L6-v2 model running directly in the browser via WebAssembly. Embeddings were pre-computed at build time and passed as props to the search page.
Why it was removed. Four issues made it unviable in production:
- ~30 MB download on first search — the ONNX model downloaded silently while the user saw "Searching..." with no results. For a personal site, this is unacceptable.
- Tensor API inconsistencies — Transformers.js v3 returns tensors in different formats between Node.js and the browser WASM runtime. tolist(), .data, and output[0] all behaved differently depending on context, causing silent failures.
- Vercel deployment issues — the WASM runtime had compatibility problems in production that didn't reproduce locally.
- Complexity vs. value — with few articles, semantic search offers no significant advantage over keyword matching.
The decision and technical details are documented in issue #9 of the repository.
Current state
The current search is keyword-based: instant matching against titles, summaries, and tags. Zero dependencies, works synchronously, no loading state needed.
Pre-computed embeddings are still generated at build time (scripts/generate-embeddings.ts) and stored in public/embeddings.json — ready for future use.
Production approaches
To implement semantic search robustly, there are several options depending on scale:
Server-side with local model
Run the embedding model in a Node.js API route (not in the browser). The model loads once at server startup and processes queries in milliseconds. Viable for sites with moderate traffic.
Vector database
For large collections, a specialized vector database handles efficient indexing and search:
- Pinecone — managed service, auto-scaling
- Weaviate — open source, supports hybrid search
- pgvector — PostgreSQL extension, ideal if already using Postgres
- Qdrant — open source, high performance
Embeddings API + in-memory search
For static sites with a few dozen documents, pre-computing embeddings at build time and running cosine similarity on the client is viable — as long as the model itself isn't downloaded in the browser. The pre-computed embeddings amount to kilobytes, not megabytes.
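A minimal sketch of that in-memory search, assuming the pre-computed file holds an array of objects with a slug and an embedding vector (the exact shape used by this site's embeddings.json may differ):

```typescript
// Hypothetical shape of a pre-computed embeddings file:
// each document contributes an identifier and its build-time vector.
interface EmbeddedDoc {
  slug: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all documents against a query embedding and keep the top K.
// With a few dozen documents this brute-force scan runs in microseconds,
// so no vector index is needed.
function search(
  queryEmbedding: number[],
  docs: EmbeddedDoc[],
  topK = 5,
): { slug: string; score: number }[] {
  return docs
    .map((d) => ({ slug: d.slug, score: cosine(queryEmbedding, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The query embedding still has to come from somewhere — a server endpoint or a commercial embeddings API — which is exactly why this approach only works when the model stays off the client.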
Hybrid search
Combining keyword and semantic search typically yields better results than either alone. A common approach:
- Run both searches in parallel
- Normalize scores from each
- Combine with a configurable weight (e.g., 0.7 semantic + 0.3 keyword)
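The combination step above can be sketched in TypeScript. Min-max normalization and Map-based score tables are illustrative choices here, not prescribed by any particular library:

```typescript
// Min-max normalize a score table to [0, 1] so that semantic and
// keyword scores become comparable before weighting.
function normalize(scores: Map<string, number>): Map<string, number> {
  const values = [...scores.values()];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid division by zero when all scores tie
  return new Map([...scores].map(([id, s]) => [id, (s - min) / range]));
}

// Weighted combination of two normalized score tables.
// A document missing from one table contributes 0 for that component.
function hybridScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  semanticWeight = 0.7,
): Map<string, number> {
  const sem = normalize(semantic);
  const kw = normalize(keyword);
  const ids = new Set([...sem.keys(), ...kw.keys()]);
  const combined = new Map<string, number>();
  for (const id of ids) {
    const s = sem.get(id) ?? 0;
    const k = kw.get(id) ?? 0;
    combined.set(id, semanticWeight * s + (1 - semanticWeight) * k);
  }
  return combined;
}
```

Sorting the combined map by score gives the final ranking; the weight can be tuned per corpus, and some systems use rank-based fusion (e.g. reciprocal rank fusion) instead of score normalization.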
Benefits
- Synonym and paraphrase understanding — finds relevant results even when they don't share exact words with the query
- Multilingual search — with multilingual models, a Spanish query can find English documents and vice versa
- Typo tolerance — embeddings capture meaning, not exact spelling
- Related content discovery — vector similarity enables suggesting related articles without manual configuration
- Scalability — works equally well with 10 documents or 10 million (with appropriate infrastructure)
When it's worth it
Semantic search adds real value when:
- The corpus has more than ~20 documents
- Users search with natural language, not exact terms
- Content is multilingual
- Related content discovery is needed
For small collections with predictable vocabulary, keyword search is simpler, faster, and effective enough.
References
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks — Reimers and Gurevych, 2019. The paper that popularized sentence embeddings for semantic search.
- Transformers.js — Hugging Face. Library for running Transformer models in JavaScript/WASM.
- all-MiniLM-L6-v2 — Sentence Transformers. Lightweight 384-dimension model for sentence embeddings.
- pgvector — Andrew Kane. PostgreSQL extension for vector search.
- Issue #9: Re-evaluate semantic search implementation — jonmatum.com. Documentation of the decision to remove client-side semantic search.