Vector Search vs Semantic Search: They're Not the Same Thing
Vector search, semantic search, keyword search, hybrid search — these terms get used interchangeably but they mean different things. This post breaks down what each actually does, when each matters, and why hybrid search wins for RAG.
“We need semantic search” has become the default request for any project involving GenAI. But when you dig into what people actually mean, you find four different concepts being used interchangeably: keyword search, vector search, semantic search, and hybrid search. They’re related, but they’re not the same thing — and picking the wrong one for your RAG system means either missing relevant results or burning budget on unnecessary complexity.
The Problem
The terminology is a mess. Vendor marketing doesn’t help — every database with a vector column now claims to offer “semantic search.” The confusion leads to two common mistakes:
- Teams implement pure vector search and call it semantic search. They embed documents, run k-NN similarity, and wonder why searching for “error code E-4012” returns generic error handling docs instead of the specific error definition.
- Teams skip keyword search entirely because it feels old. BM25 is a 30-year-old algorithm, so it must be obsolete now that we have embeddings. Except it consistently outperforms vector search for exact-match queries — and in most enterprise datasets, users search for specific codes, IDs, and terms more often than vague concepts.
The result: RAG systems that give impressive demos but fail on real queries.
The Solution
There are four distinct search approaches, each building on the previous one. Understanding what each does — and what it misses — is the key to picking the right one.
The punchline: for most RAG workloads, you want hybrid search — keyword and vector search running in parallel with score fusion. It’s the only approach that catches both exact matches and meaning-based matches. Here’s why.
How It Works
Keyword Search (BM25)
BM25 is the algorithm behind keyword search in OpenSearch, Elasticsearch, and most search engines built in the last three decades. It scores documents based on three things:
- Term Frequency (TF) — How often does the search term appear in the chunk? More occurrences score higher, but with diminishing returns — 10 mentions isn’t 10x better than 1.
- Inverse Document Frequency (IDF) — Is the term rare across all chunks? Rare terms score higher. “E-4012” scores much higher than “the.”
- Document Length — Short chunks with the term score higher than long chunks with the same term. The term is more concentrated.
Query: "error code E-4012"
Chunk A (200 words): "...error code E-4012 occurs when the connection pool..."
-> High score: exact terms present, short chunk, "E-4012" is rare (high IDF)
Chunk B (2000 words): "...various error codes include E-1001, E-2003, E-4012..."
-> Lower score: term present but chunk is long, appears once among many
Chunk C (200 words): "...the application crashes due to timeout issues..."
-> Zero score: none of the query terms appear
BM25 uses an inverted index — a pre-built lookup table mapping every term to the documents containing it. This makes keyword search extremely fast. No ML model, no GPU, no embedding. Just a dictionary lookup with scoring.
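The three factors above combine into a single score per term. Here is a toy sketch of the standard BM25 formula (not OpenSearch's actual implementation), using the conventional defaults k1=1.2 and b=0.75; the corpus numbers are made up for illustration:

```python
import math

def bm25_score(term_freq, doc_len, avg_doc_len, n_docs, docs_with_term,
               k1=1.2, b=0.75):
    """Toy BM25 score for one term in one chunk."""
    # IDF: terms that appear in few chunks score higher
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # TF with diminishing returns, normalized by chunk length
    tf = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf

# "E-4012" appears once in a 200-word chunk vs once in a 2000-word chunk
short_chunk = bm25_score(term_freq=1, doc_len=200, avg_doc_len=800,
                         n_docs=10_000, docs_with_term=3)
long_chunk = bm25_score(term_freq=1, doc_len=2000, avg_doc_len=800,
                        n_docs=10_000, docs_with_term=3)
assert short_chunk > long_chunk  # same term, shorter chunk -> higher score
```

This is why Chunk A beats Chunk B in the example above: identical term, but the shorter chunk concentrates it more.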
Catches: Exact terms, codes, IDs, product names, error codes, specific phrases.
Misses: Synonyms, paraphrases, conceptual similarity. “App keeps crashing” won’t find “system instability due to resource exhaustion” because they share no words.
Vector Search (k-NN)
Vector search converts text into a mathematical representation (a vector of floats) using an embedding model. Texts with similar meaning end up as nearby points in a high-dimensional space. At query time, you convert the question into a vector and find the k nearest neighbors.
Embedding model converts:
"application crashes intermittently" -> [0.023, -0.841, 0.112, ...]
"system experiences sporadic failures" -> [0.019, -0.830, 0.098, ...]
"error code E-4012" -> [0.445, 0.221, -0.667, ...]
The first two are close together (similar meaning).
The third is far away (unrelated meaning).
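The "close together" intuition can be made concrete with cosine similarity, using the truncated toy vectors from the example above (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 = same direction, near 0 or negative = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

crashes    = [0.023, -0.841, 0.112]   # "application crashes intermittently"
failures   = [0.019, -0.830, 0.098]   # "system experiences sporadic failures"
error_code = [0.445, 0.221, -0.667]   # "error code E-4012"

# The two paraphrases are nearly identical; the error code points elsewhere
assert cosine_similarity(crashes, failures) > cosine_similarity(crashes, error_code)
```

k-NN search is just this comparison repeated against every indexed vector, with data structures like HNSW making it fast at scale.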
The k-NN search finds the closest vectors using distance metrics — cosine similarity, L2 (Euclidean), or inner product. On OpenSearch, you can choose which library performs this search:
| Engine | How It Works | Trade-off |
|---|---|---|
| FAISS | In-memory graph (HNSW) or inverted file (IVF) | Fastest, but needs RAM for vectors |
| Lucene | Disk-based HNSW with segment caching | Slower, but much cheaper (vectors on disk) |
| NMSLIB | In-memory HNSW | Best recall, but no filtering during search |
All three are free, open-source libraries bundled into OpenSearch. The engine choice affects cost through infrastructure sizing, not licensing. For a deeper dive on engine selection, see the RAG chunking and testing guide.
Catches: Meaning, intent, conceptual similarity. “App crashes” finds “system instability.”
Misses: Specific identifiers. “E-4012” is just a string to the embedding model — it has no semantic meaning. The vector for “E-4012” might be near “E-4013” or “error code” generically, but not specifically near the chunk that explains what E-4012 is.
Semantic Search
Semantic search is vector search plus additional intelligence layers. The term is often used loosely, but a proper semantic search system adds:
- Query understanding — Expanding, reformulating, or enriching the query before embedding. “Lambda cold start” might be expanded to include “initialization latency” and “function startup time.”
- Reranking — A cross-encoder model that takes each (query, result) pair and scores them together. Unlike embeddings which encode query and document independently, rerankers see both at once and produce much better relevance scores.
- Context awareness — Using conversation history, user profile, or domain context to adjust results.
Pure vector search:
Query -> Embed -> k-NN -> Top 5 results
Semantic search:
Query -> Expand/Reformulate -> Embed -> k-NN -> Top 20 candidates
-> Rerank (cross-encoder scores each pair) -> Top 5 results
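The rerank stage can be sketched as follows. In production, score_pair would call a real cross-encoder model (for example via Bedrock's reranking feature); here a simple word-overlap heuristic stands in so the example is self-contained, which is emphatically not how a real reranker scores:

```python
def score_pair(query, doc):
    """Stand-in for a cross-encoder: a real reranker scores the
    (query, doc) pair jointly with a model, not by word overlap."""
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, candidates, top_n=5):
    """Re-score the k-NN candidate pool and keep only the best top_n."""
    return sorted(candidates, key=lambda d: score_pair(query, d),
                  reverse=True)[:top_n]

candidates = [  # imagine these are the top candidates from k-NN
    "general error handling best practices",
    "lambda cold start and initialization latency",
    "how to fix lambda cold start issues",
]
top = rerank("fix lambda cold start", candidates, top_n=2)
```

The key structural point survives the toy scorer: retrieval casts a wide net (top 20), and the reranker, which sees query and document together, reorders before trimming to the final top 5.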
Reranking alone typically improves retrieval quality by 5-15% over pure vector search. On AWS, two rerankers are available:
| Reranker | Pricing | Notes |
|---|---|---|
| Amazon Rerank 1.0 | Included | Not available in us-east-1 |
| Cohere Rerank 3.5 | $2.00/1K queries | Available in more regions |
Catches: Everything vector search catches, but with better ranking. Fewer irrelevant results in the top positions.
Misses: Still misses exact codes and identifiers — it’s still fundamentally based on meaning, not terms.
Hybrid Search
Hybrid search runs keyword (BM25) and vector (k-NN) in parallel on the same query, then combines the scores. This is the only approach that catches both exact matches and semantic matches.
Query: "Why does error E-4012 cause the app to crash?"
           |                             |
           v                             v
    BM25 (keyword)                 k-NN (vector)
           |                             |
           v                             v
Finds: "E-4012 is a DB       Finds: "application crashes
connection pool               due to connection pool
timeout error"                exhaustion and retry failures"
           |                             |
           +--------------+--------------+
                          |
                          v
           Score Fusion (combine & rank)
                          |
                          v
                Both chunks go to LLM
The LLM now has what E-4012 is (from keyword) and how to fix the crash (from vector). Pure vector search would have missed the E-4012 definition. Pure keyword search would have missed the crash remediation.
On AWS, OpenSearch is the only native service with built-in hybrid search — BM25 and k-NN run in a single query. If you’re using another vector store (Aurora pgvector, S3 Vectors, MemoryDB), you’d need to run keyword and vector searches separately and merge results yourself.
# OpenSearch hybrid query — single request, both engines
# (query_embedding: the query text embedded with the same model used at indexing)
hybrid_query = {
"size": 5,
"query": {
"hybrid": {
"queries": [
{
"match": {
"content": "error E-4012 application crash"
}
},
{
"knn": {
"embedding": {
"vector": query_embedding,
"k": 5
}
}
}
]
}
}
}
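If you are on a store without built-in hybrid support and have to merge results yourself, reciprocal rank fusion (RRF) is a common approach: it combines rank positions rather than raw scores, because BM25 scores and cosine similarities are not on the same scale. A minimal sketch, with k=60 as the conventional smoothing constant and made-up chunk IDs:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked ID lists: each doc scores sum(1 / (k + rank))."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["chunk_e4012_definition", "chunk_error_codes_list"]
knn_hits  = ["chunk_pool_exhaustion", "chunk_e4012_definition"]

fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
# chunk_e4012_definition appears in both lists, so it ranks first
```

A chunk retrieved by both engines accumulates score from both lists, which is exactly the "catches both exact matches and semantic matches" behavior hybrid search is after.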
Catches: Both exact terms and semantic meaning. The most complete retrieval approach.
Misses: Very little. The main trade-off is cost and complexity — you need a search engine that supports both BM25 and k-NN (OpenSearch), and your index stores both text fields and vector fields.
The Comparison Table
| | Keyword (BM25) | Vector (k-NN) | Semantic | Hybrid |
|---|---|---|---|---|
| Matches on | Exact words | Meaning | Meaning + ranking | Words + meaning |
| “error E-4012” | Finds it | Likely misses | Likely misses | Finds it |
| “app keeps crashing” | Misses synonyms | Finds them | Finds + ranks them | Finds them |
| Needs ML model | No | Embedding model | Embedding + reranker | Embedding model |
| Speed | Fastest | Engine-dependent | Slower (reranking) | Both run in parallel |
| Index storage | Text (inverted index) | Vectors (RAM or disk) | Vectors + reranker | Text + vectors |
| AWS service | Any OpenSearch | Any with k-NN | Bedrock KB + reranker | OpenSearch only |
Where Each Approach Shines
Use keyword search alone when:
- Users search for specific identifiers, codes, or exact phrases
- Your data is structured (logs, tickets, records with known fields)
- You need maximum speed with zero ML infrastructure
Use vector search alone when:
- Queries are conversational (“how do I fix this?”)
- Documents are conceptual (whitepapers, guides, Q&A)
- Budget is constrained and you’re using S3 Vectors or Aurora pgvector
- Users never search for specific codes or IDs
Use semantic search when:
- You’re already doing vector search and want better ranking
- The top-5 results matter more than the top-20 (reranking improves precision at the top)
- Budget allows for a reranking step
Use hybrid search when:
- Your data contains both specific identifiers and conceptual content (most enterprise data)
- Retrieval quality directly impacts business outcomes
- You’re building a RAG system for IT support, legal, manufacturing, healthcare, or finance — any domain with codes, IDs, and natural language mixed together
Domain Impact
| Domain | Users search for… | Keyword catches | Vector catches | Need hybrid? |
|---|---|---|---|---|
| IT support | Error codes, ticket IDs, service names + symptoms | Codes, IDs | Symptoms, troubleshooting | Yes |
| Legal | Article numbers, case references + legal concepts | Statute IDs | Interpretations | Yes |
| Manufacturing | Part numbers, machine IDs + failure descriptions | Part codes | Failure modes | Yes |
| Healthcare | Drug codes, ICD codes + symptom descriptions | Medical codes | Symptoms, treatments | Yes |
| General Q&A | Mostly “how do I…” questions | Limited value | High value | Optional |
If your domain has specific identifiers that users search for, hybrid search isn’t optional — it’s required.
Cost Reality
Hybrid search means OpenSearch, and OpenSearch means either managed clusters or Serverless:
| Option | Minimum Cost | Hybrid Search |
|---|---|---|
| OpenSearch Managed | ~$470/month (small cluster) | Yes |
| OpenSearch Serverless | ~$700/month (4 OCUs min) | Yes |
| S3 Vectors | $0 minimum (pay-per-query) | No |
| Aurora pgvector | ~$60/month (small instance) | No (vector only) |
If your workload is low-volume and purely semantic (no codes or IDs), S3 Vectors or Aurora pgvector save significant cost. If you need hybrid search, OpenSearch Managed gives you the lowest entry point. For a detailed comparison of all vector store options on AWS, see the vector store guide.
What I Learned
- Vector search is a mechanism, semantic search is a capability — Vector search (k-NN) is the algorithm that finds nearest neighbors. Semantic search is the broader system that uses vector search plus query understanding, reranking, and context. Calling k-NN “semantic search” is like calling a database query “business intelligence.”
- BM25 is 30 years old and still essential — Every benchmark shows that hybrid search (BM25 + vector) outperforms pure vector search by 10-20%. Old doesn’t mean obsolete. Exact-match retrieval solves problems that embeddings fundamentally cannot.
- Hybrid search is the right default for enterprise RAG — If your documents contain any codes, IDs, product names, or specific terms, hybrid search is not a nice-to-have. It’s the difference between finding “E-4012 is a timeout error” and returning generic error handling documentation.
- The cost of hybrid search is the cost of OpenSearch — There’s no free hybrid search option on AWS today. This is the real trade-off: ~$470+/month for better retrieval quality, or $0-60/month with pure vector search. For production RAG systems, the retrieval quality usually justifies the cost.
What’s Next
- Benchmark hybrid search vs pure vector search on a real enterprise dataset — quantify the 10-20% improvement claim with actual recall@5 and faithfulness scores
- Test Bedrock Knowledge Bases with OpenSearch Serverless backend in hybrid mode vs S3 Vectors backend — same queries, same documents, measure quality delta
- Explore OpenSearch neural search plugin for query expansion — automatic synonym and concept injection before retrieval
- Build a cost model: at what query volume does hybrid search on OpenSearch become cheaper per-query than vector-only on S3 Vectors + reranking?
References:
- OpenSearch Hybrid Search
- OpenSearch k-NN Plugin
- Amazon Bedrock Reranking
- BM25 — The Original Paper (Robertson & Zaragoza)
- Building a RAG System That Actually Works — Companion post on chunking strategies, vector engines, and testing
- RAG on AWS: Which Vector Store Is Right for You? — Full comparison of all 9 AWS vector storage options
