Vector Search vs Semantic Search: They're Not the Same Thing
Vector search, semantic search, keyword search, hybrid search — these terms get used interchangeably but they mean different things. This post breaks down what each actually does, when each matters, and why hybrid search wins for RAG.
“We need semantic search” has become the default request for any project involving GenAI. But when you dig into what people actually mean, you find four different concepts being used interchangeably: keyword search, vector search, semantic search, and hybrid search. They’re related, but they’re not the same thing — and picking the wrong one for your RAG system means either missing relevant results or burning budget on unnecessary complexity.
The Problem
The terminology is a mess. Vendor marketing doesn’t help — every database with a vector column now claims to offer “semantic search.” The confusion leads to two common mistakes:
- Teams implement pure vector search and call it semantic search. They embed documents, run k-NN similarity, and wonder why searching for “error code E-4012” returns generic error handling docs instead of the specific error definition.
- Teams skip keyword search entirely because it feels old. BM25 is a 30-year-old algorithm, so it must be obsolete now that we have embeddings. Except it consistently outperforms vector search for exact-match queries — and in most enterprise datasets, users search for specific codes, IDs, and terms more often than vague concepts.
The result: RAG systems that give impressive demos but fail on real queries.
The Solution
There are four distinct search approaches, each building on the previous one. Understanding what each does — and what it misses — is the key to picking the right one.
The punchline: for most RAG workloads, you want hybrid search — keyword and vector search running in parallel with score fusion. It’s the only approach that catches both exact matches and meaning-based matches. Here’s why.
How It Works
Keyword Search (BM25)
BM25 is the algorithm behind keyword search in OpenSearch, Elasticsearch, and most search engines built in the last three decades. It scores documents based on three things:
- Term Frequency (TF) — How often does the search term appear in the chunk? More occurrences score higher, but with diminishing returns — 10 mentions isn’t 10x better than 1.
- Inverse Document Frequency (IDF) — Is the term rare across all chunks? Rare terms score higher. “E-4012” scores much higher than “the.”
- Document Length — Short chunks with the term score higher than long chunks with the same term. The term is more concentrated.
Query: "error code E-4012"
Chunk A (200 words): "...error code E-4012 occurs when the connection pool..."
-> High score: exact terms present, short chunk, "E-4012" is rare (high IDF)
Chunk B (2000 words): "...various error codes include E-1001, E-2003, E-4012..."
-> Lower score: term present but chunk is long, appears once among many
Chunk C (200 words): "...the application crashes due to timeout issues..."
-> Zero score: none of the query terms appear
BM25 uses an inverted index — a pre-built lookup table mapping every term to the documents containing it. This makes keyword search extremely fast. No ML model, no GPU, no embedding. Just a dictionary lookup with scoring.
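The three factors above combine into a single score per term. Here is a toy sketch of the standard BM25 formula (not OpenSearch's actual implementation), using the conventional defaults k1=1.2 and b=0.75; the corpus numbers are made up for illustration:

```python
import math

def bm25_score(term_freq, doc_len, avg_doc_len, n_docs, docs_with_term,
               k1=1.2, b=0.75):
    """Toy BM25 score for one term in one chunk."""
    # IDF: terms that appear in few chunks score higher
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # TF with diminishing returns, normalized by chunk length
    tf = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf

# "E-4012" appears once in a 200-word chunk vs once in a 2000-word chunk
short_chunk = bm25_score(term_freq=1, doc_len=200, avg_doc_len=800,
                         n_docs=10_000, docs_with_term=3)
long_chunk = bm25_score(term_freq=1, doc_len=2000, avg_doc_len=800,
                        n_docs=10_000, docs_with_term=3)
assert short_chunk > long_chunk  # same term, shorter chunk -> higher score
```

This is why Chunk A beats Chunk B in the example above: identical term, but the shorter chunk concentrates it more.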
Catches: Exact terms, codes, IDs, product names, error codes, specific phrases.
Misses: Synonyms, paraphrases, conceptual similarity. “App keeps crashing” won’t find “system instability due to resource exhaustion” because they share no words.
Vector Search (k-NN)
Vector search converts text into a mathematical representation (a vector of floats) using an embedding model. Texts with similar meaning end up as nearby points in a high-dimensional space. At query time, you convert the question into a vector and find the k nearest neighbors.
Embedding model converts:
"application crashes intermittently" -> [0.023, -0.841, 0.112, ...]
"system experiences sporadic failures" -> [0.019, -0.830, 0.098, ...]
"error code E-4012" -> [0.445, 0.221, -0.667, ...]
The first two are close together (similar meaning).
The third is far away (unrelated meaning).
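The "close together" intuition can be made concrete with cosine similarity, using the truncated toy vectors from the example above (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 = same direction, near 0 or negative = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

crashes    = [0.023, -0.841, 0.112]   # "application crashes intermittently"
failures   = [0.019, -0.830, 0.098]   # "system experiences sporadic failures"
error_code = [0.445, 0.221, -0.667]   # "error code E-4012"

# The two paraphrases are nearly identical; the error code points elsewhere
assert cosine_similarity(crashes, failures) > cosine_similarity(crashes, error_code)
```

k-NN search is just this comparison repeated against every indexed vector, with data structures like HNSW making it fast at scale.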
The k-NN search finds the closest vectors using distance metrics — cosine similarity, L2 (Euclidean), or inner product. On OpenSearch, you can choose which library performs this search:
| Engine | How It Works | Trade-off |
|---|---|---|
| FAISS | In-memory graph (HNSW) or inverted file (IVF) | Fastest, but needs RAM for vectors |
| Lucene | Disk-based HNSW with segment caching | Slower, but much cheaper (vectors on disk) |
| NMSLIB | In-memory HNSW | Best recall, but no filtering during search |
All three are free, open-source libraries bundled into OpenSearch. The engine choice affects cost through infrastructure sizing, not licensing. For a deeper dive on engine selection, see the RAG chunking and testing guide.
Catches: Meaning, intent, conceptual similarity. “App crashes” finds “system instability.”
Misses: Specific identifiers. “E-4012” is just a string to the embedding model — it has no semantic meaning. The vector for “E-4012” might be near “E-4013” or “error code” generically, but not specifically near the chunk that explains what E-4012 is.
Semantic Search
Semantic search is vector search plus additional intelligence layers. The term is often used loosely, but a proper semantic search system adds:
- Query understanding — Expanding, reformulating, or enriching the query before embedding. “Lambda cold start” might be expanded to include “initialization latency” and “function startup time.”
- Reranking — A cross-encoder model that takes each (query, result) pair and scores them together. Unlike embeddings which encode query and document independently, rerankers see both at once and produce much better relevance scores.
- Context awareness — Using conversation history, user profile, or domain context to adjust results.
Pure vector search:
Query -> Embed -> k-NN -> Top 5 results
Semantic search:
Query -> Expand/Reformulate -> Embed -> k-NN -> Top 20 candidates
-> Rerank (cross-encoder scores each pair) -> Top 5 results
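The rerank stage can be sketched as follows. In production, score_pair would call a real cross-encoder model (for example via Bedrock's reranking feature); here a simple word-overlap heuristic stands in so the example is self-contained, which is emphatically not how a real reranker scores:

```python
def score_pair(query, doc):
    """Stand-in for a cross-encoder: a real reranker scores the
    (query, doc) pair jointly with a model, not by word overlap."""
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, candidates, top_n=5):
    """Re-score the k-NN candidate pool and keep only the best top_n."""
    return sorted(candidates, key=lambda d: score_pair(query, d),
                  reverse=True)[:top_n]

candidates = [  # imagine these are the top candidates from k-NN
    "general error handling best practices",
    "lambda cold start and initialization latency",
    "how to fix lambda cold start issues",
]
top = rerank("fix lambda cold start", candidates, top_n=2)
```

The key structural point survives the toy scorer: retrieval casts a wide net (top 20), and the reranker, which sees query and document together, reorders before trimming to the final top 5.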
Reranking alone typically improves retrieval quality by 5-15% over pure vector search. On AWS, two rerankers are available:
| Reranker | Pricing | Notes |
|---|---|---|
| Amazon Rerank 1.0 | Included | Not available in us-east-1 |
| Cohere Rerank 3.5 | $2.00/1K queries | Available in more regions |
Catches: Everything vector search catches, but with better ranking. Fewer irrelevant results in the top positions.
Misses: Still misses exact codes and identifiers — it’s still fundamentally based on meaning, not terms.
Hybrid Search
Hybrid search runs keyword (BM25) and vector (k-NN) in parallel on the same query, then combines the scores. This is the only approach that catches both exact matches and semantic matches.
Query: "Why does error E-4012 cause the app to crash?"
           |                             |
           v                             v
    BM25 (keyword)                 k-NN (vector)
           |                             |
           v                             v
Finds: "E-4012 is a DB       Finds: "application crashes
connection pool               due to connection pool
timeout error"                exhaustion and retry failures"
           |                             |
           +--------------+--------------+
                          |
                          v
           Score Fusion (combine & rank)
                          |
                          v
                Both chunks go to LLM
The LLM now has what E-4012 is (from keyword) and how to fix the crash (from vector). Pure vector search would have missed the E-4012 definition. Pure keyword search would have missed the crash remediation.
On AWS, OpenSearch is the only native service with built-in hybrid search — BM25 and k-NN run in a single query. If you’re using another vector store (Aurora pgvector, S3 Vectors, MemoryDB), you’d need to run keyword and vector searches separately and merge results yourself.
# OpenSearch hybrid query — single request, both engines
# (query_embedding: the query text embedded with the same model used at indexing)
hybrid_query = {
"size": 5,
"query": {
"hybrid": {
"queries": [
{
"match": {
"content": "error E-4012 application crash"
}
},
{
"knn": {
"embedding": {
"vector": query_embedding,
"k": 5
}
}
}
]
}
}
}
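If you are on a store without built-in hybrid support and have to merge results yourself, reciprocal rank fusion (RRF) is a common approach: it combines rank positions rather than raw scores, because BM25 scores and cosine similarities are not on the same scale. A minimal sketch, with k=60 as the conventional smoothing constant and made-up chunk IDs:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked ID lists: each doc scores sum(1 / (k + rank))."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["chunk_e4012_definition", "chunk_error_codes_list"]
knn_hits  = ["chunk_pool_exhaustion", "chunk_e4012_definition"]

fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
# chunk_e4012_definition appears in both lists, so it ranks first
```

A chunk retrieved by both engines accumulates score from both lists, which is exactly the "catches both exact matches and semantic matches" behavior hybrid search is after.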
Catches: Both exact terms and semantic meaning. The most complete retrieval approach.
Misses: Very little. The main trade-off is cost and complexity — you need a search engine that supports both BM25 and k-NN (OpenSearch), and your index stores both text fields and vector fields.
The Comparison Table
| | Keyword (BM25) | Vector (k-NN) | Semantic | Hybrid |
|---|---|---|---|---|
| Matches on | Exact words | Meaning | Meaning + ranking | Words + meaning |
| “error E-4012” | Finds it | Likely misses | Likely misses | Finds it |
| “app keeps crashing” | Misses synonyms | Finds them | Finds + ranks them | Finds them |
| Needs ML model | No | Embedding model | Embedding + reranker | Embedding model |
| Speed | Fastest | Engine-dependent | Slower (reranking) | Both run in parallel |
| Index storage | Text (inverted index) | Vectors (RAM or disk) | Vectors + reranker | Text + vectors |
| AWS service | Any OpenSearch | Any with k-NN | Bedrock KB + reranker | OpenSearch only |
Where Each Approach Shines
Use keyword search alone when:
- Users search for specific identifiers, codes, or exact phrases
- Your data is structured (logs, tickets, records with known fields)
- You need maximum speed with zero ML infrastructure
Use vector search alone when:
- Queries are conversational (“how do I fix this?”)
- Documents are conceptual (whitepapers, guides, Q&A)
- Budget is constrained and you’re using S3 Vectors or Aurora pgvector
- Users never search for specific codes or IDs
Use semantic search when:
- You’re already doing vector search and want better ranking
- The top-5 results matter more than the top-20 (reranking improves precision at the top)
- Budget allows for a reranking step
Use hybrid search when:
- Your data contains both specific identifiers and conceptual content (most enterprise data)
- Retrieval quality directly impacts business outcomes
- You’re building a RAG system for IT support, legal, manufacturing, healthcare, or finance — any domain with codes, IDs, and natural language mixed together
Domain Impact
| Domain | Users search for… | Keyword catches | Vector catches | Need hybrid? |
|---|---|---|---|---|
| IT support | Error codes, ticket IDs, service names + symptoms | Codes, IDs | Symptoms, troubleshooting | Yes |
| Legal | Article numbers, case references + legal concepts | Statute IDs | Interpretations | Yes |
| Manufacturing | Part numbers, machine IDs + failure descriptions | Part codes | Failure modes | Yes |
| Healthcare | Drug codes, ICD codes + symptom descriptions | Medical codes | Symptoms, treatments | Yes |
| General Q&A | Mostly “how do I…” questions | Limited value | High value | Optional |
If your domain has specific identifiers that users search for, hybrid search isn’t optional — it’s required.
Cost Reality
Hybrid search means OpenSearch, and OpenSearch means either managed clusters or Serverless:
| Option | Minimum Cost | Hybrid Search |
|---|---|---|
| OpenSearch Managed | ~$470/month (small cluster) | Yes |
| OpenSearch Serverless | ~$700/month (4 OCUs min) | Yes |
| S3 Vectors | $0 minimum (pay-per-query) | No |
| Aurora pgvector | ~$60/month (small instance) | No (vector only) |
If your workload is low-volume and purely semantic (no codes or IDs), S3 Vectors or Aurora pgvector save significant cost. If you need hybrid search, OpenSearch Managed gives you the lowest entry point. For a detailed comparison of all vector store options on AWS, see the vector store guide.
What I Learned
- Vector search is a mechanism, semantic search is a capability — Vector search (k-NN) is the algorithm that finds nearest neighbors. Semantic search is the broader system that uses vector search plus query understanding, reranking, and context. Calling k-NN “semantic search” is like calling a database query “business intelligence.”
- BM25 is 30 years old and still essential — Every benchmark shows that hybrid search (BM25 + vector) outperforms pure vector search by 10-20%. Old doesn’t mean obsolete. Exact-match retrieval solves problems that embeddings fundamentally cannot.
- Hybrid search is the right default for enterprise RAG — If your documents contain any codes, IDs, product names, or specific terms, hybrid search is not a nice-to-have. It’s the difference between finding “E-4012 is a timeout error” and returning generic error handling documentation.
- The cost of hybrid search is the cost of OpenSearch — There’s no free hybrid search option on AWS today. This is the real trade-off: ~$470+/month for better retrieval quality, or $0-60/month with pure vector search. For production RAG systems, the retrieval quality usually justifies the cost.
What’s Next
- Benchmark hybrid search vs pure vector search on a real enterprise dataset — quantify the 10-20% improvement claim with actual recall@5 and faithfulness scores
- Test Bedrock Knowledge Bases with OpenSearch Serverless backend in hybrid mode vs S3 Vectors backend — same queries, same documents, measure quality delta
- Explore OpenSearch neural search plugin for query expansion — automatic synonym and concept injection before retrieval
- Build a cost model: at what query volume does hybrid search on OpenSearch become cheaper per-query than vector-only on S3 Vectors + reranking?
References:
- OpenSearch Hybrid Search
- OpenSearch k-NN Plugin
- Amazon Bedrock Reranking
- BM25 — The Original Paper (Robertson & Zaragoza)
- Building a RAG System That Actually Works — Companion post on chunking strategies, vector engines, and testing
- RAG on AWS: Which Vector Store Is Right for You? — Full comparison of all 9 AWS vector storage options
