BM25 & TF-IDF: Why Old SEO Still Powers AI

Every few months, someone declares the death of something fundamental.

Keywords are dead.
SEO is dead.
Websites are dead.
Apparently retrieval is dead too.

Now we have AI Overviews, AI Mode, Gemini, ChatGPT search, and enough AI branding to make old-fashioned search sound quaint. So it is tempting to assume that BM25 and TF-IDF are relics. Useful once, now irrelevant.

They are not.

AI search changes the interface. It does not remove the need to retrieve relevant documents quickly, consistently, and at scale. Before a system can summarise, rank, compare, quote, or cite anything, it has to find candidate sources in the first place. That is still a retrieval problem. And retrieval is exactly where TF-IDF and BM25 still matter.

Not because search has failed to modernise.

Because the fundamentals never went away.

The point most AI search commentary skips

A conversational interface makes search look different. It does not make document retrieval optional.

When someone asks a question in Google AI Overviews, AI Mode, or ChatGPT search, the system still has to assemble a set of possible sources before anything more sophisticated happens. That means finding documents that are plausibly relevant, doing so fast enough to be usable, and doing so reliably enough that the rest of the pipeline has something worth working with.

If retrieval is weak, the rest of the system starts from a bad shortlist.

And once that shortlist is wrong, everything built on top of it becomes less trustworthy. Ranking. Summarisation. Citations. All of it.

That is why this still matters to SEOs, publishers, and businesses. If your page is not retrieved, it is not in the running. If it is not in the running, it will not be selected. If it is not selected, it will not be cited.

Simple. Slightly unfashionable. Still true.

TF-IDF: the old idea that never stopped being useful

TF-IDF comes from classic information retrieval rather than the usual SEO mythology machine.

The basic logic is straightforward. A term that appears often in a document may be important. A term that appears in almost every document is less useful for distinguishing one document from another. So the weighting of a term depends not just on how often it appears on the page, but also on how distinctive it is across the wider collection.

There is nothing glamorous about this. It does not pretend to understand nuance, sentiment, or human motivation. It simply helps a system judge whether a document is likely to be about the thing being searched for.

That sounds modest because it is modest.

It is also extremely useful.
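To make the weighting concrete, here is a minimal sketch in Python. It assumes one common variant (relative term frequency multiplied by log(N / document frequency)); real systems use many different weightings. The function name `tf_idf` and the three-document toy corpus are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Weight one term for one document (one common variant, not the only one)."""
    # Term frequency: how much of this document is made up of the term.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: how many documents in the collection contain the term.
    df = sum(1 for tokens in corpus if term in tokens)
    # Inverse document frequency: terms found everywhere get a weight near zero.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical toy collection, already lowercased and split into tokens.
corpus = [
    "bm25 ranking for search".split(),
    "search engine basics".split(),
    "site search tips".split(),
]

common = tf_idf("search", corpus[0], corpus)  # "search" appears in every document
rare = tf_idf("bm25", corpus[0], corpus)      # "bm25" appears in only one
```

Both terms appear once in the first document, yet "search" ends up with no weight because it cannot tell the documents apart, while "bm25" keeps a positive weight because it can.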

BM25: retrieval with better judgement

BM25 is what happened when TF-IDF grew up and became more practical.

It keeps the same general idea but handles it with more restraint. Repeating a term again and again should not increase relevance forever, so BM25 accounts for term saturation. Long documents also contain more words by default, so BM25 normalises for document length instead of rewarding size for its own sake.

That makes it a much better fit for real-world search systems.

Not magical. Not sexy. Just better behaved.

And that matters because search systems do not run on cleverness alone. They run on methods that are fast, robust, and dependable under pressure.
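For anyone who wants to see saturation and length normalisation in action, the following is a rough sketch of a BM25 scorer. It assumes the widely used Okapi formulation with a smoothed IDF, and the typical defaults k1 = 1.2 and b = 0.75. The function name `bm25_score` and the toy documents are illustrative, not taken from any particular search engine.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one document against a query with a basic BM25 formula."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    counts = Counter(doc_tokens)
    # Length normalisation: above-average length pushes this factor above 1.
    length_norm = 1 - b + b * len(doc_tokens) / avg_len
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        # Smoothed IDF that stays positive even for very common terms.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = counts[term]
        # Saturation: each extra repetition adds less than the one before.
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score

# Hypothetical toy documents.
once = ["widget"] + ["filler"] * 9         # term used once, 10 tokens
stuffed = ["widget"] * 5 + ["filler"] * 5  # term used five times, 10 tokens
long_doc = ["widget"] + ["filler"] * 39    # term used once, 40 tokens
other = ["unrelated"] * 10
corpus = [once, stuffed, long_doc, other]

s_once = bm25_score(["widget"], once, corpus)
s_stuffed = bm25_score(["widget"], stuffed, corpus)
s_long = bm25_score(["widget"], long_doc, corpus)
```

With these inputs, the stuffed document scores higher than the single mention but nowhere near five times higher, and the long document scores lower than the short one despite containing the same single mention.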

Why BM25 still matters in AI search

The mistake people make is assuming that semantic systems replaced lexical retrieval.

They did not.

What happened instead is more practical. Modern systems layered additional methods on top of lexical retrieval. They did not throw it out and light a candle for embeddings.

This is especially obvious in AI-driven search, where systems often need to retrieve candidate documents across multiple interpretations, subtopics, and reformulations of a question. That retrieval stage still benefits from methods that are good at matching explicit wording, rare terminology, exact phrases, product names, error messages, codes, legal wording, and all the other things real users search for when they are not behaving like tidy benchmark prompts.

Semantic matching is useful. Sometimes very useful.

It is also not enough on its own.

Users do not only search for “concepts related to this general theme”. Quite often they want the exact clause, the exact model number, the exact acronym, or the exact wording they half remember from a support document they saw three months ago.

That is where lexical retrieval remains extremely good.

AI search still starts with retrieval

The broad shape of the pipeline has not disappeared just because the surface looks more conversational.

First, a system has to fetch candidate documents from an index.

Then it can apply heavier ranking logic, semantic interpretation, reranking, contextual signals, and answer generation.

Then, if appropriate, it can generate a summary and attach links or citations.

That first step is not glamorous, but it is decisive.

If the initial candidate set is weak, the answer stage has less to work with. A model cannot cite a page it never considered. It cannot synthesise a source it failed to retrieve.

This is one reason so much AI search commentary goes wrong. It focuses on the final generated answer and treats retrieval as plumbing.

Plumbing matters when the building depends on it.
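The staged shape described above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the function names, the term-overlap first stage, the phrase-match reranker, and the three-page index. Production systems are far more elaborate, but the dependency is the same: later stages only ever see what the first stage returned.

```python
def retrieve(query, index, limit=3):
    """Stage 1: cheap candidate fetch, here plain term overlap."""
    wanted = set(query.lower().split())
    scored = []
    for doc_id, text in index.items():
        overlap = len(wanted & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first, ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

def rerank(query, candidates, index):
    """Stage 2: heavier check on the shortlist only; exact phrase matches first."""
    phrase = query.lower()
    # sorted() is stable, so documents with equal keys keep their stage 1 order.
    return sorted(candidates, key=lambda doc_id: phrase not in index[doc_id].lower())

def answer(query, index):
    """Stage 3: stand-in for generation; it can only cite what survived."""
    shortlist = rerank(query, retrieve(query, index), index)
    return {"query": query, "sources": shortlist}

# Hypothetical three-page index.
index = {
    "about": "our brand story and values",
    "guide": "e42 is one error among many startup codes",
    "manual": "error e42 on startup means the filter is blocked",
}
result = answer("error e42", index)
```

The "about" page never enters the candidate set, so no later stage can rank it, summarise it, or cite it.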

Here’s the simplified version most non-SEOs never see:

Diagram showing an AI search pipeline with lexical retrieval using BM25, vector retrieval, reranking, merged results, and answer generation with cited sources.

Simplified visualisation of the role of BM25 in AI search

Hybrid retrieval is the grown-up answer

In practice, sensible systems do not choose between lexical retrieval and vector retrieval as though this were a theological dispute.

They combine them.

That is what hybrid retrieval is for. Lexical methods such as BM25 help with precise wording and exact matches. Vector methods help with semantic similarity and broader conceptual matching. The results can then be merged, reranked, and passed into later stages of the pipeline.

That is not a fringe idea. It is just the grown-up answer to a problem people keep trying to oversimplify.

In other words, the sensible answer was not “replace BM25”.

It was “stop expecting one method to do everything”.
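One common way to merge a lexical ranking with a vector ranking is reciprocal rank fusion. The sketch below assumes that technique; the article does not say which fusion method any given search product uses. The constant k = 60 is a commonly used default, and the document ids are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); appearing in both lists adds up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first, ties broken alphabetically by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical outputs from two retrievers for the same query.
lexical = ["spec-sheet", "faq", "blog"]      # e.g. from BM25
semantic = ["overview", "spec-sheet", "faq"]  # e.g. from a vector index

merged = reciprocal_rank_fusion([lexical, semantic])
```

Documents that both retrievers agree on rise to the top, while a page found by only one method still stays in the list for the reranker to judge.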

Exact matching still matters more than people admit

A lot of AI search commentary leans too heavily on the idea of “understanding meaning”.

Fair enough. Meaning matters.

But exact matching still matters because users are often more precise than interface trends would suggest.

They search for model numbers.
Error strings.
Legislation references.
Medical terms.
Contract wording.
Niche jargon.
Abbreviations that make no sense outside one industry.

In those cases, lexical retrieval is not a quaint leftover. It is often the fastest and most reliable way to surface the right candidates.

There is a reason production systems keep it close.

Not because the industry is nostalgic.

Because exact intent did not vanish when AI became easier to demo.

BM25 is useful because it is practical

Another unfashionable truth: BM25 is not just effective, it is operationally convenient.

It is fast. It is relatively cheap. It is inspectable. You can reason about why a document matched. You can debug it. You can explain it to another adult without resorting to mystical language.

That matters more than the average AI keynote would like to admit.

There is a place for dense models, embeddings, semantic rerankers, and all the rest. But when something goes wrong, organisations still need systems they can inspect and tune. They need infrastructure that behaves like engineering, not divination.

“The vectors thought it felt right” is not a serious diagnostic framework.
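Inspectability is easy to demonstrate. The sketch below breaks a BM25 score into per-term contributions, assuming the same basic Okapi-style formula with the typical defaults k1 = 1.2 and b = 0.75. The function name `explain_bm25` and the router documents are hypothetical.

```python
import math
from collections import Counter

def explain_bm25(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Return each query term's contribution to one document's BM25 score."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    length_norm = 1 - b + b * len(doc_tokens) / avg_len
    counts = Counter(doc_tokens)
    breakdown = {}
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (len(corpus) - df + 0.5) / (df + 0.5))
        tf = counts[term]
        # A term missing from the document contributes exactly zero.
        breakdown[term] = idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return breakdown

# Hypothetical toy collection.
corpus = [
    "reset the x200 router to factory settings".split(),
    "router setup guide for beginners".split(),
    "choosing a router for a small office".split(),
]
report = explain_bm25(["x200", "router", "warranty"], corpus[0], corpus)
```

The breakdown shows the rare model number doing most of the work, the ubiquitous word "router" adding a little, and "warranty" adding nothing because the page never mentions it. That is a diagnosis you can act on.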

What this means for SEO

This is the part some people try very hard to overcomplicate.

If AI systems still depend on retrieval, then many familiar SEO fundamentals still matter because they improve retrieval quality.

Not because Google is sentimental.
Not because BM25 is holy.
Because systems still need clear, accessible, indexable content to work with.

If important information is buried in images, trapped in awkward JavaScript rendering, locked away in PDFs, or written so vaguely that the key terms never appear, retrieval gets harder.

If your internal linking is weak, your terminology is inconsistent, and your pages are ambiguous about who you are, what you do, and where you are relevant, retrieval gets harder.

If your most useful answer is hidden behind fluff, vague headings, or clever branding language that nobody actually searches for, retrieval gets harder.

And when retrieval gets harder, citation and visibility get harder too.

There is no secret AI search trick here.

There is just a higher premium on clarity.

If you want a clearer picture of where your site is failing in AI search, citation visibility, or retrieval clarity, this is exactly the sort of problem I look at in an LLM Visibility Audit. The useful question is not whether you can “rank in ChatGPT”. It is whether your content is structurally easy to retrieve, interpret, and trust.

What better content looks like in this environment

The practical implications are not exotic.

Important information should be available in text, not trapped in decorative formats.

Pages should use the terms real users and real buyers actually use, including the ugly, specific, commercially awkward ones.

Entities should be clear. Who is this page about? What is being offered? Where does it apply? How does it relate to other relevant things?

Answers should be extractable. Not robotic. Not stripped of nuance. Just clear enough that retrieval systems and downstream answer systems can identify what the page is useful for.

This also means thinking beyond the obvious primary query. In AI search, systems may break a question into supporting subtopics and retrieve content for those smaller parts. So a page does not necessarily need to answer the entire headline question to earn visibility. Sometimes it earns selection because it explains one useful component especially well.

That is another reason clarity beats performance.

A page can only help if a system can tell what it is helping with.

The real takeaway

BM25 and TF-IDF still matter because AI search still depends on retrieval.

The interface has changed. The need has not.

AI Overviews, AI Mode, and ChatGPT search may look like answer systems rather than search systems. In practice they still rely on finding useful source material before anything else can happen. That is why lexical retrieval remains relevant. Not as the whole system, and not as the final judge, but as part of the machinery that decides what gets considered at all.

So no, BM25 has not been made obsolete by AI search.

It has become easier to ignore in conversation.

That is not the same thing.

If this is becoming a wider problem across your site rather than a single-page issue, that usually points to a strategy problem rather than a metadata problem. That is the sort of work I cover in Strategic SEO Consulting: fixing how a business is understood, retrieved, and represented across search, not just polishing pages in isolation.
