TL;DR: Discover looks less like a single “ranking system” and more like a pipeline: your page gets packaged (metadata extracted), possibly filtered out early, then matched to user interests, then delivered via a stream that can reorder, test, and remove items. Practical consequence: if your structured data is messy, you can sabotage yourself before “ranking” even happens.
Disclaimer: This is a practical interpretation of Metehan Yeşilyurt’s reverse-engineering research into Google Discover architecture: https://metehan.ai/blog/google-discover-architecture/. I’m simplifying it for working SEOs and publishers; any blunt opinions (or mistakes) are mine.
Why Discover traffic drops are often upstream problems
Most people talk about Google Discover like it’s a mysterious ranking algorithm you can charm with a big image and a catchy headline.
The research behind this post suggests something more boring, and more useful: Discover behaves like a multi-stage pipeline. Pipelines fail upstream. And when they fail upstream, all your “optimisation” chat is just noise.
The mental model that actually helps
Think of Discover in four stages:
Packaging: the system extracts your page's identity (title, author, publisher, images) plus classification signals.
Filtering: some things get excluded early (publisher/section level and/or item level) before the personalisation engine even gets involved.
Matching: content is aligned to user interests (entities/topics etc.).
Delivery: it’s a stream, not a static list; items can be inserted, reordered, withheld for experiments, or pulled back.
So when Discover drops, don’t jump straight to “ranking”. You might be looking at packaging issues, early filtering, or simple test-bucket churn.
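The four stages can be sketched as a toy pipeline. This is purely illustrative: all the function names, filter rules, and scoring here are assumptions for the sake of the mental model, not Google's actual logic. The point it demonstrates is the upstream-failure one: a page with broken packaging never even reaches matching.

```python
# Toy sketch of the four-stage mental model. Everything here is an
# illustrative assumption, not Google's implementation.

def package(page):
    """Packaging: extract the identity/classification fields later stages use."""
    return {
        "title": page.get("title"),
        "author": page.get("author"),
        "publisher": page.get("publisher"),
        "topics": page.get("topics", []),
    }

def passes_filters(item):
    """Filtering: cheap early exclusions before personalisation runs."""
    return bool(item["title"]) and bool(item["publisher"])

def match_score(item, user_interests):
    """Matching: align content topics with user interests."""
    return len(set(item["topics"]) & user_interests)

def deliver(candidates, user_interests):
    """Delivery: an ordered stream that can drop or reorder items."""
    scored = [(match_score(c, user_interests), c) for c in candidates]
    return [c for score, c in sorted(scored, key=lambda x: -x[0]) if score > 0]

pages = [
    {"title": "Good article", "publisher": "Site A", "topics": ["cycling", "fitness"]},
    {"title": None, "publisher": "Site B", "topics": ["cycling"]},  # broken packaging
]
items = [package(p) for p in pages]
survivors = [i for i in items if passes_filters(i)]
feed = deliver(survivors, {"cycling"})
print([i["title"] for i in feed])  # the broken page never reached matching
```

Notice that the second page had a perfectly matchable topic; it died at filtering because its packaging was broken. That's the shape of most "mystery" Discover drops.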
The unglamorous bit: metadata precedence can hurt you
One practical implication: the extraction logic described suggests Schema.org JSON-LD may be evaluated before OG/Twitter/meta fallbacks for key fields like title/author/publisher.
If that’s true, it creates a very normal publisher failure mode:
Your OG tags are fine because someone cares.
Your structured data is a mess because nobody owns it.
Result: bad Schema can override good OG, and you feed incorrect identity signals into the pipeline. Not “suboptimal”. Wrong.
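A minimal sketch of what "Schema-first" resolution would mean in practice, assuming the precedence order the research suggests (JSON-LD first, OG/meta only as fallback). The field names and values are made up; the behaviour to note is that a wrong value still counts as "present", so it blocks the good fallback.

```python
# Hypothetical Schema-first field resolution: JSON-LD wins when present,
# OG/meta only fill gaps. The precedence order is an assumption drawn
# from the research, not confirmed behaviour.

def resolve_field(field, jsonld, og, meta):
    for source in (jsonld, og, meta):
        value = source.get(field)
        if value:  # a *wrong* value still counts as "present"
            return value
    return None

jsonld = {"author": "CMS Migration Bot"}                      # messy structured data
og     = {"author": "Jane Smith", "title": "Real headline"}   # maintained OG tags
meta   = {"title": "Fallback title"}

print(resolve_field("author", jsonld, og, meta))  # "CMS Migration Bot": bad Schema wins
print(resolve_field("title", jsonld, og, meta))   # "Real headline": OG fills the gap
```

Under this model, fixing the OG tag does nothing for the author field; the only fix is upstream, in the JSON-LD itself.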
Dismissals aren’t engagement. They’re rejection.
Discover has explicit negative feedback: “not interested”, dismissals, “don’t show me this”. The behaviour described implies rejected items can effectively stay rejected for that user.
So yes, clicky headlines can win the click. They can also train the system that your output wastes the user’s time — which is a great way to shrink your future distribution.
Caveat (because SEOs love overclaiming)
This is based on what’s observable from app-side artefacts and telemetry, not a full server-side ranking blueprint. Treat it as a stronger diagnostic model, not a magical key.
Takeaways for SEOs and publishers
Treat structured data as a production dependency. If JSON-LD is wrong or inconsistent, you’re not “missing an SEO opportunity”, you’re breaking packaging.
When Discover drops, check upstream: templates, canonical/variants, author/publisher identity, metadata output, image rules, and paywall/consent scripts injecting junk.
Optimise for satisfaction, not just clicks. Negative feedback is cleaner and stickier than most people want to admit.
Expect experiment noise. Some volatility is simply testing and delivery churn — your reporting should be able to separate “we broke something” from “we’re in a bucket.”
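The "check upstream" takeaway lends itself to a simple audit: crawl your templates and flag pages where JSON-LD identity fields disagree with (or are missing relative to) the OG tags. A minimal sketch, with hypothetical field names and an inline sample instead of a real crawler:

```python
# Minimal upstream audit: flag pages whose JSON-LD identity fields disagree
# with their OG tags. Page data and field names are illustrative; in practice
# you'd populate `pages` from a crawl of your own templates.

def audit(pages):
    issues = []
    for url, data in pages.items():
        for field in ("title", "author", "publisher"):
            jl, og = data["jsonld"].get(field), data["og"].get(field)
            if jl and og and jl != og:
                issues.append((url, field, jl, og))
            elif not jl and og:
                issues.append((url, field, "MISSING in JSON-LD", og))
    return issues

pages = {
    "/good": {"jsonld": {"title": "A", "author": "J. Smith", "publisher": "Site"},
              "og":     {"title": "A", "author": "J. Smith", "publisher": "Site"}},
    "/bad":  {"jsonld": {"title": "A", "author": "admin"},
              "og":     {"title": "A", "author": "J. Smith", "publisher": "Site"}},
}
for issue in audit(pages):
    print(issue)  # ('/bad', 'author', ...) and ('/bad', 'publisher', ...)
```

Run something like this per template, not per URL: identity mismatches are almost always template-level bugs, so one flagged page usually means thousands.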
Further reading: Metehan Yeşilyurt’s original research is here: https://metehan.ai/blog/google-discover-architecture/.
