Updated December 27, 2025
TL;DR: ChatGPT doesn't pick winners the way Google does. It uses Reciprocal Rank Fusion (RRF) to blend results from keyword search and semantic search into one ranked list. Your content doesn't need to be the #1 authority. It needs to appear consistently across both retrieval methods. If you rank well in keyword matching but poorly in semantic relevance (or vice versa), RRF drops you from the final context window. The fix: structure content using frameworks like CITABLE that satisfy both signals simultaneously, creating the "consensus" RRF rewards.
Josh Blyska's analysis of 1 billion ChatGPT citations found that Reddit citations rose 87% since late July 2025, while brand websites saw referrals drop. The reason isn't domain authority or backlinks. It's Reciprocal Rank Fusion.
RRF is the algorithm ChatGPT uses to blend keyword and semantic search results into the final context that generates answers. If your content ranks well in one method but poorly in the other, RRF drops you during the merge. This explains why brands with strong Google rankings often disappear from AI citations entirely.
This article breaks down how RRF works, why it changes the SEO playbook, and what to optimize for instead.
What is Reciprocal Rank Fusion (RRF)?
Reciprocal Rank Fusion is a method for combining multiple ranked lists into a single, unified ranking. Cormack, Clarke, and Buettcher developed it in 2009, and their original research showed it outperformed other fusion methods without requiring any tuning.
The Elasticsearch documentation defines it simply: RRF "requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results."
The core principle:
Each document gets a score based on its rank position (1st place = higher score, 50th place = lower score). When a document appears in multiple search methods, RRF adds those scores together. Documents that rank consistently well across all methods beat documents that rank #1 in just one method.
The "Panel of Judges" analogy:
Think of RRF as a panel of expert advisors ranking candidates independently. If Judge A ranks you 1st but Judge B ranks you 50th, you lose. If Judge A ranks you 3rd and Judge B ranks you 4th, you win. RRF rewards consistency across judges, not perfection from one.
According to Microsoft's Azure documentation, "by prioritizing documents that consistently rank highly across different sources, RRF improves search relevance without relying on traditional score normalization techniques."
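The panel-of-judges idea can be sketched in a few lines of Python. This is a minimal illustration, not ChatGPT's implementation: the judge lists are made up, and k = 60 is the standard smoothing constant from the original paper.

```python
# A minimal sketch of Reciprocal Rank Fusion over two ranked lists.
def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Judge A puts doc "A" 1st and doc "B" 3rd; Judge B puts "B" 4th
# but buries "A" at rank 50. Consistency beats a single #1.
fillers = [f"doc{i}" for i in range(60)]
judge_a = ["A", fillers[0], "B"] + fillers[1:10]           # A=1st, B=3rd
judge_b = fillers[10:13] + ["B"] + fillers[13:58] + ["A"]  # B=4th, A=50th

fused = rrf_fuse([judge_a, judge_b])
print(fused[0])  # "B": ranked 3rd and 4th, it beats A's single 1st place
```

Doc "B" scores 1/63 + 1/64 ≈ 0.0315, while "A" scores 1/61 + 1/110 ≈ 0.0255, so the consistent performer tops the fused list.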
Key terminology
| Term | Definition |
| --- | --- |
| Reciprocal Rank Fusion (RRF) | Algorithm that combines multiple ranked lists by summing reciprocal ranks, 1/(k + rank), for each document |
| BM25 (keyword search) | Traditional retrieval method matching exact terms between query and document |
| Vector search | Semantic retrieval using embeddings to find conceptually similar content regardless of exact keyword overlap |
| k constant | Smoothing parameter (typically 60) that prevents any single retriever from dominating results |
| RAG (Retrieval-Augmented Generation) | Workflow that retrieves relevant documents before generating AI responses, reducing hallucinations |
How ChatGPT uses RRF to select sources
ChatGPT's search functionality uses Retrieval-Augmented Generation (RAG), a workflow that retrieves relevant documents before generating an answer. RRF sits at the center of this process.
Metehan Yeşilyurt, a Growth Marketing Manager at AppSamurai, documented finding RRF parameters in ChatGPT's code while inspecting Chrome DevTools. Discovered Labs' recent Reddit research also found this behavior.
The parameters confirm ChatGPT uses standard RRF settings:
- rrf_alpha: 1
- rrf_input_threshold: 0
- ranking_model: null
The 5-step RAG workflow with RRF:
- User query: Prospect asks "What's the best CRM for startups?"
- Keyword search (BM25): System finds documents matching exact terms like "CRM," "startups," "best"
- Semantic search (Vector): System finds documents conceptually related to startup CRM selection, even without exact matches
- RRF fusion: Both ranked lists merge using the reciprocal rank formula
- Generation: Top-ranked documents from the fused list become context for the answer
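Steps 2-5 above can be sketched as follows. The document IDs and ranked lists are illustrative assumptions, not ChatGPT's actual retrieval output:

```python
# Toy version of the RAG workflow: two retrievers return ranked lists,
# RRF fuses them, and the top of the fused list becomes generation context.
def rrf_scores(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

# Step 2: BM25 keyword matches for "best CRM for startups" (hypothetical)
keyword_results = ["crm-comparison", "startup-crm-guide", "pricing-page"]
# Step 3: semantically related documents from vector search (hypothetical)
semantic_results = ["founder-forum-thread", "startup-crm-guide", "crm-trends"]

# Step 4: fuse both lists; step 5: the top hits become context
fused = rrf_scores([keyword_results, semantic_results])
context = sorted(fused, key=fused.get, reverse=True)[:2]
print(context[0])  # "startup-crm-guide" is the only doc in both lists
```

Note that "startup-crm-guide" ranks second in both lists yet wins the fusion, because every other document appears in only one list.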
The OpenSearch blog explains: "RRF merges ranked results from multiple query sources... into a single relevance-optimized list."
If your page ranks well in BM25 but poorly in vector search, it gets dropped during fusion. ChatGPT never sees it. This explains the "invisible #1" paradox: you dominate Google's single-algorithm ranking but disappear in ChatGPT's multi-algorithm fusion.
For tracking this visibility gap, see our guide on how to track ChatGPT, Perplexity, and AI Overviews traffic in GA4.
RRF marketing implications: Why "Rank #1" is the wrong goal
Traditional SEO trained us to chase the #1 position through domain authority, backlink profiles, and keyword density. RRF changes this equation entirely.
According to analysis by Metehan Yeşilyurt, "Better to rank #4-8 for 30 queries than #1 for 3 queries."
His example shows a topic cluster outscoring a standalone page by nearly 3x in RRF:
- Hub Page: "Complete Guide to Coffee Makers"
- Cluster Pages across related queries: Rank #2, #5, #8
- Total cluster RRF score: 0.0462
- Standalone page at Rank #1: 0.0164
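The arithmetic behind those scores follows directly from the standard formula, 1/(k + rank) with k = 60:

```python
# Verifying the cluster-vs-standalone RRF scores with 1/(k + rank), k = 60.
k = 60
cluster_ranks = [2, 5, 8]                # the three cluster pages
cluster_score = sum(1 / (k + r) for r in cluster_ranks)
standalone_score = 1 / (k + 1)           # one page at rank #1

print(round(cluster_score, 4))     # 0.0462
print(round(standalone_score, 4))  # 0.0164
```

Three mid-table rankings accumulate nearly three times the score of a single first-place finish, because the k = 60 smoothing keeps the gap between rank #1 and rank #8 small.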
Traditional SEO vs. RRF optimization:
| Dimension | Traditional SEO | RRF/AEO era |
| --- | --- | --- |
| Goal | Rank #1 for target keyword | Appear in top N across multiple retrieval methods |
| Primary signal | Backlinks and domain authority | Consensus across keyword + semantic search |
| Content structure | Comprehensive, keyword-optimized pages | Schema + semantic blocks (200-400 words) |
| Success metric | SERP position, traffic | Citation rate in AI outputs |
RRF creates a fairness advantage by focusing on rank positions rather than raw authority scores. Domain authority alone won't save you, but it also won't stop you from competing. A smaller company that structures content to satisfy both retrieval methods can win citations against larger brands that only optimize for one.
The conversion payoff:
Ahrefs reported that AI search visitors convert at 23x higher rates than traditional organic search visitors. RankScience's analysis of 12 million visits found AI traffic converts at 4-5x higher rates on average. Microsoft Clarity's study of 1,200+ publisher sites showed referrals from Copilot converting at 17x the rate of direct traffic.
Discovered Labs ChatGPT research: What the data shows
Reddit citations in ChatGPT rose 87% since late July 2025, topping 10% of all citations according to Blyska's analysis. Wikipedia jumped 62%. Together with TechRadar, these three sites account for 22% of all citations.
Why Reddit performs well in RRF:
- BM25 (keyword): Reddit discussions contain natural language with specific terminology ("best CRM for startups") in conversational context
- Vector search (semantic): Conversational density creates rich semantic signals that vector embeddings capture
Josh Blyska explains: "Reddit and Wikipedia aren't winning because they're special. They're winning by default because they're the only ones providing direct answers."
This explains why we include Reddit marketing as a core service. Using dedicated aged, high-karma accounts, we help clients appear in discussions that RRF surfaces during retrieval. Axios reports Reddit appears in 5.5% of Google's AI Overviews responses.
For evaluating AI visibility platforms, see our comparison of Profound vs Peec vs Otterly.
How to get content cited by AI using RRF principles
The CITABLE framework translates RRF mechanics into actionable content structure. Read our full breakdown of the CITABLE framework for implementation details.
C - Clear entity and structure
Both BM25 and vector search need to immediately understand what your content is about. Lead with a 2-3 sentence BLUF (Bottom Line Up Front), use clear H2s that match query patterns, and define your entity in the first 100 words.
I - Intent architecture
Answering adjacent questions expands your semantic surface area. Identify the main question and 3-5 related questions, then structure content to address each. Our guide on FAQ optimization for AEO and GEO covers this in depth.
T - Third-party validation
This creates the "consensus signal" RRF rewards. Build presence on review platforms (G2, Capterra), participate in Reddit discussions, and seek mentions in industry publications. According to 6sense's 2025 Buyer Experience Report, 94% of B2B buyers now use LLMs in their buying process.
A - Answer grounding
Verifiable facts improve retrieval confidence. Cite statistics with sources and dates, include specific numbers, and link to primary sources.
B - Block-structured for RAG
RAG systems retrieve chunks, not full pages. Write self-contained 200-400 word sections under each H2/H3 that can answer questions independently.
L - Latest and consistent
Conflicting information lowers your RRF score. Add timestamps, update facts quarterly, and ensure information matches across all platforms.
E - Entity graph and schema
Schema markup explicitly tells retrieval systems what your content represents. Use Organization, Product, and FAQ schemas. Our entity SEO guide covers implementation details.
Building a strategy for the RRF era
You cannot hack RRF with a single tactic. The algorithm rewards genuine relevance across multiple signals.
Three strategic shifts:
- Coverage over dominance: Topic clusters can outscore standalone pages by nearly 3x in RRF. Aim to rank well for many related queries rather than #1 for a single query.
- Structure for both retrieval types: Optimize for keywords (BM25) AND semantic meaning (vector). If you only do one, RRF drops you during fusion.
- Build consensus signals: Third-party mentions on Reddit, G2, and industry publications create the multi-source validation RRF favors.
G2's survey of 1,000+ B2B software buyers in August 2025 showed 87% say AI chatbots are changing how they research. Companies that understand RRF and build content strategies around it will capture demand their competitors miss entirely.
Stop guessing where you stand in the RRF sorting process. An AI Visibility Audit reveals exactly which queries cite you, which cite competitors, and where the gaps are.
Book a call with Discovered Labs. We'll show you how your content performs in AI retrieval and be transparent about whether we're the right fit to help fix it.
Frequently asked questions
What is the k constant in Reciprocal Rank Fusion?
The k constant (typically 60) is a smoothing parameter that prevents any single retrieval method from dominating results. Milvus documentation notes that k=60 is the common choice that balances influence across input lists.
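To see the smoothing in action, evaluate the formula at a few k values (these are just the reciprocal-rank terms, computed directly):

```python
# How k flattens the gap between a rank-1 and a rank-10 result.
for k in (0, 10, 60):
    top, tenth = 1 / (k + 1), 1 / (k + 10)
    print(f"k={k}: rank 1 scores {top:.4f}, {top / tenth:.2f}x a rank-10 hit")
# At k=0 the top hit is worth 10x a tenth-place hit; at k=60 only ~1.15x,
# so no single retriever's #1 result can dominate the fused ranking.
```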
Can ChatGPT give real citations with links?
Yes, when browsing is enabled, ChatGPT searches the web in real time and links to current sources. Without browsing, it may generate plausible-looking citations that don't exist.
How does RRF differ from PageRank?
PageRank analyzes link graphs to determine authority based on who links to whom. RRF operates on rank positions from multiple retrieval methods without considering links at all.
Does Perplexity use RRF too?
Perplexity uses hybrid search combining keyword and semantic retrieval, though the exact fusion method isn't publicly confirmed. G2 analysis shows Perplexity offers "real-time web indexing coupled with direct source citations."
Key terminology glossary
Reciprocal Rank Fusion (RRF): An algorithm that combines document rankings from multiple retrieval systems by summing the reciprocal of each document's rank. Created by Cormack, Clarke, and Buettcher in 2009.
Generative Engine Optimization (GEO): The practice of optimizing content to be cited by AI systems like ChatGPT, Claude, and Perplexity. Also called Answer Engine Optimization (AEO).
Retrieval-Augmented Generation (RAG): A technique that retrieves relevant documents before generating AI responses, reducing hallucinations by grounding answers in actual sources.
Hybrid search: Combining keyword-based (BM25) and semantic (vector) retrieval to capture both exact matches and conceptually related content.
Share of voice: The percentage of AI-generated responses that cite your brand versus competitors for a set of target queries.