TL;DR
- Google AI Mode activates via the udm=50 URL parameter and takes an average of 6.5 seconds to generate responses
- The system uses five core services, including AimThreadsService and a streaming endpoint called /async/folif
- Citations are delivered as HTML fragments with tracking tokens (data-ved) that correlate to source content
- Google runs 816+ active experiments on AI Mode, meaning what works today may shift tomorrow
- Content that appears in structured, retrievable formats has better chances of citation
Google AI Mode runs 816 active experiments simultaneously, routes queries through five distinct backend services, and takes 6.5 seconds on average to generate a response. We know this because we captured and analyzed the actual network traffic flowing between browsers and Google's servers during AI Mode sessions. What we found reveals exactly how content gets retrieved, processed, and cited in those AI-generated summaries.
Understanding this system helps you optimize your content for retrieval and citation. While Google has announced AI Mode as part of Search Labs, the internal architecture we document here comes from direct traffic analysis.
What triggers Google AI Mode?
Google AI Mode is not simply a UI layer on top of traditional search. It is a completely different rendering pipeline.
When you or your prospects search with AI Mode enabled, Google checks for a URL parameter called udm=50 (Universal Display Mode). This single parameter routes the entire request to the AI Mode backend instead of traditional search.
Here is what happens at the technical level:
- udm=50 routes to AI Mode (we observed 36 occurrences in our traffic capture)
- udm=2 routes to traditional web search
- udm=7 goes to image search
- udm=28 goes to shopping
The server also checks for google.sn = 'aim' which confirms AI Mode is active. If these conditions are met, your query enters an entirely different processing flow than the "10 blue links" you are used to.
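The routing described above can be sketched as a small lookup. The udm values come from the traffic capture; the helper function and route names are our illustration, not a Google API:

```python
from urllib.parse import urlencode

# udm values observed in the traffic capture (route labels are illustrative)
UDM_ROUTES = {
    50: "ai_mode",      # routes to the AI Mode backend
    2: "web_search",    # traditional "10 blue links"
    7: "image_search",
    28: "shopping",
}

def build_search_url(query: str, udm: int = 50) -> str:
    """Build a Google search URL that targets a specific display mode."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": udm})

url = build_search_url("best crm for startups")
# The udm=50 parameter is what flips this request into the AI Mode pipeline.
```

Changing a single query parameter is all it takes to move the same query between entirely different backends.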
This matters for AI visibility because the retrieval and ranking mechanisms are fundamentally different. Traditional SEO optimizes for page rankings. AEO optimizes for passage retrieval within an AI generation pipeline. For a deeper dive into how this retrieval process works across AI systems, see our guide on how LLM retrieval works for AI search.
The 5 services that power AI Mode responses
Once a query enters AI Mode, it passes through five core services before you see a response.
1. AimThreadsService
This service manages conversation threads. It handles listing, creating, deleting, and sharing AI conversations. Every AI Mode session creates a thread that Google tracks.
2. ValidationAsyncService
Before processing your query, Google validates the request. This includes checking user eligibility, geographic restrictions, and experiment flags.
3. Botguard protection
Google runs anti-bot validation on every AI Mode request. Our traffic capture showed triggeredBotProtection: true with trigger groups that categorize users. If Google suspects bot activity, the request may be blocked or degraded.
4. /async/folif
This is the main query handler that generates AI responses. It is where the actual AI generation happens and where citations get selected and attached.
5. /async/bgasy and /async/hpba
These handle background operations and homepage async tasks. They support the main generation pipeline without directly producing the AI answer.
The implication for content creators is clear. Your content must pass through validation, survive bot protection, and then compete for citation in the folif generation step. Each layer is an opportunity for your content to be included or excluded.
Here is how these five services work together:
┌─────────────────────────────────────────────────────────────────────┐
│ Google AI Mode Request Flow │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 1. AimThreadsService │
│ • Creates/manages conversation thread │
│ • Assigns thread ID for session tracking │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 2. ValidationAsyncService │
│ • Checks user eligibility and geo restrictions │
│ • Validates experiment flags │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 3. Botguard Protection │
│ • Anti-bot validation (triggeredBotProtection: true/false) │
│ • Request blocked or degraded if suspicious │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 4. /async/folif (Main Generation) │
│ • AI response generation │
│ • Citation selection and attachment │
│ • Streams HTML fragments to client │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 5. /async/bgasy & /async/hpba (Background Ops) │
│ • Support tasks for main pipeline │
│ • Telemetry and logging │
└─────────────────────────────────────────────────────────────────────┘
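As a rough mental model, the five stages behave like a sequential pipeline in which any stage can short-circuit the request. The service names come from the capture; the request object and dispatch logic below are entirely our sketch, since Google's real internals are not public:

```python
# Hypothetical request dict; field names are illustrative, not Google's.
def aim_threads_service(req):
    req["thread_id"] = "thread-001"  # create/track the conversation thread
    return req

def validation_async_service(req):
    if not req.get("eligible", True):
        raise PermissionError("user not eligible for AI Mode")
    return req

def botguard(req):
    if req.get("triggeredBotProtection"):
        raise PermissionError("request blocked by Botguard")
    return req

def folif(req):
    # main generation step: AI answer plus selected citations
    req["response"] = {"answer": "...", "citations": ["example.com"]}
    return req

def background_ops(req):
    req["telemetry_sent"] = True  # bgasy/hpba support tasks and logging
    return req

PIPELINE = [aim_threads_service, validation_async_service, botguard,
            folif, background_ops]

def handle(req):
    """Run a request through every stage; any stage may raise and abort."""
    for stage in PIPELINE:
        req = stage(req)
    return req
```

The point of the sketch: citation selection happens at stage 4, so content that trips an earlier gate never gets the chance to be cited.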
How the streaming protocol delivers citations
Google AI Mode does not return a single response. It streams content progressively over approximately 6.5 seconds using a protocol we call the folif streaming flow.
Here is the step-by-step process we observed:
Step 1: Initial search request (0ms)
The browser sends a request to /search?q=your+query&udm=50. Google returns an HTML shell with placeholder containers and loading spinners.
Step 2: Streaming request begins (500ms)
JavaScript initiates a call to /async/folif with parameters including an Event ID that correlates to the placeholder container.
Step 3: Progressive response (500ms to 6500ms)
The server streams HTML fragments using chunked transfer encoding with Brotli compression. Each chunk contains:
- AI-generated text
- Source citations with tracking tokens (data-ved)
- Related questions
- Image and video embeds
Step 4: Client-side injection
JavaScript parses each chunk, finds the target container via data-container-id, and injects the content into the DOM with animations.
Step 5: Telemetry (6200ms)
The browser reports timing metrics back to Google, including aimf (AI Mode Finish), aimfc (AI Mode Finish Complete), and aimr (AI Mode Render).
Our traffic analysis showed average response sizes of 454KB delivered over 6.5 seconds. The slowest generation took 8 seconds; the fastest took 6.
The streaming response flow looks like this:
Timeline (milliseconds)
│
0ms ┌──────────────────────────────────────────────────────────┐
│ Browser: GET /search?q=query&udm=50 │
│ Server: Returns HTML shell with placeholders │
└──────────────────────────────────────────────────────────┘
│
500ms ┌──────────────────────────────────────────────────────────┐
│ JavaScript: POST /async/folif │
│ Includes: Event ID, session tokens, query context │
└──────────────────────────────────────────────────────────┘
│
┌──────────────────────────────────────────────────────────┐
│ STREAMING RESPONSE │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
500ms- │ │Chunk 1 │→→│Chunk 2 │→→│Chunk 3 │→→│Chunk N │ │
6500ms │ │AI text │ │Cite.s │ │Images │ │Complete│ │
│ │partial │ │data-ved│ │embeds │ │answer │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
│ (Brotli compressed, chunked transfer) │
└──────────────────────────────────────────────────────────┘
│
6200ms ┌──────────────────────────────────────────────────────────┐
│ Telemetry: aimf (finish), aimfc (complete), aimr(render)│
│ Reports timing metrics back to Google │
└──────────────────────────────────────────────────────────┘
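The client-side injection step above can be mimicked in a few lines. Real chunks are Brotli-compressed HTML fragments arriving over a chunked transfer; the field names and fragments below are simulated for illustration:

```python
# Simulated chunks in arrival order (field names are our assumption).
chunks = [
    {"container_id": "c1", "html": "<p>AI text, part 1</p>"},
    {"container_id": "c1", "html": '<a data-ved="2ahUKEw...">Source</a>'},
    {"container_id": "c2", "html": "<p>Related questions</p>"},
]

def inject(chunks):
    """Mimic the client step: group streamed fragments by container id."""
    dom = {}
    for chunk in chunks:
        dom.setdefault(chunk["container_id"], []).append(chunk["html"])
    # Concatenate fragments per container, as progressive DOM injection would
    return {cid: "".join(parts) for cid, parts in dom.items()}

dom = inject(chunks)
```

Because answer text and citations can arrive in separate chunks targeting the same container, the page fills in progressively rather than all at once.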
This streaming architecture is similar to how other AI systems like ChatGPT and Perplexity deliver responses, though the citation mechanisms differ. For a comparison of how different AI systems select sources, see our analysis of AI citation patterns across ChatGPT, Claude, and Perplexity.
What the citation mechanism looks like in practice
Citations in Google AI Mode are not simply links. They are structured HTML fragments with multiple tracking attributes.
Here is what a citation looks like in the actual response:
<div class="citations">
<a href="..." data-ved="2ahUKEw...">TechRadar</a>
<a href="..." data-ved="2ahUKEw...">CNET</a>
</div>
The key attributes that matter:
- data-ved: View Event Data tracking token that Google uses to measure clicks and engagement
- data-msei: Model State Embedding ID that correlates the citation to a specific inference state
- data-container-id: Links the citation to its parent container in the streaming response
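Given the documented fragment shape, extracting citations and their tracking tokens is straightforward. This parser is our sketch built on Python's standard library, using only the attributes shown above:

```python
from html.parser import HTMLParser

class CitationParser(HTMLParser):
    """Collect anchor text and data-ved tokens from a citation fragment."""
    def __init__(self):
        super().__init__()
        self.citations = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if "data-ved" in a:  # only anchors carrying a tracking token
                self._current = {"href": a.get("href"), "ved": a["data-ved"]}

    def handle_data(self, data):
        if self._current is not None:
            self._current["source"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.citations.append(self._current)
            self._current = None

fragment = '''<div class="citations">
  <a href="https://example.com/a" data-ved="2ahUKEw...">TechRadar</a>
  <a href="https://example.com/b" data-ved="2ahUKEw...">CNET</a>
</div>'''
parser = CitationParser()
parser.feed(fragment)
```

After feeding the fragment, `parser.citations` holds one entry per cited source with its href and tracking token, which is the raw material any AI-visibility monitoring tool would work from.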
What does this mean for your content? Google is tracking which citations get clicked, how they correlate to user satisfaction, and how the AI's confidence level relates to source authority. This feedback loop likely influences future citation decisions. This is consistent with Google's published research on attribution in language models, which emphasizes verifiable source attribution as a key quality metric.
Project MARS: the next-gen search infrastructure
Our traffic capture revealed a codename for Google's AI Mode architecture: Project MARS.
MARS activates via a hotswaps URL parameter containing search_next_mars or search_next_mars_lro. The LRO suffix stands for Long Running Operation, a variant that allows extended timeouts for complex reasoning queries.
When MARS is active:
- Static HTML responses become progressive streaming HTML
- Single request/response becomes multiple async fragments
- 500ms total load becomes 6000ms+ streaming generation
- Traditional search results become AI-generated summaries with embedded citations
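Detecting which MARS variant (if any) a URL requests reduces to inspecting the hotswaps parameter. The parameter values come from the capture; the helper is our illustration:

```python
from urllib.parse import urlparse, parse_qs

def mars_mode(url: str):
    """Return which MARS variant a hotswaps parameter requests, if any."""
    qs = parse_qs(urlparse(url).query)
    for value in qs.get("hotswaps", []):
        # Check the LRO variant first, since its token contains the shorter one
        if "search_next_mars_lro" in value:
            return "mars_lro"  # Long Running Operation: extended timeouts
        if "search_next_mars" in value:
            return "mars"
    return None
```

A URL with no hotswaps parameter falls back to the non-MARS path, which matches the observation that Google can toggle the feature per user or per query.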
The hotswap system suggests Google can dynamically enable or disable MARS features for specific users or queries. This explains why AI Mode behavior can vary between searches and why A/B testing is so prevalent. Google has discussed this experimentation-driven approach in their Search Central documentation, though the internal codenames like MARS are not publicly documented.
The experiment infrastructure behind AI Mode
Google runs massive A/B testing on AI Mode. Our capture identified:
- 816 unique experiment IDs (kEXPI) active in a single session
- 10+ unique rollout tokens controlling feature variations
- Operation ID 89978449 as a primary experiment allocation identifier
This scale of experimentation means the AI Mode you experience today may behave differently tomorrow. Features get tested, rolled back, and modified constantly.
For AEO strategy, this has an important implication. You cannot optimize for a static system. You need to optimize for the principles that likely remain constant across experiments: structured content, authoritative sources, consistent entity information, and clear answers to specific questions. This is why entity SEO for AI has become essential: consistent entity signals help AI systems recognize and trust your brand regardless of which experiment variant is running.
How this impacts your AI visibility strategy
Understanding Google AI Mode's internal mechanisms changes how you should think about optimization.
Optimize for passage retrieval, not page ranking. AI Mode extracts passages from content rather than ranking full pages. Your content needs clear, self-contained answer blocks that can be retrieved independently.
Structure content for streaming injection. Google delivers citations as HTML fragments. Content that is already structured with clear headings, concise paragraphs, and explicit entity relationships is easier for the system to parse and cite.
Build citation consistency across sources. The validation layer and Botguard checks suggest Google weighs source credibility. Third-party mentions on authoritative sites, consistent entity information across the web, and established domain authority all likely influence citation probability. For a practical implementation guide, see our step-by-step guide on how to get your content cited by AI.
Prepare for constant change. With 816+ experiments running, what works in January may not work in March. Track your share of voice in AI answers regularly and adjust your strategy based on observed changes.
Monitor streaming response patterns. The 6.5-second average generation time and 454KB response size suggest complex inference. Simpler queries may cite fewer sources. Complex queries may pull from more diverse content. Match your content depth to query complexity.
What this means for marketing leaders
The technical realities above translate into concrete actions you can take now.
- Structure content for passage retrieval. AI Mode extracts standalone passages, not full pages. Break your content into 200-400 word blocks with clear headings so the folif generation step can pull discrete, citable answers.
- Front-load your best answers. The 6.5-second streaming window means citations are selected early in the generation process. Place your most authoritative statements in the first few paragraphs of each section where they are more likely to be retrieved and cited.
- Expect constant variation. With 816+ active experiments running simultaneously, the AI Mode you see today will behave differently next month. Build a monitoring cadence (weekly or bi-weekly) to track your share of voice and adjust tactics based on observed changes.
- Ensure entity consistency across all sources. The five-service architecture includes validation layers that likely cross-reference your brand information. Conflicting details on your website, LinkedIn, Wikipedia, or third-party directories create friction that reduces citation probability.
- Optimize for bot protection signals. Botguard checks every request. Sites with suspicious patterns or thin content may face degraded treatment. Focus on building genuine authority through third-party mentions and consistent publishing rather than shortcuts.
- Match content depth to query complexity. Our traffic analysis showed 454KB average response sizes, suggesting AI Mode handles complex inference well. Create both quick-answer content for simple queries and comprehensive guides for multi-faceted questions.
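The first action above, structuring content into standalone passage blocks, can be automated with a simple heading-based splitter. This is a minimal sketch of the idea, not a tool the article describes:

```python
def split_into_passages(markdown: str):
    """Split content at headings so each block is a standalone, citable passage."""
    passages, current = [], []
    for line in markdown.splitlines():
        # A new heading closes the previous passage block
        if line.startswith("#") and current:
            passages.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        passages.append("\n".join(current).strip())
    return passages

doc = """## What is AEO?
Answer engine optimization targets AI citations.

## How long does generation take?
About 6.5 seconds on average in our capture."""

blocks = split_into_passages(doc)
```

Running your own pages through a splitter like this is a quick check: if a block cannot stand alone as an answer, it is unlikely to be retrieved as one.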
How Discovered Labs helps with AI Mode optimization
At Discovered Labs, we use internal tools to track how brands appear in AI answers across Google AI Mode, ChatGPT, and other answer engines. Our approach includes:
- AI visibility audits that show exactly where your brand is cited (or missing) in AI responses
- Content production using our CITABLE framework designed for passage retrieval and citation
- Third-party mention campaigns that build the authority signals AI systems look for
- Technical optimization including structured data and entity consistency
The mechanisms we have documented here inform how we build content that gets retrieved and cited. If you want to understand where you stand in AI Mode specifically, book a call and we will show you the data.
FAQs
How long does Google AI Mode take to generate a response? Based on our traffic analysis, AI Mode takes an average of 6.5 seconds to generate a complete response. The fastest we observed was 6 seconds; the slowest was 8 seconds. This is significantly longer than traditional search, which typically returns results in under 500ms.
Does Google AI Mode use different ranking factors than traditional search? Yes. AI Mode uses passage retrieval rather than page ranking. It extracts specific content blocks from sources rather than ranking entire pages. This means content structure, answer clarity, and entity relationships matter more than traditional SEO signals like keyword density. Google's AI Overviews use a similar approach - learn more in our guide on how to get cited in Google AI Overviews.
How many sources does Google AI Mode typically cite? This varies by query complexity. Our observations showed citation blocks containing multiple sources per response, typically 3-6 primary citations with additional related links. Complex queries tend to cite more diverse sources.
Can I see if Google AI Mode is citing my content? Not directly from Google. You need to manually query AI Mode or use monitoring tools that track AI visibility. Discovered Labs provides this tracking as part of our AEO services.
What is the udm=50 parameter? UDM stands for Universal Display Mode. The value 50 specifically triggers AI Mode. Other values route to images (7), videos (14), shopping (28), or traditional web search (2). You can manually add &udm=50 to a Google search URL to force AI Mode if you have access.