
How Gemini works: an inside look at AI agent architecture

We analyzed 1k+ traffic flows from Gemini's web interface. Understanding these mechanisms helps you optimize your content for AI visibility because Gemini treats structured data, entity relationships, and third-party validation as core signals when deciding what to cite.

Ben Moore
Ex-Stanford AI Researcher specialising in search algorithms and LLM optimisation.
January 23, 2026
9 mins
TLDR: Gemini processes every query through a multi-layer RPC system (28 distinct methods observed), a 138-flag feature configuration layer, model tier selection, and streaming response generation. Each layer shapes which capabilities are active and, ultimately, which sources get cited.

We analyzed 1k+ traffic flows from Gemini's web interface and found something surprising: a single session generates 227 RPC calls across 28 distinct methods, filtered through 138 feature flags, before a single word appears on screen. This complexity matters because it reveals exactly how Gemini decides what to cite and why. The system loads user preferences, model tier configurations, and extension permissions before processing any query, meaning your content's citability is determined by infrastructure decisions that happen in milliseconds.

What this analysis reveals about AI citation systems

Our traffic inspection captured the complete request-response cycle of Gemini's free tier. The findings expose how AI agents make real-time decisions about which sources to retrieve, how they process queries through different reasoning modes, and what technical infrastructure powers citation generation.

This matters for your AEO strategy. When you understand the retrieval mechanisms, you can structure your content to match how Gemini's systems actually work rather than guessing based on output patterns. For a broader view of how different AI systems select sources, see our analysis of AI citation patterns across ChatGPT, Claude, and Perplexity.

The BatchExecute RPC system: how Gemini processes queries

Every interaction with Gemini flows through Google's BatchExecute RPC protocol. Our analysis identified 227 RPC calls across 28 unique methods in a single session.

The primary endpoint is /_/BardChatUi/data/batchexecute. Each request includes:

  • User query text with conversation context
  • Selected mode and tier (Fast, Thinking, or Pro)
  • Session identifiers for state management
  • CSRF token (SNlM0e) for security
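Based on the captured requests, a call can be sketched as follows. The envelope shape (`[[[rpc_id, json_args, null, "generic"]]]`) follows the community-reverse-engineered batchexecute format; the argument structure passed here is illustrative, not an official contract:

```python
import json
from urllib.parse import urlencode

def build_batchexecute_body(rpc_id: str, args: list, csrf_token: str) -> str:
    """Build the form-encoded body for a batchexecute call.

    The envelope shape is the publicly documented reverse-engineered
    format; the args payload is illustrative, not an official schema.
    """
    envelope = [[[rpc_id, json.dumps(args), None, "generic"]]]
    return urlencode({
        "f.req": json.dumps(envelope),
        "at": csrf_token,  # the SNlM0e CSRF token observed in each request
    })

# L5adhe is the primary chat-handling method seen in our capture.
body = build_batchexecute_body("L5adhe", ["example query", None], "TOKEN")
```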

The most frequently called RPC methods tell you what Gemini prioritizes:

RPC Method   Calls   Function
kwDCne       40      State synchronization
L5adhe       39      Primary chat handling
ESY5D        33      Feature flag retrieval (138 flags)
MaZiqc       8       Model switching logic
otAQ7b       4       Model tier configuration
cYRIkd       4       Extensions discovery

The key insight is that Gemini loads user preferences and feature configurations before processing any query. This pre-loading determines which capabilities are available for your response, including whether the system can access external information sources.

Here is how a typical query flows through the BatchExecute system:

+------------------+     +-------------------+     +------------------+
|   User Query     | --> |  BatchExecute     | --> |  RPC Dispatch    |
|  (gemini.google) |     |  /_/BardChatUi/   |     |  (28 methods)    |
+------------------+     +-------------------+     +------------------+
                                                           |
         +-------------------------+-----------------------+
         |                         |                       |
         v                         v                       v
+------------------+    +------------------+    +------------------+
|    ESY5D         |    |    L5adhe        |    |    MaZiqc        |
| Feature Flags    |    | Chat Handler     |    | Model Switching  |
| (138 configs)    |    | (main response)  |    | (tier routing)   |
+------------------+    +------------------+    +------------------+
         |                         |                       |
         +-------------------------+-----------------------+
                                   |
                                   v
                        +------------------+
                        | StreamGenerate   |
                        | SSE Response     |
                        +------------------+

This architecture is consistent with what Google has publicly documented about Gemini's API design, which emphasizes streaming responses and configurable model parameters.

How Gemini decides what model to use

Gemini operates three distinct model tiers, each with different reasoning capabilities:

Tier 1: Fast (gemini-2.5-flash)

  • Default for free users
  • Optimized for quick responses
  • Response times: 4 to 35 seconds observed

Tier 5: Thinking

  • Enhanced reasoning for complex problems
  • Limited access on free tier

Premium: Pro (gemini-advanced)

  • Extended reasoning time
  • Full extension access
  • Requires paid subscription

The critical flag enforce_default_to_fast_version=true forces free tier users to the cheapest model regardless of UI selection. This affects citation quality because faster models may retrieve fewer sources or perform less thorough grounding.

For AEO practitioners, this means your content needs to be citable even when Gemini uses its most resource-constrained model. Clear entity definitions, structured data, and direct answers become more important when the AI has less processing capacity for complex retrieval. Understanding how LLM retrieval works for AI search can help you design content that performs well across all model tiers.

The model selection process works like this:

                    +------------------+
                    |  Incoming Query  |
                    +--------+---------+
                             |
                             v
                    +------------------+
                    | Check User Tier  |
                    | (Free/Premium)   |
                    +--------+---------+
                             |
          +------------------+------------------+
          |                                     |
          v                                     v
+-------------------+                 +-------------------+
|    FREE TIER      |                 |  PREMIUM TIER     |
| enforce_default_  |                 | Full model access |
| to_fast=true      |                 +-------------------+
+-------------------+                           |
          |                    +----------------+----------------+
          v                    |                |                |
+-------------------+          v                v                v
| gemini-2.5-flash  |   +----------+    +-----------+    +------------+
| (Tier 1: Fast)    |   | Thinking |    | Pro Model |    | Deep       |
| 4-35 sec response |   | (Tier 5) |    | (Premium) |    | Research   |
+-------------------+   +----------+    +-----------+    +------------+

Google's Gemini model documentation confirms these different capability levels, noting that Flash models prioritize speed while Pro models offer extended reasoning.
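The tier routing above can be sketched in a few lines. The model names and the flag come from our capture; the function itself is a hypothetical reconstruction, not Gemini's actual code:

```python
def select_model(user_tier: str, requested_mode: str, flags: dict) -> str:
    """Illustrative reconstruction of the observed tier routing."""
    # Free users are pinned to the Fast model regardless of UI selection.
    if user_tier == "free" and flags.get("enforce_default_to_fast_version"):
        return "gemini-2.5-flash"
    return {
        "fast": "gemini-2.5-flash",
        "thinking": "gemini-thinking",  # Tier 5 (internal name assumed)
        "pro": "gemini-advanced",
    }.get(requested_mode, "gemini-2.5-flash")

flags = {"enforce_default_to_fast_version": True}
select_model("free", "pro", flags)     # → "gemini-2.5-flash"
select_model("premium", "pro", flags)  # → "gemini-advanced"
```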

The 138 feature flags that control Gemini's behavior

The ESY5D RPC loads 138 feature flags that determine every aspect of Gemini's functionality. These flags fall into six categories:

UI Suppression (43 flags): Control which dialogs, tooltips, and banners appear

Discovery and Tooltips (41 flags): Track feature exposure and user awareness

Impression Tracking (37 flags): Count views and interactions

Dismissal Tracking (25 flags): Record dismissed UI elements

Timestamps (15 flags): Log when actions occurred

Feature Toggles (3 flags): Enable or disable core functionality

The flags most relevant to citation and retrieval include:

Flag                                             Function
enable_personal_context                          Master toggle for personalization
enable_personal_context_gemini_using_workspace   Access to Google Workspace data
enable_token_streaming                           Token-by-token response streaming
has_accepted_agent_mode_fre_disclaimer           Agent mode availability
last_selected_mode_id_on_web                     Mode preference memory

The personalization flags are particularly important. When enabled, Gemini can access user data from Google Photos, Workspace, Search history, and YouTube to inform responses. This means your brand mentions across Google properties (YouTube videos, Drive documents, Gmail threads) can influence whether you get cited. Building a strong entity presence that LLMs recognize becomes critical for maximizing these personalized citation opportunities.
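The flag-gating logic can be sketched like this. The flag names are the ones observed in the capture; the source list and the function itself are our inference, not Gemini's implementation:

```python
def personalization_sources(flags: dict) -> list:
    """Hypothetical sketch of which data sources personalization can
    draw on, gated by the observed flags (source list is illustrative)."""
    if not flags.get("enable_personal_context"):
        return []  # master toggle off: no personal data is used
    sources = ["search_history", "youtube", "photos"]
    if flags.get("enable_personal_context_gemini_using_workspace"):
        sources.append("workspace")  # Drive, Gmail, Docs
    return sources

personalization_sources({"enable_personal_context": True})
```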

Mode-based routing: how queries reach different systems

Gemini routes queries through distinct processing modes based on complexity and user selection:

Default Chat (gemini-2.5-flash)

  • Standard query processing
  • Web grounding for current information
  • Citation generation from retrieved sources

Agent Mode

  • Autonomous task execution
  • Multi-step reasoning
  • Tool use capabilities

Deep Research

  • Extended information gathering
  • File upload processing
  • Multiple source synthesis

Deep Think

  • Enhanced reasoning chains
  • Longer processing time
  • Complex problem solving

Canvas

  • Interactive document editing
  • Collaborative generation

Gempix

  • Image generation routing
  • Imagen 4.0 model access

The mode determines which retrieval systems activate. Deep Research mode, for example, triggers more extensive web searches and source aggregation. Standard chat uses quicker retrieval with fewer sources.
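As a rough mental model, the modes map to different retrieval profiles. The structure below is our inference from the traffic analysis; the keys and values are illustrative, not Gemini's actual configuration:

```python
# Inferred mode-to-retrieval mapping; values are assumptions for
# illustration, not measured configuration.
RETRIEVAL_PROFILES = {
    "default_chat":  {"web_grounding": True,  "source_depth": "shallow"},
    "agent":         {"web_grounding": True,  "tool_use": True},
    "deep_research": {"web_grounding": True,  "source_depth": "deep",
                      "file_uploads": True},
    "deep_think":    {"web_grounding": True,  "extended_reasoning": True},
    "canvas":        {"web_grounding": False, "document_editing": True},
    "gempix":        {"web_grounding": False, "image_model": "imagen-4.0"},
}
```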

For citation rate optimization, understand that users asking research-focused questions in Deep Research mode receive responses with more citations than quick queries in default mode. Creating content that answers research-intent queries gives you more citation opportunities. This aligns with what we know about how ChatGPT uses reciprocal rank fusion for AI citations, where research-intent queries trigger more extensive source aggregation across platforms.

Real-time updates: the Signaler system

Gemini maintains persistent connections through Google's Signaler service at signaler-pa.clients6.google.com. Our capture showed 161 long-polling connections with 20-second average durations.

This architecture enables:

  • Push notifications during response generation
  • Real-time UI updates without page refresh
  • Session state synchronization across devices

The Signaler pattern reveals that Gemini maintains live connections to update responses even after initial generation. If new information becomes available or the model refines its answer, updates can push to the client.
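The pattern itself is simple to sketch: hold a request open (about 20 seconds in our capture), process any pushed update, reconnect. This is the generic long-polling shape, not Signaler's actual protocol:

```python
def long_poll(fetch, handle, max_cycles=3, timeout_s=20):
    """Generic long-polling loop of the kind Signaler uses.

    `fetch` blocks until the server pushes data or the timeout lapses
    (returning None); `handle` consumes each pushed update. Both are
    caller-supplied stubs here; this is a pattern sketch only.
    """
    for _ in range(max_cycles):
        update = fetch(timeout=timeout_s)
        if update is not None:
            handle(update)

# Stub demonstrating the loop without a network dependency.
events = iter([{"type": "response_update"}, None, {"type": "done"}])
received = []
long_poll(lambda timeout: next(events), received.append)
```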

This real-time capability suggests that keeping your content fresh and frequently updated matters. Information that changes after initial retrieval could be incorporated through these push channels in future response refinements.


Authentication and session management

Gemini uses a multi-layer authentication system with 12+ cookies managing session state:

Primary Session Cookies:

  • SID, HSID, SSID (Session identification)
  • APISID, SAPISID (API session)
  • __Secure-1PSID, __Secure-3PSID (Secure session)

Authentication Header:

Authorization: SAPISIDHASH {timestamp}_{hash}

The SAPISIDHASH is computed from the current timestamp, SAPISID cookie value, and origin URL. This ensures each request is freshly authenticated.
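The construction, as documented by reverse-engineering work on Google web clients, can be reproduced like this. Treat it as observed behavior, not a stable public API:

```python
import hashlib
import time

def sapisid_hash(sapisid: str,
                 origin: str = "https://gemini.google.com") -> str:
    """Compute a SAPISIDHASH authorization value.

    The scheme (SHA-1 over "timestamp SAPISID origin") is widely
    documented from reverse engineering of Google web clients.
    """
    ts = str(int(time.time()))
    digest = hashlib.sha1(f"{ts} {sapisid} {origin}".encode()).hexdigest()
    return f"SAPISIDHASH {ts}_{digest}"

sapisid_hash("dummy-cookie-value")
```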

For AEO strategy, this authentication layer means Gemini can associate query patterns with user accounts over time. Users who frequently search for topics related to your industry will have session data that influences how Gemini retrieves and ranks information for them. Building consistent presence across queries in your category compounds over time.

Experiment framework: 99+ A/B tests running simultaneously

Gemini runs 99+ active experiment IDs during each session, enabling server-side A/B testing across the entire system. Sample IDs include: 4927078, 5285474, 62257757, 21023470.

These experiments control:

  • Response generation algorithms
  • Citation selection criteria
  • UI presentation patterns
  • Model routing decisions

The implication is that citation behavior varies based on experimental cohort. Two users asking identical queries may receive responses with different sources cited based on their experiment assignments.

This variability is why measuring your AI visibility requires statistical significance testing across many queries rather than spot-checking individual responses. The experiment framework introduces controlled randomness that single-query analysis cannot account for.
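A two-proportion z-test is one way to separate a real citation-rate change from experimental noise. This is standard statistical methodology, not anything Gemini-specific:

```python
import math

def citation_rate_diff_significant(cited_a, n_a, cited_b, n_b,
                                   z_crit=1.96):
    """Two-proportion z-test sketch for comparing citation rates between
    two query batches (e.g. before and after a content change)."""
    p_a, p_b = cited_a / n_a, cited_b / n_b
    p_pool = (cited_a + cited_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return abs(z) > z_crit  # True: difference unlikely to be noise

# 12/50 vs 14/50 cited is indistinguishable from noise;
# 10/100 vs 30/100 is a real difference.
citation_rate_diff_significant(12, 50, 14, 50)    # → False
citation_rate_diff_significant(10, 100, 30, 100)  # → True
```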


Ad infrastructure: dormant but positioned

Our analysis found a complete ad serving infrastructure loaded but inactive:

Ad SDK Modules Loaded:

  • qads (Query Ads)
  • ada (Ad adapter)
  • adrc (Ad remote config)
  • qapid (Query API)
  • adcgm3 (Ad configuration)

Analytics Tags:

  • GTM Container: GTM-KKRLL9S
  • GA4 Property: G-WC57KJ50ZZ

The ad infrastructure returns null responses currently, but the SDK is fully loaded and operational. Google has positioned Gemini for future ad monetization without requiring client-side changes.

For brands, this signals that paid placement within AI responses is likely coming. Building organic citation presence now creates a baseline before paid competition enters the market. To understand the broader context of AI-powered search evolution, Google's research blog has published insights on grounding large language models for improved accuracy.

What this means for marketing leaders

The technical architecture behind Gemini reveals practical implications for how you allocate resources and prioritize your AI visibility efforts.

  • Optimize for the lowest common denominator. The enforce_default_to_fast_version=true flag means most users get Gemini's cheapest model. Your content must be citable even when the AI has minimal processing capacity, so prioritize direct answers and explicit entity definitions over nuanced, context-dependent explanations.
  • Treat Google properties as citation surfaces. The 138 feature flags show that personalization pulls from YouTube, Drive, and Search history. Building consistent brand presence across these platforms increases your chance of appearing in premium user responses where personalized retrieval is active.
  • Structure content for streaming retrieval. The BatchExecute system processes queries through 28 RPC methods and delivers responses in fragments. Block-structured content (200-400 word sections with clear headings) aligns with how Gemini chunks information during generation.
  • Account for experimental variation in measurement. With 99 A/B tests running simultaneously, single-query spot checks are unreliable. Budget for statistically significant testing across 50-100 queries per keyword cluster to get accurate visibility data.
  • Prepare for paid placement. The dormant ad infrastructure (qads, ada, adrc modules) signals that sponsored results in AI answers are coming. Establishing organic citation presence now creates a baseline before paid competition enters the market.
  • Prioritize page speed as a retrieval factor. With P95 request timing at 3.2 seconds, slow-loading pages risk timeout during retrieval. Fast technical performance is no longer just a UX concern. It directly affects whether Gemini can access your content.

What this means for your AEO strategy

Based on this analysis, optimize for Gemini citations by focusing on these technical factors:

1. Structure content for passage retrieval

Gemini's streaming architecture processes content in fragments. Use 200-400 word sections with clear headings. Each section should answer a single question completely so it can be extracted independently.
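A simple editorial lint can enforce this during production. The 200-400 word range is our recommendation above, not a rule Gemini publishes:

```python
def flag_sections(sections, lo=200, hi=400):
    """Flag (heading, body) sections whose word count falls outside
    the recommended 200-400 word retrieval-friendly range."""
    issues = []
    for heading, body in sections:
        words = len(body.split())
        if not lo <= words <= hi:
            issues.append((heading, words))
    return issues

flag_sections([("Intro", "word " * 250), ("Stub", "too short")])
# → [("Stub", 2)]
```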

2. Define entities explicitly

The feature flag system shows Gemini loads extensive context about entities before responding. Include clear entity definitions (company name, product category, key differentiators) near the top of your content.

3. Build consistent presence across Google properties

Personalization flags show Gemini can access YouTube, Drive, and Search history. Ensure your brand appears consistently across these surfaces for premium user citation opportunities.

4. Optimize for the cheapest model tier

Free users are forced to gemini-2.5-flash. Make your content citable even for resource-constrained models by using direct answers, structured data, and explicit relationships.

5. Publish content that answers research intent

Deep Research mode triggers more extensive retrieval with more citations. Target queries that users would investigate thoroughly rather than quick factual lookups. For a complete framework on this approach, see our guide on what is GEO (Generative Engine Optimization).

6. Ensure fast page load times

With P95 request timing at 3.2 seconds, slow-loading pages risk timeout during retrieval. Keep your content fast to ensure Gemini can access it reliably.

How Discovered Labs can help

Understanding Gemini's internal architecture is one thing. Acting on it systematically is another.

Discovered Labs specializes in AEO for B2B SaaS companies who want to capitalize on this distribution shift toward AI answers. We use internal technology to measure how you appear across AI platforms, then execute content strategies designed specifically for passage retrieval and citation optimization.

Our approach includes AI visibility audits that show exactly where you're missing from AI answers, daily content production using our CITABLE framework, and third-party mention campaigns that build the validation signals AI systems trust.

FAQs

How does Gemini decide which sources to cite?

Gemini uses a combination of model tier selection, mode-based routing, and feature flag configurations to determine retrieval depth. Deep Research mode retrieves more sources than standard chat. The system prioritizes content with clear entity definitions, structured data, and third-party validation signals.

Does Gemini's free tier cite differently than premium?

Yes. Free tier users are forced to the Fast model (gemini-2.5-flash) and have all 8 Google Workspace extensions disabled. This means free users receive citations primarily from public web content, while premium users can receive citations from personal documents and deeper web research.

How many queries should I test to measure my AI visibility?

Given that 99 experiment IDs are active simultaneously, you need statistical significance across multiple queries and sessions. Single-query testing cannot account for the experimental variation in Gemini's responses. Test at minimum 50-100 queries across your target keyword clusters.

Does page speed affect AI citations?

Based on the P95 request timing of 3.2 seconds in our analysis, sources that respond slowly risk being dropped from retrieval. Fast-loading pages have a higher chance of being included in Gemini's response generation.

How often does Gemini update its knowledge?

The Signaler system enables real-time updates during response generation, and the server build version shows regular deployments (our capture showed build date 2026-01-14). However, retrieval freshness depends on web indexing. Frequently updated content with recent timestamps signals currency to the retrieval system.

Continue Reading

Discover more insights on AI search optimization

Jan 23, 2026

How Google AI Overviews works

Google AI Overviews does not use top-ranking organic results. Our analysis reveals a completely separate retrieval system that extracts individual passages, scores them for relevance & decides whether to cite them.

Read article
Jan 23, 2026

How Google AI Mode works

Google AI Mode is not simply a UI layer on top of traditional search. It is a completely different rendering pipeline. Google AI Mode runs 816 active experiments simultaneously, routes queries through five distinct backend services, and takes 6.5 seconds on average to generate a response.

Read article