TL;DR
- Many agencies added "Answer Engine Optimization (AEO)" to their website copy recently without changing their underlying retrieval methodology. The difference matters for pipeline, not just impressions.
- Evaluate agencies across three surfaces: web search, AI citations, and training data. An agency optimizing only for Google rankings misses the buyer research happening inside ChatGPT, Claude, and Perplexity.
- Ask for attribution paths in case studies, not traffic or ranking charts. Named case studies with MQL-to-pipeline numbers are the baseline proof point.
- Month-to-month retainers protect you. AI search is shifting fast enough that 12-month lock-ins carry real risk.
- Score every vendor against the same criteria before your first call. This article gives you a structured scorecard and discovery question list to do that.
B2B buyers now research with AI assistants before visiting a vendor's website. If your brand doesn't appear in those answers, you miss pipeline the sales team never sees. This guide covers how to evaluate SaaS SEO agencies, what questions to ask in discovery, and how to identify partners with the technical depth to drive measurable AI-referred revenue.
Why most SaaS companies hire the wrong SEO agency
We see SaaS CMOs hire the wrong SEO agency using the same process every time. Most shortlist based on domain authority, blog portfolio, and case study volume. These are reasonable proxies for traditional search performance, but they don't tell you whether the agency understands how AI retrieval systems actually select content. That gap is where expensive mistakes happen.
Traditional SEO optimizes for a ranked list of documents. LLMs retrieve semantically relevant passages and synthesize a single answer. Those are fundamentally different systems. Our SEO vs. AEO breakdown shows roughly 80% of the activity categories overlap, but the remainder diverges enough to change what you should build and measure. Hiring an agency that ignores that divergence means paying for deliverables that don't move AI share of voice.
The attribution blind spot
I'll tell you the hardest part of the AI search conversation: attribution. It's the most important question to ask any agency you're evaluating. GA4, HubSpot, CRM, and self-reported data often give different answers for the same conversion. AI-referred sessions may appear as direct or organic in your analytics stack, making it difficult to isolate their contribution from other channels.
A credible agency addresses this head-on. A strong partner will set up UTM-tagged landing pages for AI-referred traffic, add a "how did you hear about us" field to demo and contact forms, and build a monthly narrative report that acknowledges what's tracked versus what's estimated. Any agency claiming their dashboard solves attribution completely is either naive or selling you something. The honest answer is that AI attribution is probabilistic, and a good partner tells you that upfront rather than showing you a dashboard that looks cleaner than reality.
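The tracked-versus-estimated split described above can be sketched in a few lines. This is a minimal illustration, not a description of any agency's actual stack: the hostname map, the `classify_session` function, and the evidence labels are all assumptions introduced here, and the hostnames should be verified against your own referrer logs.

```python
from urllib.parse import urlparse

# Hypothetical hostname-to-engine map (an assumption for illustration only).
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}
ENGINES = sorted(set(AI_REFERRER_HOSTS.values()))


def classify_session(referrer: str = "", utm_source: str = "", self_reported: str = "") -> dict:
    """Label a session with an AI engine and how strong the evidence is.

    "tracked"      -> the referrer hostname or UTM source names the engine
    "estimated"    -> only the "how did you hear about us" answer mentions it
    "unattributed" -> no AI signal found
    """
    host = (urlparse(referrer).hostname or "").lower()
    if host in AI_REFERRER_HOSTS:
        return {"engine": AI_REFERRER_HOSTS[host], "evidence": "tracked"}

    source = utm_source.strip().lower()
    for engine in ENGINES:
        if source == engine.lower():
            return {"engine": engine, "evidence": "tracked"}

    answer = self_reported.lower()
    for engine in ENGINES:
        if engine.lower() in answer:
            return {"engine": engine, "evidence": "estimated"}

    return {"engine": None, "evidence": "unattributed"}


print(classify_session(referrer="https://chatgpt.com/"))
print(classify_session(self_reported="A colleague found you via Perplexity"))
print(classify_session(referrer="https://www.google.com/search"))
```

Keeping the evidence label next to each session is what lets a monthly report separate tracked numbers from estimated ones instead of merging them into a single figure.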
Rebadged SEO vs. AI-native methodology
Recently, dozens of SEO agencies added AEO and GEO to their service pages. Most changed the vocabulary without changing the retrieval model. You can spot the difference with two questions: "What is your approach to passage extractability?" and "How do you measure citation rate separately from organic rankings?"
When you ask an agency with genuine AI retrieval expertise about passage extractability, they'll explain that dense retrieval systems outperformed a BM25 keyword baseline by 9-19 percentage points in top-20 passage retrieval accuracy (Karpukhin et al., 2020), and that content structure matters more than keyword density for LLM citation selection. If the answer defaults to "we write quality content and build links," you're looking at a rebadged SEO shop.
What changed in 2025-2026
We tracked a clear shift in how AI systems relate to traditional search rankings. Internal tracking data suggests that the overlap between traditional organic rankings and AI citation selection is shrinking, with a measurable divergence appearing in late 2025 and early 2026. The gap is widening faster than most agencies are prepared to admit. For more context on the trajectory, watch this 2026 SEO shift explainer from our YouTube channel.
Core evaluation criteria for SaaS SEO agencies
I recommend you score every agency against these six criteria. Weight them based on your current situation, but treat methodology depth and proven pipeline impact as non-negotiables. For the full vendor-evaluation framework this scorecard fits inside, see our guide on how to choose an SEO agency for B2B SaaS.
Proven pipeline impact, not just traffic
Traffic charts and keyword rankings are necessary but not sufficient. What you need to see is an attribution path from content to MQL to pipeline. The Sova Assessment case study is a useful benchmark: organic search became their #1 channel for leads and MQLs, with a +167% increase in organic demo requests. That's a different and harder metric to hit than traffic growth.
When you review an agency's case studies, ask for the attribution model they used. Did the client track AI-referred MQLs separately? Was the pipeline increase concurrent with SEO investment, or did other channels shift at the same time? A credible agency will walk you through the methodology, not just the headline number. If they can't explain how the pipeline attribution was isolated, treat the case study as directional at best.
Methodology depth and transparency
We believe a defensible methodology needs a name, a framework, and published documentation you can review before signing. We built our CITABLE framework to structure content specifically for LLM passage retrieval.
That level of specificity matters because LLMs reward claims that appear consistently across sources, not the highest domain authority page. If an agency can't explain their content architecture in those terms, they're likely optimizing for a ranking algorithm, not a retrieval pipeline. You can score your existing content against CITABLE criteria using our free AEO content evaluator before any agency conversation.
Speed to initial signal
I recommend you get a realistic timeline before signing. You should see initial citations, where your content appears in LLM responses to priority buyer queries, within 1-2 weeks of publishing optimized content. Meaningful citation rate lift, where your share of voice on priority queries moves materially, takes 3-4 months of consistent execution.
The 3-4 month window for measurable lift reflects how frequently LLMs index new content and how long information consistency takes to build across independent sources. We can't speed that up without compromising quality. Any agency promising material results in 30 days without explaining the mechanism should back that claim with a named example. For a detailed walkthrough of realistic timelines, watch this B2B SaaS AI search guide.
Commercials and contract flexibility
I view public pricing as a trust signal. When an agency won't publish their pricing, they're either A/B testing what they can charge you, or their pricing varies enough by client that comparison is impossible. Neither helps you build a defensible line item for the CFO.
| Package | Price | Commitment | Core deliverables |
|---|---|---|---|
| AEO Sprint | €6,995 one-off | None | 10 optimized articles, AI visibility audit, answer modeling and entity map, schema and content structure for LLMs |
| Starter | €6,995/mo | Month-to-month | Up to 20 SEO and AEO articles using CITABLE, visibility tracking, structured data, backlinks, Reddit engagement, dedicated team of 4 |
| Growth | €10,995/mo | Month-to-month | Expanded article capacity, landing pages, syndication, quarterly reviews, senior team support |
| Enterprise | Custom | Flexible | Programmatic content at scale, original research for category authority |
Month-to-month retainers are the right structure given how fast AI search is evolving. We see annual contracts before proof of concept as a vendor protection mechanism, not a client benefit. If an agency requires annual commitment before delivering a single result, that's a commercial incentive misalignment worth naming directly.
Measurement and reporting rigor
We build our monthly reports to tell a coherent story. They cover citation rate on priority buyer queries, share of voice against your top two or three competitors, AI-referred sessions segmented by engine, MQL-to-opportunity conversion for AI-sourced leads, and a plain-language narrative explaining what changed and why. What they don't do is hand you 40 slides of traffic charts and call it a strategic update.
Our AI visibility tracker maps where a client appears across Google AI Overviews, ChatGPT, Claude, Perplexity, and Gemini, tracked against competitors on the same queries. Ask any agency you're evaluating for a sample report and a data dictionary explaining how each metric is calculated.
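A data dictionary is easier to evaluate when you know roughly what the calculations could look like. The sketch below shows one possible way to compute citation rate and share of voice from a tracked query set. The function names, the sample data, and the share-of-voice formula (a brand's citations divided by all citations among the tracked brands) are assumptions for illustration, not a documented method of any tracker.

```python
from collections import Counter


def citation_rate(answers: dict, brand: str) -> float:
    """Share of tracked queries whose AI answer cites `brand`.

    `answers` maps each query to the set of brands cited in its answer.
    """
    if not answers:
        return 0.0
    cited = sum(1 for brands in answers.values() if brand in brands)
    return cited / len(answers)


def share_of_voice(answers: dict, brands: list) -> dict:
    """Each brand's citations as a fraction of all citations among `brands`."""
    counts = Counter()
    for cited in answers.values():
        for brand in brands:
            if brand in cited:
                counts[brand] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


# Hypothetical tracking data: four priority buyer queries.
answers = {
    "best incident management tool": {"YourBrand", "CompetitorA"},
    "on-call software for startups": {"CompetitorA"},
    "incident response platform pricing": {"YourBrand"},
    "status page alternatives": set(),
}

print(citation_rate(answers, "YourBrand"))
print(share_of_voice(answers, ["YourBrand", "CompetitorA", "CompetitorB"]))
```

When reviewing a sample report, ask which denominator each metric uses. A citation rate over all tracked queries and a share of voice over cited answers only can move in different directions in the same month.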
Specialization vs. generalist positioning
I see meaningful differences in SaaS sales cycles, recurring revenue models, and buyer journey dynamics compared to e-commerce or B2C. An agency that primarily works with SaaS companies will have intuitions about which content categories convert for pipeline versus traffic, how to structure content for evaluation-stage queries, and how to map content to the consideration phase happening inside AI assistants.
Generalist agencies can do adequate SEO work for SaaS. What they typically lack is a query map built around SaaS buyer behavior and a measurement model designed for MQL-to-pipeline attribution. The SEO agency for B2B SaaS conversation is different from a broad digital marketing conversation, and you should verify that differentiation is real, not just a homepage claim.
What to ask in the discovery call
Use these questions to structure each discovery call. Asking the same questions across every vendor gives you a direct apples-to-apples scorecard before you compare proposals.
Questions about case studies and proof
- "Walk me through the attribution model in your most relevant case study. What was the baseline citation rate before you started, and where did it land after 90 days?"
- "How do you distinguish AI-referred pipeline from organic pipeline in your attribution stack?"
- "Have you worked with a company at our ARR stage and vertical? Can we speak with them?"
Questions about methodology and approach
- "What is your content structure framework for LLM passage retrieval, and how does it differ from traditional SEO content architecture?"
- "How do you approach information consistency across Reddit, industry publications, comparison content, and the client's own site?"
- "How do you handle the technical side (schema, entity disambiguation, site architecture) for AI crawlers versus Googlebot?"
Questions about attribution and measurement
- "How do you tag AI-referred traffic so it appears separately in GA4 and HubSpot? What form fields or CRM integrations do you add during onboarding?"
- "How do you handle the gap between tracked AI-referred sessions and zero-click research that happens entirely inside the LLM?"
- "What does your monthly report cover? Can you share a sample?"
Questions about team structure and depth
- "Who will work on our account day-to-day, and can I meet them before signing?"
- "Do you have AI or ML engineers on staff, or do you rely on third-party tools for retrieval analysis?"
- "What proprietary tools or research have you built versus licensed from vendors?"
Questions about timelines and expectations
- "What can we realistically expect in the first 30, 90, and 180 days?"
- "What does a first-week deliverable look like so we can validate your process early?"
Questions about pricing and terms
- "Is your pricing public, and does this proposal match what's published?"
- "What's the minimum commitment, and what's the exit process if we need to pause?"
- "What deliverables are included in the retainer versus billed additionally?"
Red flags to listen for during the pitch
Vague ROI language without pipeline tie-back
"Improved visibility" and "increased brand awareness" are not board-level metrics. If an agency can't connect their deliverables to a measurable pipeline number, a citation rate, or an MQL figure, you'll have no basis for a renewal conversation in six months. Our research identifies which content signals drive citation selection. An agency with genuine retrieval expertise will have a similarly specific view of what moves the numbers.
12-month contracts before proof of concept
We see annual contracts before proof of concept as a vendor protection mechanism, not a client benefit. AI search is evolving fast enough that locking in 12 months assumes the agency's approach will remain effective through platform changes from OpenAI, Google, and Anthropic that no one can fully predict. Month-to-month terms are the only structure that keeps accountability aligned with delivery.
Fear-based selling tactics
I find that agencies leading with fear typically have weak methodology to offer and use anxiety as a substitute. The honest framing is that AI search is a growth opportunity with a specific technical approach and a 3-4 month window to measurable results, not a crisis requiring immediate panic-spending.
Claiming attribution is solved
Attribution across AI surfaces is genuinely hard. GA4 doesn't distinguish AI-referred sessions reliably. CRM data captures what sales teams log, not what prospects experienced during research. When an agency claims they have attribution fully solved, they're overstating the current state of measurement tooling, and that's worth probing directly. Honest agencies acknowledge the limitation and show you what they can and can't measure.
Generic deliverables without customization
Content velocity models that promise 30 articles per month without a query map, buyer journey analysis, or citation structure framework often produce AI slop: high-volume content that ranks for nothing and gets cited by no one. Ask specifically what makes their content architecture different for AI retrieval versus traditional SEO. If the answer is generic, that's worth noting.
Proof points to request before signing
Named case studies with attribution paths
The incident.io case study is a useful benchmark: AI visibility moved from 38% to 64% on priority buyer queries, organic meetings booked increased by 22%, and one closed deal was directly attributed to a Claude citation. That level of specificity (named client, specific metric, attribution path, specific AI engine) is what a credible case study looks like.
Tom Wentworth, CMO at incident.io, described the starting point clearly:
"Before Discovered Labs, we were using homegrown LLM prompts, without a clear strategy for what to optimize for or exactly how best to structure content." - incident.io case study
And the outcome:
"...recommended you to multiple peer CMOs. There are large organizations like Hubspot and Ramp who have dedicated teams to work on large projects like AEO. For everyone else (except my competitors) there's Discovered Labs!" - incident.io case study
Competitive benchmark or visibility audit
You need to establish your baseline before you can measure progress. Request a competitive benchmark report that shows where you currently appear across ChatGPT, Claude, Perplexity, and Google AI Overviews on your priority buyer queries, alongside where your top two or three competitors appear. Without that baseline, any citation rate improvement claim is untethered. Our AI visibility tracker generates this audit for new clients as part of onboarding.
Sample reporting and dashboard access
Ask for a sample monthly report from a current client with details redacted. Look for citation rate by query cluster, share of voice trend, AI-referred sessions by engine, MQL attribution, and a plain-language narrative explaining what changed in the month. If the sample is a 40-slide deck with traffic graphs, ask what the strategic recommendation was for the following month based on those graphs.
Published research or methodology documentation
I see published original research as the clearest signal that an agency does work beyond delivering client retainers. Our Reddit and ChatGPT citation analysis found that Reddit appeared in 0.35% of visible ChatGPT citations but occupied roughly 27% of ChatGPT's internal search slots during query processing. That finding changed how we approach off-page strategy for every client, because a links-only view of off-page work misses most of what shapes AI answers. If an agency can point to published research that changed their methodology, that's a signal worth taking seriously.
References from similar-stage SaaS companies
Stage matching likely matters when evaluating references. An agency that primarily works with enterprise clients may have strong case studies but less direct experience with the pipeline velocity, team size, and budget constraints typical of earlier-stage marketing operations. Ask specifically for references at your ARR range and in your vertical before signing.
How to assess team depth without being sold to
Who actually does the work?
The bait-and-switch of senior pitch teams and junior execution teams is one of the most common failure modes in agency relationships. Ask directly: who will be on the weekly call, who writes the content briefs, who manages technical implementation, and who owns reporting. Get those names before signing, not after.
In-house vs. outsourced capabilities
Outsourced content in an AEO context carries specific risk. Writers not trained on passage extractability, block structure for RAG retrieval, or entity graph relationships will produce content that reads well but doesn't get cited. Ask what percentage of content production is in-house versus contracted, and how contractor training and quality review works.
Technical depth indicators
We see genuine technical depth in agencies that have proprietary tooling, original research, or team compositions including engineers alongside content and SEO specialists. Our AI and engineering work builds the visibility tracking infrastructure, the knowledge graph across client content, and the audit tooling that drives our recommendations. You can assess the technical approach by reviewing our analysis of what drives AI citations.
Account management vs. execution team
In some agency structures, account managers relay client requests to execution teams, which can leave the people doing the work with limited context on your product, buyer, or competitive position. Ask how many accounts each execution specialist carries, and whether the person writing your content will be on the monthly call.
Comparing proposals apples-to-apples
Standardizing scope across vendors
Don't compare a "content velocity" model promising 30 articles per month directly to a CITABLE-structured content program delivering 20 articles per month with AI visibility tracking, off-page consistency work, and schema implementation. Reduce each proposal to the same units: structured content pieces, citation tracking coverage, off-page deliverables, and technical optimizations per month. Compare at that level.
Normalizing pricing to monthly cost per deliverable
The Starter tier at Discovered Labs is €6,995 per month for up to 20 structured articles, visibility tracking, competitor monitoring, structured data, backlinks, and Reddit engagement. When comparing proposals, normalize each to cost per measurable output rather than headline retainer price. Full pricing details are at discoveredlabs.com/pricing.
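The normalization is simple division, but writing it down keeps the comparison honest. Below is a minimal sketch: the Starter figures come from this article, while "Hypothetical Vendor B" and its numbers are invented purely for illustration.

```python
def cost_per_unit(monthly_price_eur: float, units_per_month: int) -> float:
    """Monthly retainer divided by the number of deliverables, in EUR."""
    if units_per_month <= 0:
        raise ValueError("units_per_month must be positive")
    return round(monthly_price_eur / units_per_month, 2)


# (monthly price in EUR, articles per month)
proposals = {
    "Starter (figures from this article)": (6995, 20),
    "Hypothetical Vendor B": (9000, 30),  # invented for illustration
}

for name, (price, articles) in proposals.items():
    print(f"{name}: EUR {cost_per_unit(price, articles):.2f} per article")
```

Because the Starter tier covers "up to 20" articles, €349.75 is the lowest possible per-article cost at that tier. A per-article figure also ignores tracking, off-page work, and schema implementation, so repeat the same division for each unit type before drawing conclusions.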
Evaluation scorecard template
I recommend you use this table to score each agency before and after the discovery call. A score of 1-5 per criterion, weighted by the percentages below, gives you a normalized comparison across vendors.
| Criterion | Weight | What to look for |
|---|---|---|
| Pipeline impact | 20% | Attribution path from content to MQL to pipeline, named case study with specific numbers and attribution methodology |
| Methodology | 25% | Published retrieval framework like CITABLE, three-surface model covering web search, AI citations, and training data |
| Speed to initial signal | 10% | Initial citations within 1-2 weeks of publishing, measurable citation rate lift within 3-4 months of consistent execution |
| Measurement and reporting | 20% | Citation rate by query cluster, share of voice trend, AI-referred sessions by engine, MQL attribution, transparent acknowledgment of measurement limitations |
| Commercials | 15% | Month-to-month terms rather than 12-month lock-ins, public pricing, clear deliverables per tier |
| Specialization | 10% | SaaS-specific query map, MQL-to-pipeline attribution model, evaluation-stage content structure for AI assistants |
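
The weighted score from the table can be computed as follows. This is a minimal sketch: the weights are taken from the scorecard, while the dictionary keys, the `weighted_score` function, and the example scores are assumptions made for illustration.

```python
# Weights in percent, taken from the scorecard table; they sum to 100.
WEIGHTS = {
    "pipeline_impact": 20,
    "methodology": 25,
    "speed_to_initial_signal": 10,
    "measurement_and_reporting": 20,
    "commercials": 15,
    "specialization": 10,
}


def weighted_score(scores: dict) -> float:
    """Combine 1-5 scores per criterion into a single weighted score (1.0-5.0)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six criteria")
    for criterion, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{criterion}: score must be between 1 and 5")
    # Integer weights keep the arithmetic exact until the final division.
    return sum(WEIGHTS[c] * s for c, s in scores.items()) / 100


# Hypothetical scores for one vendor after the discovery call.
vendor_a = {
    "pipeline_impact": 4,
    "methodology": 5,
    "speed_to_initial_signal": 3,
    "measurement_and_reporting": 4,
    "commercials": 5,
    "specialization": 3,
}

print(weighted_score(vendor_a))  # 4.2
```

Scoring each vendor once before the call and once after makes it visible which criteria the conversation actually changed.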
For additional context on how to approach AI search optimization with or without an agency, watch this 2026 SEO starting guide covering the foundational steps before any agency relationship begins. If you're evaluating whether to manage some of this work in-house first, the DIY AEO guide for startups covers the five highest-impact tactics you can execute before hiring an agency.
If you're ready to audit your current AI visibility and evaluate whether your existing content is structured for passage retrieval, book a call and we'll tell you honestly whether we're the right fit. Or start with the free AEO content evaluator to score your highest-priority pages against the CITABLE criteria before any conversation.
FAQs
How long should a discovery call take?
Most discovery calls run 15-30 minutes, though complex enterprise evaluations can extend up to an hour. Use the question list above to structure your time across all six evaluation criteria: pipeline impact, methodology, speed to initial signal, measurement and reporting, commercials, and specialization.
Should I ask for a paid audit first?
Yes, and most credible agencies will offer one. An audit scoped to your priority buyer queries, mapped against competitor AI visibility, gives you a baseline before committing to a retainer. Our AEO Sprint at €6,995 delivers that as a standalone engagement: 10 optimized articles, a full AI visibility audit, answer modeling, and entity mapping with no retainer commitment required.
What if they won't share client names?
NDA constraints are legitimate, especially in competitive markets. What's not acceptable is an unwillingness to share anonymized metrics, methodology details, or references who can speak to the process without naming the company. Ask for anonymized versions of two or three case studies and a reference call with a client who has agreed to speak on background.
How do I know if their case studies are real?
Ask for the attribution methodology in writing. Probe which tools were used to track AI-referred sessions, how pipeline was isolated from other channels, and whether you can speak with someone who managed the engagement. A case study that can't survive those questions is a correlation story, not a causal one.
What's a reasonable timeline to first results?
Initial citations typically appear within 1-2 weeks of publishing optimized content on priority buyer queries. A measurable citation rate lift, where your share of voice moves enough to show a trend, typically takes 3-4 months of consistent execution. Any agency claiming faster meaningful results without explaining the mechanism should back that claim with a named example.
Score every agency against the same six criteria, ask the same questions in discovery, and request the same proof points before signing. The shortlist that survives that process is the one worth a deeper conversation. You don't need a perfect attribution model on day one. You need a partner who is honest about what they can measure, specific about how they structure content for AI retrieval, and willing to earn the retainer before asking for a long-term commitment.
Key terms
Answer Engine Optimization (AEO): The practice of structuring content so it is retrieved and cited by AI systems such as ChatGPT, Claude, Perplexity, and Google AI Overviews, in addition to ranking in traditional web search.
Citation rate: The share of tracked buyer queries on which your brand appears in an AI-generated answer. Used as the primary measure of AI visibility progress.
Passage extractability: How easily a dense retrieval system can isolate and return a specific section of your content as a standalone answer. Sections structured with a clear answer-first opening and a single focused idea score higher.
Share of voice: The proportion of AI-generated answers on a defined query set where your brand is cited, measured against the same queries answered for named competitors.
Information consistency: The degree to which the same accurate claim about your brand appears across independent sources, including your own site, Reddit, industry publications, and comparison content. LLMs weight consistent cross-source claims when selecting what to cite.