Automate technical SEO audits with Claude Code

Updated May 21, 2026

TL;DR

Claude Code automates the most time-consuming parts of a technical SEO audit: parsing GSC Coverage exports, validating XML sitemaps, detecting broken internal links, and extracting JSON-LD schema from raw HTML.
A builder-validator chain prevents hallucinated markup: Claude first generates JSON-LD from your entity data, then validates it against JSON Schema constraints to catch missing fields, type errors, and nesting breaks before production.
Structured outputs feed board-level reports that translate error counts into pipeline risk language a CFO can evaluate and act on.

Most technical SEO audits die in spreadsheets. Teams spend days filtering Google Search Console exports, checking schema markup page by page, and chasing broken links through crawl reports that run to thousands of rows. Meanwhile, the real work waits: fixing the issues and building Answer Engine Optimization (AEO) citation eligibility. This guide shows you how to automate that entire workflow using Claude Code, compressing a week of manual analysis into a few hours with machine-readable outputs your team can act on immediately. It is one part of the broader Claude Code for SEO playbook, which covers the full range of SEO and AEO tasks you can automate with Claude Code.

Getting started with Claude Code for technical SEO

Claude Code, Anthropic's CLI agent, connects directly to your file system, codebase, and external tools and runs multi-step audit tasks autonomously. For technical SEO, that means feeding raw data exports into a structured pipeline instead of opening them in a spreadsheet.

Here's the practical shift: you feed Claude Code raw CSV, JSON, XML, or TXT exports, define a schema, and get back structured JSON your team can act on immediately. Structured outputs constrain Claude's response to follow a specific schema, ensuring valid, parseable output for downstream processing rather than freeform text someone has to reformat. For B2B SaaS sites with 1,000+ published pages, this changes the audit economics completely.

For the strategic context behind these technical decisions, the AEO vs. SEO breakdown is a useful starting point.

Claude Code for technical SEO audits

Claude Code handles two core audit components: crawlability checks (4xx and 5xx patterns from GSC and sitemap data) and structured data validation (extracting and verifying JSON-LD against Google's schema requirements).

The key capability is bulk processing. Current Claude models (Opus 4.7 and Sonnet 4.6) support context windows of up to 1,000,000 tokens, so a typical large GSC export fits within a single session. Older models such as Sonnet 4.5 are limited to 200,000 tokens.

Schema auditing comes with a caveat, however. An Ahrefs study tracking 1,885 pages that added schema found small to no citation lift across Google AI Overviews, AI Mode, and ChatGPT. Separately, a December 2025 study by searchVIU, a German technical SEO agency, found that none of five major AI systems (ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode) used schema markup when fetching pages in real time. Instead, they extracted only visible HTML content. Schema still supports Google's rich results, so auditing structured data at scale matters mainly for traditional search visibility rather than for direct AI citation impact.

Scaling audits: automated vs. manual

The time difference between manual and automated auditing is not marginal. Here is what a comprehensive audit looks like across both approaches for a B2B SaaS site with 1,000+ published pages:

Task	Traditional manual method	Claude Code automated method
Finding all 404s in GSC Coverage export	Hours of manual filtering through export limits	Minutes with CSV parsing and pattern detection
Validating schema on 100 product pages	Manual Rich Results Test per page	Bulk extraction and validation
Identifying canonical tag conflicts	Hours of crawl export review and spreadsheet work	Rapid JSON parsing and relationship mapping
Full comprehensive audit	Days to weeks (typical)	Same-day completion

The gap compounds with site size. A human reviewing a 5,000-row crawl export misses patterns that a systematic parser catches immediately, particularly canonical conflicts across URL parameters or schema errors confined to specific page templates.

Building your Claude Code audit workflow

You'll build a six-stage audit pipeline where each stage produces structured JSON that feeds the next. Raw data enters at one end and a prioritised action plan exits at the other.

Automated SEO audit flow diagram

[Data Sources]
     |
     ├── GSC Coverage CSV export
     ├── GSC Performance CSV export
     ├── XML Sitemap (index + child sitemaps)
     └── Crawl CSV export (Screaming Frog or similar)
     |
     v
[Stage 1: Parse & Structure]
     Claude Code ingests raw files, returns structured JSON
     |
     v
[Stage 2: Validate]
     Builder-validator chain checks schema, status codes, canonicals
     |
     v
[Stage 3: Analyze]
     Pattern detection: 4xx clusters, orphaned pages, schema errors
     |
     v
[Stage 4: Prioritize]
     Rank issues by pipeline impact, crawl budget waste, citation risk
     |
     v
[Stage 5: Translate]
     Convert technical findings to CMO-readable business impact
     |
     v
[Stage 6: Report]
     Executive summary JSON exported to board-level reporting format

Data sources for AI audits

You need four inputs to run a complete audit. All are available without paid tools, though crawl depth improves with dedicated crawlers like Screaming Frog (a desktop crawler that extracts technical SEO data from websites).

GSC Coverage report: All tabs (Valid, Valid with warnings, Excluded, Error) exported as CSV.
GSC Performance report: Last 90 days, all queries and pages, exported as CSV.
XML sitemap: The index file URL plus all child sitemaps for multi-section sites.
Crawl export: CSV format from your crawler, containing source URL, internal links, and HTTP status codes.

Claude Code can also write and run scripts that call the Search Console API. This lets recurring audits pull GSC data programmatically rather than relying on manual exports. The Anthropic Cookbook has working examples of agentic loops and tool use patterns you can adapt for this. For the strategic context, this 2026 SEO walkthrough covers how technical decisions connect to AI visibility outcomes.

Uncovering technical SEO issues via GSC

GSC data is the most direct signal of how Google sees your site. The Coverage report shows which pages are indexed, excluded, or throwing errors. The Performance report shows which queries and pages drive clicks. Together, they answer two core questions: can Google find and index the right pages, and are those pages earning clicks?

Export GSC for Claude Code audits

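The GSC Coverage report is now called the Page indexing report in Search Console. It gives you URL statuses grouped into categories. Older exports use tabs such as Valid, Valid with warnings, Excluded, and Error. The current report splits URLs into indexed and not indexed, with a reason for each. Export each status separately as CSV. Values shown in the report interface may not match the downloaded file exactly. A page with zero clicks can also still be earning impressions, so check before assuming a zero-click page has no visibility.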

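For large sites, GSC's interface caps each export at 1,000 rows. To pull full Performance datasets without hitting this cap, use the Search Console API's Search Analytics endpoint. Page through results with the startRow parameter, and split long date ranges if needed. Indexing status is not available in bulk through the API; the URL Inspection API checks one URL per request. The Google Search Console API documentation covers the authentication steps if this is your first time pulling data programmatically.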
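Paging through the Search Analytics endpoint can be sketched in Python. This is a minimal sketch that assumes you already have an authorised client. `query_fn` is a hypothetical callable that sends one request body and returns the parsed JSON response. The body fields (startDate, endDate, dimensions, rowLimit, startRow) follow the Search Analytics query API. Treat the 25,000-row page size as an assumption to confirm against the current documentation.

```python
from datetime import date
from typing import Callable, Dict, List

# Assumed per-request maximum for the Search Analytics query endpoint.
ROW_LIMIT = 25_000


def fetch_all_rows(
    query_fn: Callable[[str, Dict], Dict],
    site_url: str,
    start: date,
    end: date,
    dimensions: List[str],
) -> List[Dict]:
    """Request successive pages via startRow until a short (or empty) page arrives."""
    rows: List[Dict] = []
    start_row = 0
    while True:
        body = {
            "startDate": start.isoformat(),
            "endDate": end.isoformat(),
            "dimensions": dimensions,
            "rowLimit": ROW_LIMIT,
            "startRow": start_row,
        }
        page = query_fn(site_url, body).get("rows", [])
        rows.extend(page)
        if len(page) < ROW_LIMIT:  # fewer rows than requested means this was the last page
            return rows
        start_row += ROW_LIMIT


# With google-api-python-client, query_fn could wrap (an assumption; verify for your client version):
#   service.searchanalytics().query(siteUrl=site, body=body).execute()
```

Injecting `query_fn` keeps the paging logic independent of authentication. It also lets you test the paging logic with a fake client before pointing it at a real property.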

Export the Performance report over a sufficient date range (such as 90 days), including queries, pages, devices, and countries. This gives Claude enough data to line up traffic trends on affected pages against the structural issues in the Coverage data.

Claude Code for GSC data parsing

This prompt parses the Coverage CSV and returns a structured JSON summary of error patterns grouped by directory. The key is defining the exact output structure upfront so Claude returns parseable JSON with no reformatting required.

You are a technical SEO analyst. I will provide a Google Search Console Coverage CSV export.
Your task:
1. Parse the CSV file
2. Identify all URLs with a 404 status ("Not found (404)", or "Submitted URL not found (404)" in older exports) or a server error (5xx) status
3. Group URLs by directory path (e.g., /blog, /products, /docs)
4. Count occurrences per group
5. Output as JSON:
{
  "error_summary": {
    "total_errors": number,
    "by_status": { "404": count, "5xx": count },
    "by_directory": { "/blog": count, "/products": count }
  },
  "top_affected_directories": [
    { "path": "/blog", "count": 42, "sample_urls": ["url1", "url2"] }
  ]
}

Focus on directories with the highest error counts. Return only the JSON object, no markdown.

Validating each response against a JSON Schema before it moves downstream protects data integrity and prevents errors in your reporting pipeline. The structured output from this prompt feeds directly into Stage 4 (prioritisation) of the workflow above.
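The validation step can be sketched with the standard library. A dedicated JSON Schema validator, such as the `jsonschema` package, is the more complete option. This minimal sketch instead hand-checks the shape of the error-summary object defined in the prompt above. It also adds a consistency check that each breakdown sums to total_errors. That check is an assumption about how you want the counts to reconcile.

```python
import json
from typing import Any, List


def _is_count(value: Any) -> bool:
    # bool is a subclass of int in Python, so rule it out explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_error_summary(raw: str) -> List[str]:
    """Return a list of problems with the model's reply; an empty list means it passed."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]

    problems: List[str] = []
    summary = data.get("error_summary")
    if not isinstance(summary, dict):
        problems.append("missing error_summary object")
    else:
        total = summary.get("total_errors")
        if not _is_count(total):
            problems.append("error_summary.total_errors must be a non-negative integer")
        for key in ("by_status", "by_directory"):
            group = summary.get(key)
            if not isinstance(group, dict) or not all(_is_count(v) for v in group.values()):
                problems.append(f"error_summary.{key} must map labels to counts")
            elif _is_count(total) and sum(group.values()) != total:
                problems.append(f"error_summary.{key} does not sum to total_errors")

    directories = data.get("top_affected_directories")
    if not isinstance(directories, list):
        problems.append("top_affected_directories must be an array")
    else:
        for i, item in enumerate(directories):
            if not (
                isinstance(item, dict)
                and isinstance(item.get("path"), str)
                and _is_count(item.get("count"))
                and isinstance(item.get("sample_urls"), list)
            ):
                problems.append(f"top_affected_directories[{i}] is malformed")
    return problems
```

A reply wrapped in a markdown fence fails at the first step. That is exactly why the prompt asks for the bare JSON object.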

Identifying crawl and indexing issues

Claude identifies three patterns that are hard to see in a spreadsheet at scale:
- 404 clusters that point to a structural problem (a deleted category, a changed URL pattern) rather than to individual broken pages.
- 500 errors concentrated on specific server paths, indicating backend issues.
- Canonical tag conflicts, where the submitted URL differs from the canonical declared in the page's <head>.

When dozens of URLs in a specific directory like /docs/ all return 404, that typically indicates a template or redirect configuration problem rather than individual broken links. Claude surfaces the pattern rather than listing the symptoms. The AI citation strategy guide covers how indexation problems directly reduce citation rate across AI engines.
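You can also reproduce directory-level clustering deterministically, which gives you a cross-check on the counts Claude reports. This minimal sketch assumes you have combined the per-status exports into one CSV with hypothetical `URL` and `Status` columns. The labels in ERROR_LABELS are assumptions; adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical status labels mapped to the buckets used in the prompt's by_status field.
ERROR_LABELS = {"Not found (404)": "404", "Server error (5xx)": "5xx"}


def first_directory(url: str) -> str:
    """Return the first path segment, e.g. https://example.com/docs/a -> /docs."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def cluster_errors(csv_text: str, min_cluster: int = 2, samples: int = 3) -> List[Dict]:
    hits: Dict[str, List[tuple]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = ERROR_LABELS.get((row.get("Status") or "").strip())
        if bucket is None:
            continue  # not an error status we track
        hits[first_directory(row["URL"])].append((row["URL"], bucket))

    clusters = [
        {
            "path": path,
            "count": len(items),
            "by_status": dict(Counter(bucket for _, bucket in items)),
            "sample_urls": [url for url, _ in items[:samples]],
        }
        for path, items in hits.items()
        if len(items) >= min_cluster  # a single broken URL is a symptom, not a pattern
    ]
    return sorted(clusters, key=lambda c: c["count"], reverse=True)
```

The `min_cluster` threshold is a judgment call. Raise it on large sites so that only genuine structural patterns surface.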

Ensuring sitemap accuracy with Claude Code

Your sitemap is a contract with Googlebot: these are the pages you want crawled and indexed. When that contract includes non-200 URLs, stale timestamps, or pages that contradict their own canonical tags, you waste crawl budget and send conflicting signals to both Google and LLMs that index the open web.

Fetch and parse XML sitemaps

Claude Code can fetch an XML sitemap directly with its web fetch tool, or write a short script to download it. For large sites with sitemap index files pointing to multiple child sitemaps, start with the index and iterate through each child. Process each child sitemap in sequence and aggregate the JSON outputs. This approach supports the CITABLE framework's "Latest and consistent" component by keeping timestamps and URL structures coherent across all surfaces.
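To see the index-then-children iteration explicitly, here is a minimal Python sketch. `fetch` is a hypothetical callable that returns a sitemap's XML as text. Gzipped sitemaps and fetch errors are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SM}


def collect_urls(xml_text: str, fetch: Callable[[str], str]) -> List[Dict[str, Optional[str]]]:
    """Walk a sitemap index (or a plain urlset) and return every <loc> with its <lastmod>."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{SM}}}sitemapindex":
        entries: List[Dict[str, Optional[str]]] = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            # Child sitemaps are processed one at a time and their results aggregated.
            entries.extend(collect_urls(fetch((loc.text or "").strip()), fetch))
        return entries
    return [
        {
            "loc": (url.findtext("sm:loc", default="", namespaces=NS) or "").strip(),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
        }
        for url in root.findall("sm:url", NS)
    ]
```

In practice, `fetch` could be as simple as `lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8")`. Production use would also want retries and gzip handling.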

Automating sitemap reviews with Claude

The sitemap audit prompt fetches your sitemap, checks the HTTP status code of each URL, and returns a structured breakdown. It instructs Claude to extract all <loc> values, make a HEAD request for each, and flag any URL with a status of 400 or above as critical. The output keys are sitemap_url, total_urls_checked, status_breakdown, and non_2xx_urls (including redirect targets for 301s).
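You can script the same check with the standard library. This minimal sketch assumes you already have a list of <loc> URLs. Redirects are deliberately not followed, so 3xx responses and their Location targets get reported. A status of 0 is a hypothetical marker for connection failures. Some servers answer HEAD differently from GET, so re-check flagged URLs before acting on them.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Dict, List, Optional, Tuple


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 3xx response instead of following it


_OPENER = urllib.request.build_opener(_NoRedirect)


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (status, Location header or None) for a single HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:  # raised for unfollowed 3xx as well as 4xx/5xx
        return err.code, err.headers.get("Location")
    except OSError:
        return 0, None  # DNS failures, refused connections, timeouts (0 is a hypothetical sentinel)


def summarize(sitemap_url: str, results: List[Tuple[str, int, Optional[str]]]) -> Dict:
    breakdown: Counter = Counter()
    non_2xx: List[Dict] = []
    for url, status, location in results:
        breakdown[str(status)] += 1
        if not 200 <= status < 300:
            non_2xx.append({
                "url": url,
                "status": status,
                "redirect_target": location,
                "critical": status >= 400 or status == 0,
            })
    return {
        "sitemap_url": sitemap_url,
        "total_urls_checked": len(results),
        "status_breakdown": dict(breakdown),
        "non_2xx_urls": non_2xx,
    }
```

To run the check, build `results = [(u, *head_status(u)) for u in urls]` from the extracted <loc> values and pass it to `summarize`.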

Critical sitemap audit findings

Three findings appear repeatedly on enterprise B2B SaaS audits, each with a direct impact on AI citation rates.

Orphaned pages: URLs present in your sitemap but not discoverable through internal links during a crawl. These pages exist but receive no crawl priority from internal linking, so Google and LLMs may deprioritize or never retrieve them.
Mismatched canonicals: Sitemap <loc> values that differ from the canonical declared in the page's <head>. This can cause indexation confusion by presenting Google with two competing signals for the same content.
Stale lastmod timestamps: Pages whose content changes frequently but whose lastmod dates are never updated. Search engines can use lastmod to decide when to recrawl a page, so stale dates lead to inefficient crawl scheduling.
The Google AI Overviews guide covers the connection between these technical signals and AI citation eligibility.
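Once you have the sitemap URLs and the crawl's link pairs, orphan and canonical checks reduce to set comparisons. This minimal sketch makes three assumptions. `internal_links` is a list of (source, destination) pairs from your crawl. `canonicals` maps each crawled URL to the canonical it declares. The normalisation, which lowercases scheme and host and drops fragments, is our own choice. It deliberately keeps trailing slashes, since /a and /a/ are different URLs.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    # Lowercase scheme and host, drop the fragment; path and query are left untouched.
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path, p.query, ""))


def find_orphans(sitemap_urls: Iterable[str], internal_links: Iterable[Tuple[str, str]]) -> List[str]:
    """Sitemap URLs that no other page links to (self-links do not count)."""
    linked = {normalize(dst) for src, dst in internal_links if normalize(src) != normalize(dst)}
    return sorted(u for u in sitemap_urls if normalize(u) not in linked)


def find_canonical_mismatches(sitemap_urls: Iterable[str], canonicals: Dict[str, str]) -> List[Tuple[str, str]]:
    """Sitemap URLs whose declared canonical points somewhere else."""
    return [
        (u, canonicals[u])
        for u in sitemap_urls
        if u in canonicals and normalize(canonicals[u]) != normalize(u)
    ]
```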

Rapidly finding internal link errors at scale

Internal links distribute page authority and signal content relationships to both Google and LLMs building knowledge graphs. Broken internal links and redirect chains waste link equity and reduce the signal density that AI systems use to understand your site's entity structure.

Structuring data for link audits

In Screaming Frog, export link-level data (such as the All Inlinks bulk export) in CSV format. Map its columns to these fields: source_url, destination_url, status_code, and link_count. Claude parses this structure directly without reformatting. Machine-readable CSV is critical for agentic workflows because downstream processes consume the output without human intervention. The same principle underlies the CITABLE framework's "Block-structured for RAG" component.

Claude prompts for broken links

The internal link audit prompt iterates through all links, filters for error status codes (400 and above), and groups results by destination URL. This shows which broken targets have the most widespread impact. The output keys are:
- broken_links_summary, with total_broken and by_status.
- broken_destinations, with broken_url, status_code, linked_from_count, and sample_sources, sorted by linked_from_count in descending order.

All prompts in this pipeline specify JSON-only output with no markdown formatting to ensure parseable results.
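The grouping logic is small enough to run locally as a cross-check. This minimal sketch assumes a CSV with source_url, destination_url, and status_code columns. It counts distinct linking pages per destination, and it defines total_broken as the number of distinct broken destinations. That definition is an assumption; count link rows instead if you prefer.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, List, Set


def broken_link_report(csv_text: str, samples: int = 3) -> Dict:
    """Group error-status links by destination URL."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    status_of: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            status = int(row["status_code"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable status code
        if status < 400:
            continue
        status_of[row["destination_url"]] = status
        sources[row["destination_url"]].add(row["source_url"])

    destinations: List[Dict] = [
        {
            "broken_url": dst,
            "status_code": status_of[dst],
            "linked_from_count": len(srcs),  # distinct linking pages
            "sample_sources": sorted(srcs)[:samples],
        }
        for dst, srcs in sources.items()
    ]
    destinations.sort(key=lambda d: d["linked_from_count"], reverse=True)
    return {
        "broken_links_summary": {
            "total_broken": len(destinations),  # distinct broken destinations
            "by_status": dict(Counter(str(d["status_code"]) for d in destinations)),
        },
        "broken_destinations": destinations,
    }
```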

Identifying high-value link repairs

Prioritise fixes using two signals: linked_from_count (how many pages link to this broken destination) and the nature of the source pages (product pages and conversion paths rank higher than informational content). A 404 on a destination linked from many product pages and case studies represents a pipeline risk. A 404 linked from a few blog posts with minimal traffic is lower priority but still worth scheduling.
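The two signals can be combined into a single sort key. This minimal sketch makes two assumptions. First, each destination carries its full list of linking pages in a hypothetical `sources` field, because sample_sources is truncated. Second, the path weights are made up; tune them to your own site structure.

```python
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical weights: conversion-path sources count more than informational ones.
SOURCE_WEIGHTS = {"/pricing": 3.0, "/products": 3.0, "/case-studies": 3.0, "/blog": 1.0}
DEFAULT_WEIGHT = 1.0


def source_weight(source_url: str) -> float:
    path = urlsplit(source_url).path
    for prefix, weight in SOURCE_WEIGHTS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return weight
    return DEFAULT_WEIGHT


def prioritize(destinations: List[Dict]) -> List[Dict]:
    """Score each broken destination by the summed weight of the pages linking to it."""
    scored = [
        {**d, "priority_score": sum(source_weight(s) for s in d["sources"])}
        for d in destinations
    ]
    return sorted(scored, key=lambda d: d["priority_score"], reverse=True)
```

With these weights, two links from product and case-study pages outrank three links from blog posts. That matches the intent of the prioritisation rule above.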

Redirect chains are a secondary issue worth flagging. Any internal link that passes through two or more redirects can dilute link equity at each hop. Claude identifies these chains from the 3xx entries in the crawl export and flags which to collapse to direct links.
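Chain detection is a short graph walk. This minimal sketch makes two assumptions. `redirects` maps each 3xx URL in the crawl export to its Location target. `link_targets` lists the destinations of your internal links. Any link that needs two or more hops to resolve is flagged, and loops are marked separately.

```python
from typing import Dict, Iterable, List


def follow_redirects(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Return the path start -> ... -> final URL, stopping on a loop or after max_hops."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        chain.append(nxt)
        if nxt in seen:
            break  # redirect loop
        seen.add(nxt)
    return chain


def chains_to_collapse(link_targets: Iterable[str], redirects: Dict[str, str]) -> List[Dict]:
    """Flag internal link targets that need two or more hops to resolve."""
    report = []
    for target in sorted(set(link_targets)):
        chain = follow_redirects(target, redirects)
        hops = len(chain) - 1
        if hops >= 2:
            report.append({
                "link_target": target,
                "hops": hops,
                "final_url": chain[-1],
                "loop": chain[-1] in chain[:-1],
            })
    return report
```

Each flagged entry gives you the final_url to link to directly. Loops need fixing at the server instead.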

Validating structured data across pages

Schema markup makes explicit what your pages are about. Google uses it to generate rich results, and LLMs may use it to build knowledge graphs during training and retrieval. For what Google Search requires, the definitive reference is Google's structured data documentation rather than schema.org. Validate against Google's guidance rather than the vocabulary spec alone.

Extract schema markup from HTML

The schema extraction prompt finds all <script type="application/ld+json"> tags, extracts the JSON content, and parses each block for validity. It returns a JSON object with total_schemas_found and a schemas array containing type, content, and is_valid for each block. Run it against a sample of pages in sequence and aggregate the results. This gives you a clear picture of schema coverage and error rate across your site.
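You can reproduce the extraction step with Python's standard-library HTML parser, which is useful for spot-checking Claude's output on a sample of pages. In this minimal sketch, is_valid only means the block parses as JSON, not that it meets Google's requirements. The type is read from a top-level @type, so blocks that use @graph report None.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buffer: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attribute values can be None (e.g. <script async>), hence the "or ''"
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def extract_schemas(html: str) -> Dict[str, Any]:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    schemas = []
    for raw in collector.blocks:
        try:
            content: Any = json.loads(raw)
            is_valid = True  # "parses as JSON", not "meets Google's requirements"
        except json.JSONDecodeError:
            content, is_valid = raw, False
        schema_type = content.get("@type") if isinstance(content, dict) else None
        schemas.append({"type": schema_type, "content": content, "is_valid": is_valid})
    return {"total_schemas_found": len(schemas), "schemas": schemas}
```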

For context, our AI citation research covers the relationship between schema coverage and citation rates.

Automate schema validation with Claude

The builder-validator chain prevents hallucinated schema markup through two phases. First, Claude generates JSON-LD markup from your entity data (product name, price, rating, URL). Second, structured outputs constrain the response to a specific JSON Schema, catching missing required fields, incorrect types, and nesting errors before the markup reaches production.
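The two phases can be sketched locally in plain Python. This is not the structured outputs feature itself but a minimal stand-in. It assumes a hypothetical entity dict with name, url, price, and currency fields. It checks only an illustrative subset of Product rules: a non-empty name, a non-negative price, and a three-letter currency code. It does not cover Google's full requirements.

```python
from typing import Any, Dict, List


def build_product_jsonld(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Builder: map (hypothetical) entity fields onto Product JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": entity.get("name"),
        "url": entity.get("url"),
        "offers": {
            "@type": "Offer",
            "price": entity.get("price"),
            "priceCurrency": entity.get("currency"),
        },
    }


def _is_price(value: Any) -> bool:
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return value >= 0
    if isinstance(value, str):
        try:
            return float(value) >= 0
        except ValueError:
            return False
    return False


def validate_product(doc: Dict[str, Any]) -> List[str]:
    """Validator: return errors; only markup with an empty list should ship."""
    errors: List[str] = []
    if doc.get("@type") != "Product":
        errors.append("@type must be Product")
    name = doc.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name is required")
    offers = doc.get("offers")
    if not isinstance(offers, dict):
        errors.append("offers must be an object")
        return errors
    if not _is_price(offers.get("price")):
        errors.append("offers.price must be a non-negative number")
    currency = offers.get("priceCurrency")
    if not (isinstance(currency, str) and len(currency) == 3 and currency.isascii()
            and currency.isalpha() and currency.isupper()):
        errors.append("offers.priceCurrency must be a three-letter code such as USD")
    return errors
```

Keeping the validator separate from the builder means a missing price surfaces as an error for manual input rather than being silently filled.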

LLMs can slip on tasks that demand character-level precision, such as exact property names and value formats. Enforcing structure at the output level gives you reliable markup at scale instead of spot-checking 100 pages manually. The schema.org getting started guide covers the vocabulary layer, but always cross-reference with Google Search Central for which properties Google actually surfaces.

Common schema errors and fixes

Three error types appear frequently on B2B SaaS schema audits:

Missing required fields: Required properties must be included for a page to be eligible for enhanced display in Google Search. Product schema without offers.price is a common example. The builder-validator chain either fills the gap from available page data or flags it for manual input.
Incorrect data types: Date fields receiving "May 2026" instead of structured formats like ISO 8601 ("2026-05-18"), or enum fields with values outside the allowed set. JSON Schema type validation catches these automatically.
Incomplete entity nesting: Organization, Product, and Article schema working as isolated blocks rather than connected entities. Pages using linked schema types can provide richer structured data that feeds knowledge graphs LLMs query during retrieval.
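The last two fixes can be sketched together. The sketch pairs a builder that links Organization and Article through @id references inside one @graph with a check that every reference resolves and every date parses as ISO 8601. The URLs and field choices are illustrative assumptions. `datetime.fromisoformat` stands in for a full ISO 8601 parser.

```python
from datetime import datetime
from typing import Any, Dict, List

DATE_FIELDS = ("datePublished", "dateModified")


def build_graph(site_url: str, org_name: str, headline: str, published: str) -> Dict[str, Any]:
    """Connect Organization and Article through @id references in one @graph."""
    org_id = f"{site_url}/#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": org_name, "url": site_url},
            {
                "@type": "Article",
                "@id": f"{site_url}/#article",
                "headline": headline,
                "datePublished": published,
                "publisher": {"@id": org_id},
            },
        ],
    }


def check_graph(doc: Dict[str, Any]) -> List[str]:
    """Flag @id references that resolve to nothing and dates that are not ISO 8601."""
    nodes = [n for n in doc.get("@graph", []) if isinstance(n, dict)]
    known_ids = {n.get("@id") for n in nodes}
    errors: List[str] = []
    for node in nodes:
        label = node.get("@type", "node")
        for key, value in node.items():
            if isinstance(value, dict) and set(value) == {"@id"} and value["@id"] not in known_ids:
                errors.append(f"{label}.{key} references unknown {value['@id']}")
            if key in DATE_FIELDS:
                try:
                    datetime.fromisoformat(value)  # a trailing "Z" needs Python 3.11+
                except (TypeError, ValueError):
                    errors.append(f"{label}.{key} is not ISO 8601: {value!r}")
    return errors
```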

Streamline audit reporting for board reviews

Technical findings are only useful if they reach the people who can prioritise the fixes. A JSON file with hundreds of errors means nothing to a CFO. The same data, translated into pipeline risk and crawl budget waste, becomes a decision-making tool.

Automate audit data consolidation

After running the GSC, sitemap, internal link, and schema prompts, you have four JSON files. The consolidation step merges them into a single audit object grouped by severity (such as critical, high, medium) and by impact type (indexation risk, citation risk, user experience). The aggregated JSON becomes the input for the translation prompt below, and the output is an executive summary your team can take directly into a board slide.
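Here is a minimal sketch of the consolidation step. It assumes each stage saved its results as a JSON file shaped like {"findings": [...]}, with hypothetical `severity` and `impact_type` keys on every finding. Adapt the keys to whatever your prompts actually return.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

SEVERITY_ORDER = ["critical", "high", "medium"]


def load_findings(paths: Iterable[str]) -> Dict[str, List[Dict]]:
    """Read each stage's output file, keyed by file name (e.g. gsc, sitemap)."""
    return {Path(p).stem: json.loads(Path(p).read_text(encoding="utf-8"))["findings"] for p in paths}


def consolidate(findings_by_source: Dict[str, List[Dict]]) -> Dict:
    by_severity: Dict[str, List[Dict]] = defaultdict(list)
    by_impact: Dict[str, int] = defaultdict(int)
    for source, findings in findings_by_source.items():
        for finding in findings:
            # Defaulting a missing severity to "medium" is an assumption.
            by_severity[finding.get("severity", "medium")].append({**finding, "source": source})
            by_impact[finding.get("impact_type", "unclassified")] += 1
    # Known severities first, in order; any unexpected labels follow.
    ordered = {s: by_severity.pop(s) for s in SEVERITY_ORDER if s in by_severity}
    ordered.update(by_severity)
    return {
        "total_findings": sum(by_impact.values()),
        "by_severity": ordered,
        "by_impact_type": dict(by_impact),
    }
```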

Automate audit summaries with Claude

The translation prompt converts each technical finding into a concise business impact statement. For each finding, it writes a business_impact field that covers three things in plain English: what is happening, why it matters to revenue or pipeline, and its estimated impact. The output keys are technical_issue, business_impact, revenue_risk (critical/high/medium), and priority. Findings are grouped by revenue_risk, and critical items are presented first with their estimated pipeline impact.
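Once the translation prompt returns its objects, ordering them for a slide is mechanical. This minimal sketch assumes `priority` is an integer where 1 is most urgent. Rather than guessing, it rejects any finding that lacks the expected keys or uses a revenue_risk outside critical/high/medium.

```python
from typing import Dict, List

RISK_ORDER = ("critical", "high", "medium")
REQUIRED_KEYS = {"technical_issue", "business_impact", "revenue_risk", "priority"}


def executive_summary(translated: List[Dict]) -> str:
    """Render translated findings as plain text, critical items first."""
    invalid = [
        f for f in translated
        if not REQUIRED_KEYS.issubset(f) or f["revenue_risk"] not in RISK_ORDER
    ]
    if invalid:
        raise ValueError(f"{len(invalid)} finding(s) missing keys or with an unknown revenue_risk")
    lines: List[str] = []
    for risk in RISK_ORDER:
        group = sorted((f for f in translated if f["revenue_risk"] == risk), key=lambda f: f["priority"])
        if group:
            lines.append(f"{risk.upper()} ({len(group)})")
            lines.extend(f"- {f['business_impact']}" for f in group)
    return "\n".join(lines)
```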

Defensible audit reports for CFOs

The output gives you a clear structure: what is broken, what it costs in pipeline, and how fast it can be fixed. Attribution remains a harder problem, and we acknowledge that GA4, HubSpot, CRM, and self-reported data give different answers for the same pipeline question. The technical audit provides the baseline. Pages that cannot be indexed are unlikely to be cited, and pages that are not cited cannot drive AI-referred pipeline. The noise vs. signal guide covers how to build rigorous measurement on top of this technical foundation.

"Before Discovered Labs, we were using homegrown LLM prompts, without a clear strategy for what to optimize for or exactly how best to structure content." - Tom Wentworth, CMO at incident.io

B2B SaaS audit: Claude Code for AI visibility

The prompts above give any technical SEO team a working audit pipeline. The approach pays off most when it connects technical findings to AI visibility specifically, mapping which errors reduce citation eligibility across Google AI Overviews, ChatGPT, Claude, Perplexity, and Gemini.

Setting up your technical SEO audit

At Discovered Labs, our AI visibility tracker uses this audit pipeline approach. The setup maps three inputs: the client's query map (what buyers actually search), the current citation rate per query across each AI engine, and the technical baseline (indexation, schema coverage, internal link health). Each technical fix is prioritised by its expected lift in citation rate, not by its technical severity in isolation.

This is the practical difference between auditing for Google rankings and auditing for AI visibility. A canonical conflict on a blog post may be low priority for organic ranking but high priority if that post is the only page covering a buyer query where competitors are consistently cited. For the strategic layer above this technical work, this B2B SaaS AI search guide covers how to connect technical fixes to share of voice outcomes.

Prioritizing technical audit fixes

At Discovered Labs, we prioritise using axes such as citation impact (does fixing this issue increase the likelihood of being cited on a high-value buyer query?) and fix effort (is this a one-line schema fix or a site architecture change?). High-priority fixes typically combine high citation impact with low effort.
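The two axes map naturally onto an impact/effort grid. This minimal sketch assumes each candidate fix has hypothetical 1 to 5 scores for citation_impact and effort. The thresholds and bucket names are illustrative assumptions.

```python
from typing import Dict, List

BUCKET_ORDER = {"do first": 0, "plan": 1, "batch": 2, "defer": 3}


def bucket(fix: Dict) -> str:
    """Place a fix on the impact/effort grid (1-5 scales and thresholds are assumptions)."""
    high_impact = fix["citation_impact"] >= 4
    low_effort = fix["effort"] <= 2
    if high_impact and low_effort:
        return "do first"
    if high_impact:
        return "plan"
    if low_effort:
        return "batch"
    return "defer"


def prioritize_fixes(fixes: List[Dict]) -> List[Dict]:
    """Order fixes by bucket, then by higher impact, then by lower effort."""
    ranked = sorted(
        fixes,
        key=lambda f: (BUCKET_ORDER[bucket(f)], -f["citation_impact"], f["effort"]),
    )
    return [{**f, "bucket": bucket(f)} for f in ranked]
```

Sorting by bucket first keeps a quick, high-impact schema fix ahead of a high-impact architecture change, even when the architecture change scores higher on impact alone.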

Broken internal links and missing schema remove content signals before LLMs can retrieve them. The CITABLE framework provides the content layer above this technical foundation. Specifically, the "Block-structured for RAG" and "Entity graph and schema" components both depend on a clean technical baseline. For a quick self-assessment, the AEO content evaluator scores existing content against CITABLE components for free.

Automating spreadsheet audits with Claude

The shift from spreadsheets to structured JSON compounds over time. A spreadsheet requires ongoing human filtering and interpretation. A JSON pipeline runs on schedule, outputs consistent structure, and feeds your reporting dashboard without reformatting. For teams running regular audits, this redirects analyst hours from data formatting toward actual visibility work.

Our pricing plans include continuous visibility tracking and competitor monitoring built on this automated pipeline, and all retainers are month-to-month so there is no lock-in while you validate results.

For teams building this capability in-house before committing to an agency, the DIY AEO tactics guide covers which parts to prioritize first.

The most expensive part of most SEO retainers is manual data formatting. Claude Code removes that cost and redirects hours toward the work that actually moves citation rate: optimising content for passage retrieval, building information consistency across sources, and fixing the technical gaps that block AI systems from retrieving your pages.

If you want to see how this audit pipeline connects to measurable AI-referred pipeline for a B2B SaaS site, book a call and we will tell you honestly whether the approach fits your situation. Pricing is public, retainers are month-to-month, and the audit scope covers multiple surfaces: web search, AI citations, and training data visibility.

FAQs

How long does it take to audit an enterprise SaaS site using Claude Code?

For a site with 1,000+ published pages, Claude Code can complete a comprehensive audit in a few hours, compared with days or weeks manually. The audit covers GSC errors, sitemap validation, internal links, and schema. Setup time for the prompt pipeline varies with your existing workflow.

What file formats does Claude Code accept?

Claude Code reads files directly from your working directory, including CSV, JSON, XML, TXT, and PDF. When working through the Claude API instead, inline images are limited to 8 MB per file, and larger files of up to 500 MB go through the Files API. Claude Code can also connect to live data sources, for example by running scripts against the Search Console API. This means you can pull GSC data programmatically rather than relying on manual CSV exports.

How often should automated technical SEO audits run?

Quarterly audits suit enterprise sites with frequent content updates. Any site should also run an audit before a migration or redesign, regardless of schedule. For sites with 100 to 500 pages publishing regular content, audits every six months are typically sufficient. For larger or highly active sites, quarterly audits help catch issues before they compound.

How does the builder-validator chain prevent schema validation errors?

The builder generates JSON-LD from entity data. The validator then uses structured outputs to constrain the response to a specific JSON Schema, catching missing fields, type errors, and nesting breaks before production. This removes structural errors from character-precise structured data without manual spot-checking. The values themselves still come from your entity data, so that source needs to be accurate.

Key terms glossary

Answer Engine Optimization (AEO): The practice of structuring content so AI systems cite it when answering user queries, as distinct from traditional SEO ranking signals.

Builder-validator chain: A two-phase Claude Code workflow where one prompt generates structured markup such as JSON-LD and a second validates it against a JSON Schema to catch errors before production.

Citation rate: The frequency with which an AI engine references a specific page or brand when answering a relevant query.

GSC (Google Search Console): Google's free tool for monitoring how a site performs in Google Search, covering indexation status, crawl errors, and query-level traffic data.

JSON-LD: A structured data format embedded in a <script> tag that communicates entity relationships such as product, organisation, or article to search engines and LLMs.

RAG (Retrieval-Augmented Generation): An LLM architecture that retrieves relevant passages from an external knowledge base before generating an answer, making passage structure and extractability a direct input to citation decisions.

Schema markup: Structured data vocabulary from schema.org, implemented as JSON-LD, that describes the entities and relationships on a page to search engines and AI systems.

Structured outputs: A Claude API feature that constrains model responses to a specific JSON Schema, ensuring parseable, consistent output for downstream processing.