
Internal linking at scale with Claude Code

Automate internal linking with Claude Code to build topic clusters for AI passage retrieval without manual spreadsheet work. This guide covers a five-step workflow to map semantic relationships across hundreds of pages, score candidate links by relevance, and export CMS-ready output.

Liam Dunne
Growth marketer and B2B demand specialist with expertise in AI search optimisation - I've worked with 50+ firms, scaled some to 8-figure ARR, and managed $400k+/mo budgets.
May 21, 2026
14 mins

Updated May 21, 2026

TL;DR:

  • Manual linking can't scale: Content volume beyond 100–150 pages makes manual internal linking impractical, leaving orphaned pages and missing entity relationships that AI retrievers need.
  • Claude Code automates the process: Treat your blog as a programmatic database, map semantic relationships across hundreds of pages, score candidate links by relevance, and export CMS-ready output in a fraction of the time.
  • Five-step workflow: Prepare a clean sitemap CSV, define your linking playbook, run batch candidate identification, validate suggestions manually, then push to your CMS.
  • Biggest pitfall is raw HTML: Feeding Claude raw HTML causes it to extract structured-data metadata as if it were visible article content, producing hallucinated link suggestions.
  • Use clean rendered text: Use a scraping service that returns clean rendered text before passing pages to Claude to avoid metadata extraction issues.

Most content teams hit a ceiling around 100 to 150 published pages where manual internal linking stops being practical. New articles go live without anyone updating existing pages to point to them, orphaned pages accumulate, and the entity relationships that AI retrievers need to cite your content never get established. This guide covers one workflow from the broader Claude Code for SEO playbook: how to map semantic relationships across a large content library, score candidate links by relevance, and produce CMS-ready output without the spreadsheet overhead.

Internal links do three things for AI passage retrieval: they signal page discovery paths for crawlers, encode entity relationships in your content graph, and pass topical relevance from pillar pages to cluster content. All three matter for citation rate.

Traditional SEO prioritized link equity as a ranking signal. For LLM-based retrieval, the mechanism shifts. Dense retrievers don't follow PageRank logic. They score individual passages by semantic alignment between query intent and passage meaning, so the question becomes whether your linked pages form a coherent topic cluster rather than whether any single page accumulates authority.

Organic search is evolving to include multiple surfaces: traditional web search, AI citations, and training data. Internal links affect all three. For web search, they support crawlability and page authority distribution. For AI citations, they tell dense retrievers that your pillar page and its cluster pages are semantically related. For training data, they help establish brand associations at the entity level.

The underlying technology difference is what makes this matter. Traditional search models calculate relevance using term frequency and inverse document frequency. Dense retrievers, by contrast, are reported to map both queries and passages into continuous vector embeddings that capture semantic similarity beyond lexical overlap. A page buried in your site with no internal links may rank in Google but could remain less visible to passage retrieval because the model lacks semantic context for it.
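
To make the lexical side of that contrast concrete, here is a minimal sketch of TF-IDF scoring using only the Python standard library. The three sample strings are invented for illustration, the tokenizer is a plain whitespace split, and no embedding model is called, so the sketch shows only where term-overlap scoring falls short, not how a dense retriever behaves.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector per document from whitespace tokens."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log(len(docs) / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented sample text (assumption, not taken from any real page).
query = "how to connect related articles"
on_topic = "internal linking joins pages on similar topics"
off_topic = "how to connect a printer"

q_vec, on_vec, off_vec = tfidf_vectors([query, on_topic, off_topic])
on_topic_score = cosine(q_vec, on_vec)    # 0.0, because no terms are shared
off_topic_score = cosine(q_vec, off_vec)  # positive, because "how to connect" overlaps
print(on_topic_score, off_topic_score)
```

The on-topic passage scores zero because it shares no words with the query, while the off-topic passage scores higher on shared function words. Closing that gap is what embedding-based retrieval is designed to do.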

Our CITABLE framework treats internal links as part of the "E" component, entity graph and schema, because explicit relationships in copy signal more to LLMs than structural markup alone. A well-connected topic cluster gives retrievers multiple entry points into your content, increasing the probability that a relevant passage gets extracted and cited.

A topic cluster groups a pillar page covering a broad concept with cluster pages that each answer one specific related question and link back to the pillar. For AI retrieval, this architecture matters because it distributes semantic authority across the cluster rather than concentrating it on a single page. Google's AGREE research demonstrates that LLMs can be adapted for improved grounding.

The practical problem is scale. A 200-page blog has numerous potential link pairs to evaluate. Evaluating them manually, deciding on anchor text, and updating the CMS by hand is a significant investment for any lean team. Claude Code compresses that work substantially and makes it repeatable.

Claude Code is Anthropic's agentic CLI tool designed to run multi-step automated workflows against files, APIs, and web content using natural-language instructions combined with structured code execution. For internal linking, it can accept your sitemap CSV as input, apply semantic matching across page content, score candidate pairs by relevance, and output structured JSON or HTML ready for your CMS import.

The workflow integrates with existing marketing stacks. You can pipe Claude's JSON output directly into a headless CMS API like Contentful or Strapi, tag AI-referred sessions in GA4 via UTM parameters on links Claude identifies as high-priority, and track citation rate changes through our AI visibility tracker to close the loop on pipeline attribution.

Preparing your blog for internal linking

Start with your sitemap. Export a CSV with columns for URL, page title, and any custom fields that help you categorize content by topic cluster or pillar category. If your CMS doesn't export this directly, a simple crawler pass will generate it. Flag any pages with zero inbound internal links as orphaned, because these are your highest-priority targets.
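
The orphan check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sitemap CSV with `url`, `title`, and `cluster` columns and a crawl export of (source, target) link pairs. Both the column names and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sitemap export; the column names are assumptions, not a CMS standard.
SITEMAP_CSV = """url,title,cluster
/blog/internal-linking,Internal linking guide,linking
/blog/anchor-text,Anchor text basics,linking
/pricing-guide,Pricing guide,bofu
"""

# Hypothetical crawl output: (source_url, target_url) pairs for internal links.
INTERNAL_LINKS = [
    ("/blog/internal-linking", "/blog/anchor-text"),
    ("/blog/anchor-text", "/blog/internal-linking"),
]

def find_orphans(sitemap_csv: str, links: list[tuple[str, str]]) -> list[str]:
    """Return sitemap URLs that receive zero inbound internal links."""
    rows = list(csv.DictReader(io.StringIO(sitemap_csv)))
    # Count inbound links per target, ignoring links from a page to itself.
    inbound = Counter(target for source, target in links if source != target)
    return [row["url"] for row in rows if inbound[row["url"]] == 0]

orphans = find_orphans(SITEMAP_CSV, INTERNAL_LINKS)
print(orphans)  # ['/pricing-guide']
```

In practice the link pairs would come from whatever crawler produced the sitemap, and the orphan list would be written back to the CSV as an extra column.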

The second input is a linking playbook, a Markdown file or a second CSV that defines your rules. Specify which pillar pages should link to which cluster categories, anchor text length guidelines (concise, descriptive phrases work well), and any pages that should not link to each other to avoid diluting topical focus. Without this playbook, Claude will generate semantically plausible suggestions that contradict your content strategy.
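
One way to keep those rules checkable is to mirror the playbook in a small data structure and test each suggestion against it. The sketch below is an assumption about how that could look: the `Playbook` class, its field names, and the four-word anchor limit are all hypothetical, not part of Claude Code.

```python
from dataclasses import dataclass, field

@dataclass
class Playbook:
    """Hypothetical in-memory form of the linking playbook."""
    # Pillar URL -> cluster categories it is allowed to link to.
    pillar_to_clusters: dict[str, set[str]] = field(default_factory=dict)
    # Unordered URL pairs that must never link to each other.
    excluded_pairs: set[frozenset[str]] = field(default_factory=set)
    # Assumed anchor length limit; set this to match your own guideline.
    max_anchor_words: int = 4

def violations(playbook: Playbook, source_url: str, target_url: str,
               target_cluster: str, anchor_text: str) -> list[str]:
    """List every playbook rule a suggested link breaks (empty list means it passes)."""
    problems = []
    if frozenset({source_url, target_url}) in playbook.excluded_pairs:
        problems.append("excluded pair")
    allowed = playbook.pillar_to_clusters.get(source_url)
    if allowed is not None and target_cluster not in allowed:
        problems.append("cluster not allowed for this pillar")
    if len(anchor_text.split()) > playbook.max_anchor_words:
        problems.append("anchor text too long")
    return problems

playbook = Playbook(
    pillar_to_clusters={"/guides/aeo": {"linking", "schema"}},
    excluded_pairs={frozenset({"/guides/aeo", "/pricing-guide"})},
)
ok = violations(playbook, "/guides/aeo", "/blog/anchor-text", "linking", "anchor text basics")
bad = violations(playbook, "/guides/aeo", "/pricing-guide", "bofu",
                 "pricing guide for enterprise teams today")
print(ok, bad)
```

The same rules still belong in the Markdown or CSV playbook that Claude reads. A check like this only catches suggestions that slip past the prompt.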

Our AEO content evaluator lets you score individual pages against CITABLE criteria before you run the batch, so you're not building link equity into pages that aren't structured for passage retrieval in the first place.

With your CSV and playbook ready, Claude Code processes your sitemap using batch prompts. It identifies the primary entity of each page, finds candidate pages that share overlapping semantic territory, evaluates relevance based on semantic alignment between page content, and generates descriptive anchor text for each candidate pair. Output is JSON with fields for source URL, target URL, anchor text, relevance indicators, and a short context snippet showing where in the source page the link belongs.
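
Because the output is model-generated JSON, it is worth validating before anyone reviews it. The following sketch assumes the field names used in the batch prompt later in this guide (`source_url`, `target_url`, `anchor_text`, `relevance_indicator`, `context_snippet`) and treats `relevance_indicator` as a string, which is an assumption. The sample records are invented.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("source_url", "target_url", "anchor_text",
                   "relevance_indicator", "context_snippet")

@dataclass(frozen=True)
class LinkCandidate:
    source_url: str
    target_url: str
    anchor_text: str
    relevance_indicator: str  # assumed to be a label such as "high"
    context_snippet: str

def parse_candidates(raw_json: str, sitemap_urls: set[str]):
    """Split model output into usable candidates and readable rejection reasons."""
    accepted, rejected = [], []
    for index, item in enumerate(json.loads(raw_json)):
        missing = [name for name in REQUIRED_FIELDS if not item.get(name)]
        if missing:
            rejected.append(f"item {index}: missing {', '.join(missing)}")
        elif item["target_url"] not in sitemap_urls:
            rejected.append(f"item {index}: target not in sitemap")
        elif item["source_url"] == item["target_url"]:
            rejected.append(f"item {index}: self-link")
        else:
            accepted.append(LinkCandidate(**{name: item[name] for name in REQUIRED_FIELDS}))
    return accepted, rejected

# Invented sample output for illustration.
raw = json.dumps([
    {"source_url": "/blog/anchor-text", "target_url": "/pricing-guide",
     "anchor_text": "pricing guide", "relevance_indicator": "high",
     "context_snippet": "Compare plans before you commit."},
    {"source_url": "/blog/anchor-text", "target_url": "/does-not-exist",
     "anchor_text": "missing page", "relevance_indicator": "low",
     "context_snippet": "..."},
    {"source_url": "/blog/anchor-text", "target_url": "/pricing-guide",
     "relevance_indicator": "high", "context_snippet": "..."},
])
accepted, rejected = parse_candidates(raw, {"/blog/anchor-text", "/pricing-guide"})
print(len(accepted), rejected)
```

Rejections are returned as plain strings so they can be pasted straight into a review sheet or fed back into a follow-up prompt.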

The AI citation strategy guide covers how to structure content so individual sections can be extracted as standalone passages. A strong internal link map is the structural version of that same principle: it ensures readers and retrievers can always find the adjacent question answered somewhere in your cluster.

Finding hidden content opportunities

A batch audit frequently surfaces three classes of problem that manual review misses. First, orphaned bottom-of-funnel (BOFU) pages that rank in Google but receive no internal traffic from cluster content, making them invisible to AI passage retrieval. Second, over-linked pillar pages where dozens of unrelated cluster pages link to the same target, diluting the semantic signal for each individual link. Third, broken or stale links pointing to redirected URLs that pass no entity signal at all.

Our what drives AI citations research, based on extensive analysis of AI citations across thousands of pages, identifies orphaned comparison pages and pricing guides as content types that often rank organically but fail to appear in AI answers. Fixing their connectivity is typically the fastest lever for improving citation rate on high-intent queries.

Claude Code: your internal linking strategy

Claude Code is guided automation, not fully autonomous execution. Human review is a required step, not an optional one. The tool generates candidates based on semantic similarity, but it doesn't know your product roadmap, your competitive positioning, or which pages you're currently de-emphasizing. Those judgment calls stay with your team.

Here's the five-step implementation workflow:

  1. Prepare your data: Export your sitemap as a CSV with URL, title, and any topic or cluster category fields your CMS provides. Run an inbound link audit to tag orphaned pages.
  2. Define your playbook: Write a linking rulebook in Markdown or CSV specifying pillar-to-cluster relationships, anchor text length, and exclusion rules.
  3. Run candidate identification: Execute batch prompts against your CSV. Review the JSON output, checking relevance indicators and context snippets for each suggested pair.
  4. Validate manually: Approve suggestions where the linked page genuinely helps a reader understand the source topic. Reject or modify anchor text that sounds mechanical or over-optimized.
  5. Export and deploy: Generate your CMS-compatible file. Bulk-import via your CMS API. Validate live links and monitor citation rate changes over the following weeks.
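
Steps 4 and 5 can be sketched as a small filter that applies reviewer decisions and writes only approved links to an import file. The decision format (`approve`, `edit`, `reject`) and the three-column CSV layout are assumptions for illustration. Your CMS import will likely need its own field names.

```python
import csv
import io

# Invented candidate suggestions, as they might look after step 3.
candidates = [
    {"source_url": "/blog/anchor-text", "target_url": "/pricing-guide", "anchor_text": "pricing guide"},
    {"source_url": "/blog/internal-linking", "target_url": "/pricing-guide", "anchor_text": "click here"},
    {"source_url": "/blog/topic-clusters", "target_url": "/pricing-guide", "anchor_text": "plan comparison"},
]

# Hypothetical reviewer decisions keyed by (source, target).
decisions = {
    ("/blog/anchor-text", "/pricing-guide"): {"status": "approve"},
    ("/blog/internal-linking", "/pricing-guide"): {"status": "edit", "anchor_text": "SaaS pricing guide"},
}

def apply_review(candidates, decisions):
    """Keep approved links, apply reviewer anchor edits, drop everything else."""
    approved = []
    for candidate in candidates:
        decision = decisions.get((candidate["source_url"], candidate["target_url"]))
        if decision is None or decision["status"] == "reject":
            continue  # unreviewed or rejected suggestions never ship
        row = dict(candidate)
        if decision["status"] == "edit":
            row["anchor_text"] = decision["anchor_text"]
        approved.append(row)
    return approved

def to_csv(rows) -> str:
    """Serialize approved links as CSV text for a bulk import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source_url", "target_url", "anchor_text"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

approved = apply_review(candidates, decisions)
export = to_csv(approved)
print(export)
```

Dropping unreviewed suggestions by default keeps human review a required step: a link only reaches the export if someone made an explicit decision about it.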

The workflow described in this guide is a collection of prompts and automation patterns your team can adapt and invoke repeatedly. It encodes your specific content architecture, pillar definitions, and anchor text guidelines into reusable Claude Code prompts. You'll build your own version using the templates provided later in this article.

Rather than running one-off prompts, your team works from a named set of instructions that already understands your linking rules. Store the prompts in a shared CLAUDE.md project file so everyone on the team works from the same instructions, removing the context-rebuilding overhead from every batch run.

The diagram below shows how inputs flow from your sitemap and playbook through candidate identification to CMS-ready output.

┌──────────────────────────────────────────────────────┐
│               INTERNAL LINKING AUTOMATION             │
└──────────────────────────────────────────────────────┘

  ┌───────────────────┐
  │ INPUT             │
  │ CSV Link Rules    │
  │ + Sitemap         │
  │ + Playbook.md     │
  └────────┬──────────┘
           │ (1) Data Preparation
           ▼
  ┌───────────────────────────┐
  │ Claude Code /batch-match  │
  │ • Candidate Discovery     │
  │ • Semantic Scoring        │
  │ • Anchor Text Generation  │
  └────────┬──────────────────┘
           │ (2) Batch Analysis → JSON
           ▼
  ┌───────────────────────────┐
  │ HUMAN REVIEW              │
  │ • Validate Relevance      │
  │ • Confirm Context         │
  │ • Flag Over-Linking       │
  └────────┬──────────────────┘
           │ (3) Approval Filtering
           ▼
  ┌───────────────────────────────┐
  │ OUTPUT: CMS-Ready Links       │
  │ • URL pairs                   │
  │ • Anchor text                 │
  │ • Placement metadata          │
  └────────┬──────────────────────┘
           │ (4) Bulk CMS Import
           ▼
  ┌───────────────────────────────┐
  │ RESULT                        │
  │ ✓ Links Published             │
  │ ✓ Citation Rate Improving     │
  │ ✓ Entity Authority Growing    │
  └───────────────────────────────┘

The technical flow above maps to the five operational steps outlined earlier. Data preparation covers your sitemap CSV and playbook inputs. The /batch-match pass handles candidate identification and scoring. Human review filters suggestions before the final CMS export and deployment.

The comparison below shows how Claude Code's guided automation stacks up against manual spreadsheet work and other AI-assisted suggestion tools.

| Criterion | Manual spreadsheets | Other AI tools | Claude Code |
|---|---|---|---|
| Automation depth | None, copy-paste only | Suggestions only | Batch processing via CSV/JSON |
| Semantic awareness | Manual judgment | Limited, keyword-based | Full context window, topic cluster-aware |
| Batch processing | No | No | Yes, handles large URL sets |
| CMS integration | Manual copy-paste | Limited to specific platforms | Flexible JSON/HTML, API-ready |
| Strategic depth | Depends on user | Low, no playbook alignment | High, integrates custom playbook rules |

Optimizing anchor text for passage retrieval

Anchor text is a direct semantic signal for dense retrievers. When a retriever encodes a passage for similarity scoring, it weighs the anchor text, the one to two surrounding sentences, and the target page's opening paragraphs as a combined signal. When all three point to the same concept in vector space, the passage scores highly for queries about that concept.

Generic anchor text like "click here" or "learn more" creates vector misalignment. The retriever encodes those phrases into a semantic space unrelated to your target page's primary entity, reducing the likelihood that the passage gets extracted in response to a relevant query. Research on Dense Passage Retrieval (DPR) demonstrates that dense encoders can outperform sparse models on passage retrieval tasks, with that advantage depending on semantic coherence throughout the text, including anchor text. Claude Code's anchor generation follows three rules aligned with this: use concise phrases that name the target entity directly, place the anchor inside a sentence that independently establishes why the linked page is relevant, and vary phrasing across the cluster so repetition doesn't read as manipulative to search systems or human readers.
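
A simple pre-review check can catch the most obvious anchor problems before a human looks at the batch. This sketch flags generic phrases and anchors outside a two-to-four-word range, which matches the length given in the first prompt in this guide. The phrase list beyond "click here" and "learn more" is an assumption you should extend for your own content.

```python
# "click here" and "learn more" come from this guide; the rest are assumed additions.
GENERIC_ANCHORS = {"click here", "learn more", "read more", "here", "this page"}

def anchor_issues(anchor_text: str, min_words: int = 2, max_words: int = 4) -> list[str]:
    """Return a list of problems with a suggested anchor (empty list means none found)."""
    # Lowercase and collapse repeated whitespace before comparing.
    normalized = " ".join(anchor_text.lower().split())
    issues = []
    if normalized in GENERIC_ANCHORS:
        issues.append("generic phrase")
    word_count = len(normalized.split())
    if not min_words <= word_count <= max_words:
        issues.append(f"{word_count} words, expected {min_words}-{max_words}")
    return issues

print(anchor_issues("incident response playbook"))  # []
print(anchor_issues("Click  here"))                 # ['generic phrase']
print(anchor_issues("here"))                        # ['generic phrase', '1 words, expected 2-4']
```

A check like this cannot judge whether an anchor names the target entity. That still needs the reviewer, but it removes the mechanical cases from the queue.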

Context for AI passage scoring

The sentence immediately before and after an internal link contributes significantly to passage scoring. If your anchor text says "incident response playbook" but the surrounding context is about billing and subscriptions, the dense retriever may detect the semantic mismatch and discount the signal.

Write the linking sentence to make the relevance explicit. "To build out the incident response workflow we cover in the playbook, you need a notification strategy" is a strong context sentence. "See also: incident response playbook" gives the retriever almost nothing to work with. The context sentence is also what a reader sees, so improving it for AI passage scoring simultaneously improves readability.

Preventing anchor text overuse

Excessive exact-match anchor text hurts both traditional SEO and AEO for the same underlying reason: it signals optimization rather than genuine editorial judgment. High link density on a page can signal low topical specificity to dense retrievers, suggesting a hub page rather than a focused answer, which reduces its citation probability for narrow, high-intent queries.

Vary anchor phrasing across similar links, and audit for redundant links that point to the same target using near-identical anchor text. This constraint belongs in your playbook file so Claude Code applies it during candidate generation rather than requiring a separate cleanup pass.
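
The redundancy audit can be approximated with string similarity from the Python standard library. The sketch below groups links by target and flags pairs whose anchors are nearly identical. The 0.85 threshold and the sample links are assumptions, and character-level similarity is only a rough stand-in for editorial judgment.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def redundant_anchor_pairs(links, threshold: float = 0.85):
    """Flag pairs of links to the same target whose anchor text is near-identical."""
    by_target = defaultdict(list)
    for link in links:
        by_target[link["target_url"]].append(link)
    flagged = []
    for target, group in by_target.items():
        for first, second in combinations(group, 2):
            ratio = SequenceMatcher(None, first["anchor_text"].lower(),
                                    second["anchor_text"].lower()).ratio()
            if ratio >= threshold:
                flagged.append((target, first["source_url"], second["source_url"]))
    return flagged

# Invented sample links for illustration.
links = [
    {"source_url": "/a", "target_url": "/playbook", "anchor_text": "incident response playbook"},
    {"source_url": "/b", "target_url": "/playbook", "anchor_text": "incident response playbooks"},
    {"source_url": "/c", "target_url": "/playbook", "anchor_text": "on-call escalation runbook"},
]
flagged = redundant_anchor_pairs(links)
print(flagged)  # [('/playbook', '/a', '/b')]
```

Flagged pairs are candidates for rephrasing, not automatic removal: two pages can legitimately link to the same target if the anchors and surrounding sentences differ.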

Four copy-paste prompts for batch linking

The following prompts are designed to be stored in a shared Claude Code project file and invoked as slash commands. The command names used in this guide, such as /batch-match, are illustrative. Adjust the playbook variables and CMS-specific requirements to match your own setup.

Prompt 1: single-page link analysis

Analyze this webpage [URL] for internal linking opportunities.
Identify:
1. The primary entity/topic of the page
2. Semantically related topics within our content cluster
3. Specific passages where internal links add contextual value
4. Recommended anchor text for each link (2-4 words, descriptive)

Note: Replace [URL] with the actual page URL you want to analyze.

Output as a structured table: [Source passage] | [Target page] | [Anchor text]

Prompt 2: batch candidate identification

Process this CSV of URLs: [attach file with columns: page_url, pillar_topic, cluster_category]

For each URL:
1. Identify orphaned or under-linked passages
2. Find candidate pages to link to from the sitemap
3. Evaluate candidate relevance based on semantic alignment
4. Return JSON with fields: source_url, target_url, anchor_text,
   relevance_indicator, context_snippet

Note: Attach your CSV file in place of [attach file].

Prompt 3: structured output for automation

Convert these internal linking recommendations into structured format for CMS import.

Requirements:
- Preserve HTML heading structure
- Include placement metadata: paragraph_number, character_position
- Add fields: anchor_text, target_page_id, relevance_indicator
- Validate all target URLs exist in the attached sitemap
- Output a file ready for bulk CMS API import

Note: Specify your desired format (JSON/HTML) and CMS name when running this prompt.

Prompt 4: anchor opportunities in a text block

Within this text block, identify phrases that could serve as internal link anchors:
[Paste article excerpt]

For each opportunity:
1. Extract the phrase to use as anchor text
2. Suggest target pages that are contextually relevant
3. Explain why the reader would benefit from that link
4. Flag any links that might be redundant or distract from
   the article's primary argument

Note: Replace [Paste article excerpt] with the actual text you want to analyze.

Output as a bulleted list with reasoning for each suggestion

One anonymized B2B SaaS client came to us with a content library of 200+ published pages. Their BOFU pages (comparison articles, pricing guides, and ROI calculators) ranked in Google but were absent from AI Overviews and LLM responses.

We ran the full internal linking workflow: batch audited the site, identified dozens of orphaned BOFU pages, generated hundreds of linking candidates from cluster content back to those pages, reviewed for relevance, and deployed through the CMS API. AI-referred trials jumped from 550 to 3,500+ in 7 weeks, as documented in our case studies, a directional outcome driven in large part by improving entity context around pages that were already ranking but not being cited.

The audit pass using /link-audit batch prompts on each BOFU page returned a clear picture within hours. Each page's primary entity was extracted, compared against the full sitemap, and scored against candidate pages from the cluster. Pages published in isolation, with no mentions or links from related content, showed up immediately as orphaned.

The same isolation pattern appears consistently across client audits. Pages with strong content but weak entity connections sit invisible to AI retrieval until something in the cluster points to them. The Google AI Overviews optimization guide covers how AI Overviews select passages for extraction and why structurally isolated pages get skipped regardless of standalone quality.

Pinpointing internal linking gaps

Orphaned pages are a predictable outcome of publishing-first content operations, where new articles get written without updating existing pages to link to them. Over time, the content library grows but the link graph doesn't, creating pages with strong content but weak entity connections. Claude Code's batch audit surfaces all of them in a single CSV export, which a lean team can prioritize by pipeline impact: protect existing bottom-of-funnel (BOFU) content, grow middle-of-funnel (MOFU) reach, then expand top-of-funnel (TOFU) coverage.
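
That prioritization order is easy to encode once the audit export carries a funnel-stage column. The sketch below assumes a `stage` field with the values BOFU, MOFU, and TOFU, which is a hypothetical column you would need to add to your own export. The sample pages are invented.

```python
# Lower number means higher priority, following the BOFU -> MOFU -> TOFU order above.
STAGE_PRIORITY = {"BOFU": 0, "MOFU": 1, "TOFU": 2}

# Invented audit rows for illustration.
orphaned_pages = [
    {"url": "/blog/what-is-aeo", "stage": "TOFU"},
    {"url": "/pricing-guide", "stage": "BOFU"},
    {"url": "/blog/tool-comparison-checklist", "stage": "MOFU"},
    {"url": "/vs-competitor", "stage": "BOFU"},
]

def prioritize(pages):
    """Sort orphaned pages by funnel stage; unknown stages go last."""
    # sorted() is stable, so pages keep their audit order within each stage.
    return sorted(pages, key=lambda page: STAGE_PRIORITY.get(page["stage"], len(STAGE_PRIORITY)))

queue = prioritize(orphaned_pages)
print([page["url"] for page in queue])
```

A team could add a second sort key, such as organic traffic or pipeline value, if that data is available alongside the audit export.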

B2B SaaS linking: ROI and metrics

The operational efficiency gain from automating this workflow is significant for lean teams. Running batch candidate identification across a 200-page site, including a human review pass, takes a fraction of the time a fully manual audit would require, and it produces a structured, CMS-ready output rather than a spreadsheet that still requires hours of copy-paste work. For teams with limited resources, the difference between a systematic programmatic workflow and a manual one is often the difference between the work getting done this quarter or not at all.

On the measurement side, track three metrics after deployment: citation rate on target pages using an AI visibility tracker, AI-referred sessions in GA4 tagged via UTM on links you've optimized, and qualified pipeline attributed to AI-referred sessions in your CRM. Initial signals may appear within weeks, though meaningful citation rate changes typically take longer to materialize. Our AI visibility tracker monitors citation rate changes at the page level, so you can confirm whether link additions are actually improving retrieval before scaling the same pattern across your full site.

Automating internal links at scale introduces failure modes that don't exist in manual workflows. The most significant is LLM-based content extraction that returns structured-data metadata as if it were visible article text, and it's easy to miss if you're not looking for it.

LLM extraction returning structured-data metadata

When Claude Code analyzes raw HTML, it may encounter <script type="application/ld+json"> blocks containing schema markup, meta description tags, Open Graph fields, and other metadata. LLMs can potentially misinterpret hidden markup as visible content. As a result, they might generate link suggestions based on structured data that no human reader ever sees.

The result is hallucinated anchor text tied to schema property values rather than visible sentences, and relevance scores inflated by metadata keywords that match your query but don't reflect actual page content.

The mitigation is straightforward: use a scraping service that renders the page in a headless browser, extracts only the visible DOM text, strips <script>, <style>, and <meta> tags, and returns clean plain text or semantic HTML before you pass the content to Claude. The AI tracking platforms test flaw post covers a related measurement issue where visibility tools were pulling from metadata rather than rendered content, producing inflated numbers. The same root cause applies here.

Clean HTML means the heading hierarchy is intact, paragraph tags wrap visible body text, and no script or style content is included. For pages with heavy JavaScript rendering, a headless browser pass is typically the most reliable way to get the content Claude will actually evaluate, though simpler extraction methods may suffice for static sites with content present in the initial HTML response.
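
For static pages where the content is already in the initial HTML response, the simpler extraction path can be sketched with Python's built-in `html.parser`. This is a minimal illustration, not a replacement for a rendering service: it does not execute JavaScript, the list of skipped and block-level tags is an assumption, and the sample HTML is invented.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text outside script, style, and similar non-visible elements."""
    SKIPPED = {"script", "style", "noscript", "template", "title"}
    BLOCK = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "div", "section", "article"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK and self._skip_depth == 0:
            self._chunks.append("\n")  # keep block elements on separate lines

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)

def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

SAMPLE_HTML = """<html><head>
<title>Pricing</title>
<meta name="description" content="Hidden meta description">
<script type="application/ld+json">{"@type": "Product", "name": "Hidden schema name"}</script>
<style>p { color: red; }</style>
</head><body>
<h1>Pricing guide</h1>
<p>Plans start with a <a href="/trial">free trial</a>.</p>
</body></html>"""

clean = visible_text(SAMPLE_HTML)
print(clean)
```

Meta and Open Graph tags carry their values in attributes, so they never reach the text output, and the JSON-LD block is dropped along with every other script element. Elements hidden with CSS would still pass through, which is one reason a headless browser remains the safer choice for complex pages.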

Pitfalls of excessive linking

A 1,000-word page with fifty or more internal links can result from automated linking without playbook constraints. The semantic signal for each individual link weakens as density increases. Dense retrievers may evaluate a page's topical focus through multiple signals, including how entities are distributed across linked content. A page linking to five closely related concepts sends a clear signal about its core topic. A page linking to forty or fifty tangentially related pages sends conflicting signals that reduce its citation probability for narrow, high-intent queries.

Set a maximum link count in your playbook file so Claude Code applies the constraint during candidate generation rather than requiring a separate cleanup pass. Pages approaching the upper threshold should be reviewed for redundant targets, where two or more links from the same page point to the same destination URL, and for tangentially related pages that don't genuinely serve a reader moving through your topic cluster.
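
As a backstop to the playbook constraint, the same limit can be enforced on the output for each source page. The sketch below keeps one link per destination and then caps the page at a maximum count. The numeric `score` field is an assumption, since this guide only specifies "relevance indicators", and the sample data is invented.

```python
def enforce_link_budget(links, max_links: int):
    """Keep the highest-scoring link per target, then cap the page at max_links."""
    best_per_target = {}
    for link in links:
        current = best_per_target.get(link["target_url"])
        if current is None or link["score"] > current["score"]:
            best_per_target[link["target_url"]] = link
    # Highest score first, so the cap removes the weakest suggestions.
    ranked = sorted(best_per_target.values(), key=lambda link: link["score"], reverse=True)
    return ranked[:max_links]

# Invented suggestions for a single source page.
page_links = [
    {"target_url": "/a", "anchor_text": "topic cluster guide", "score": 0.9},
    {"target_url": "/a", "anchor_text": "cluster guide", "score": 0.7},
    {"target_url": "/b", "anchor_text": "anchor text basics", "score": 0.8},
    {"target_url": "/c", "anchor_text": "schema markup", "score": 0.4},
]
kept = enforce_link_budget(page_links, max_links=2)
print([link["target_url"] for link in kept])  # ['/a', '/b']
```

Links removed by the cap are worth logging rather than discarding, since they show which pages are approaching the threshold and may need a manual review.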

Most of what makes a page citable is already there. The internal link graph is what tells retrievers to find it.

If you want to implement the full CITABLE framework alongside programmatic linking, our Starter retainer covers up to 20 CITABLE-framework articles per month, visibility tracking, off-page consistency, and a dedicated team on a month-to-month basis. Book a call and we'll tell you honestly whether we're a fit.

FAQs

How many internal links should a page have?

Keep links to a moderate level, typically several dozen at most on long-form pages. Pages with high link density dilute the semantic signal for each individual link, reducing citation probability for specific high-intent queries. Your playbook file should define an upper threshold so Claude Code applies the constraint during candidate generation rather than requiring a manual cleanup pass afterward.

Can Claude Code automate internal linking in a headless CMS?

Yes. Claude Code exports structured JSON containing source URL, target URL, anchor text, and placement metadata, which your CMS API consumes directly. The content analysis layer and the publishing layer remain decoupled, so updates deploy without manual CMS access.

How often should you run an internal link audit?

Run a link audit promptly after publishing any new cluster page, and run a full batch pass on your entire sitemap regularly (quarterly is a reasonable starting cadence) to capture new opportunities from content changes, competitive shifts, and seasonal query patterns.

Key terms glossary

Dense retrieval: A search method that converts queries and passages into continuous vector embeddings (mathematical representations that capture meaning), enabling semantic similarity matching beyond exact keyword overlap. It is the underlying technology that AI answer engines use to select which passages to cite in generated responses.

Entity graph: A structured map of how topics, products, and concepts relate to each other across your content library. Internal links function as the edges in this graph, signaling to dense retrievers how pages relate within a topic cluster.

Link equity: The authority and topical relevance value that passes from one page to another through a hyperlink. In AEO contexts, link equity is less about raw domain authority and more about the semantic context and entity relevance the link carries for retrieval scoring.

Passage retrieval: The process by which a dense retriever extracts individual text blocks from indexed pages, scores them for alignment with a user query, and supplies them as context for an LLM to generate an answer. Pages structured with clear sections and descriptive anchor text are cited more consistently than pages optimized only for full-document ranking.
