The standard technical SEO audit checks crawlability, indexability, website speed, mobile-friendliness, and structured data. That checklist was designed for one consumer: Googlebot.

This is how it’s always been.

In 2026, your website has at least a dozen additional non-human consumers. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its “siblings” Claude-User and ChatGPT-User, browse websites on behalf of specific humans in real time. A Q1 2026 analysis across Cloudflare’s network found that 30.6% of all web traffic now comes from bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.

Here are the five layers to add to your existing technical SEO audit.

Layer 1: AI Crawler Access

Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, separate from the ones governing Googlebot and Bingbot.

What To Check

Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, AppleBot-Extended, CCBot, and ChatGPT-User. If none of these appear, you’re running on defaults, and those defaults might not reflect what you actually want. Never accept them unless you know they are exactly what you need.
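
A quick way to see which of these user agents your robots.txt mentions at all, assuming it lives at the standard location (swap in your own domain):

curl -s https://www.example.com/robots.txt | grep -iE "GPTBot|ClaudeBot|PerplexityBot|Google-Extended|Bytespider|AppleBot-Extended|CCBot|ChatGPT-User"

No output means none of them are named, so each one falls back to your generic User-agent: * rules, or to unrestricted crawling if those don’t exist either.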

The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision.

Cloudflare Radar data showing traffic volume by crawl purpose (Q1 2026); Screenshot by author, April 2026

The crawl-to-referral ratios from Cloudflare’s Radar report turn this into an informed decision. Anthropic’s ClaudeBot crawls roughly 20,600 pages for every single referral it returns. OpenAI’s ratio is 1,300:1. Meta sends no referrals at all. Blocking OpenAI’s OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity’s AI answers. Blocking training-focused crawlers like CCBot or Meta’s crawler prevents data extraction by a provider that sends zero traffic back. The crawl-to-referral ratios tell you who is taking without giving.
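
Here is a sketch of what a category-by-category decision could look like in robots.txt. The user agents are real, but whether you allow or block each group is your call, based on your own crawl-to-referral math:

# Training crawlers: blocked (heavy crawling, no referral traffic in return)
User-agent: GPTBot
User-agent: CCBot
User-agent: Bytespider
User-agent: Google-Extended
Disallow: /

# AI search crawlers: allowed (they power citations in AI answers)
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow:

# User-triggered agents: allowed (a human is on the other end)
User-agent: ChatGPT-User
User-agent: Claude-User
Disallow:

An empty Disallow line means “allow everything” for that group.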

There is one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google’s position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. This is both interesting and important for the future, even if it’s beyond the scope of this article.


Layer 2: JavaScript Rendering

Googlebot renders JavaScript using headless Chromium. There is nothing new about that. What is new is that virtually no major AI crawler does.

Crawler | Renders JavaScript
GPTBot (OpenAI) | No
ClaudeBot (Anthropic) | No
PerplexityBot | No
CCBot (Common Crawl) | No
AppleBot | Yes
Googlebot | Yes

AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it is invisible to the crawlers training OpenAI, Anthropic, and Perplexity’s models and powering their AI search products.

What To Check

Run curl -s [URL] on your critical pages and search the output for key content like product names, prices, or service descriptions. If that content isn’t in the curl response, GPTBot, ClaudeBot, and PerplexityBot can’t see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the important information is present in the raw HTML.
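
A minimal version of that check, assuming “Acme Widget Pro” stands in for a phrase that should appear on the page (swap in your own URL and wording):

curl -s https://www.example.com/products/widget | grep -c "Acme Widget Pro"

grep -c prints the number of matching lines; a result of 0 means the phrase is missing from the raw HTML that non-rendering crawlers receive.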

Curl fetch of the No Hacks homepage (Image from author, April 2026)

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims entirely on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.

The fix isn’t complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for critical content.
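
For illustration, here is a minimal static-generation sketch using the Next.js pages router; fetchProduct is a hypothetical data helper standing in for your CMS or database call:

// pages/products/[slug].js (sketch only; fetchProduct is a hypothetical helper)
import { fetchProduct } from "../../lib/products";

export async function getStaticPaths() {
  // Pre-render nothing at build time; generate each product page on first request
  return { paths: [], fallback: "blocking" };
}

export async function getStaticProps({ params }) {
  const product = await fetchProduct(params.slug);
  return { props: { product }, revalidate: 3600 }; // regenerate at most once an hour
}

export default function ProductPage({ product }) {
  // This markup ships in the HTML response itself, so crawlers that never run JavaScript still see it
  return (
    <main>
      <h1>{product.name}</h1>
      <p>{product.description}</p>
      <p>{product.price}</p>
    </main>
  );
}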

Layer 3: Structured Data For AI

Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just “does this page have schema markup?” It’s “does this markup help AI systems understand and cite this content?”

What To Check

  • JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
  • Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
  • Entity relationships: sameAs, author, publisher connections that link your content to known entities (see the sketch after this list).
  • Completeness: are all relevant properties populated, or are you just checking a box using skeleton schemas with name and URL?
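
For comparison, here is what a more complete Article schema could look like versus a skeleton carrying only a name and URL. Every value below is a placeholder:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How To Audit AI Crawler Access",
  "datePublished": "2026-04-02",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "sameAs": "https://www.linkedin.com/in/janedoe"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "sameAs": ["https://en.wikipedia.org/wiki/Example_Co", "https://www.crunchbase.com/organization/example-co"]
  },
  "about": {
    "@type": "Thing",
    "name": "Technical SEO"
  }
}
</script>

The sameAs links are what connect the author and publisher to entities AI systems already know; a skeleton schema with nothing but name and url gives them nothing to connect.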

Why This Matters Now

Microsoft’s Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team stated in April 2025 that structured data gives an advantage in search results.

No, you can’t win with schema alone. Yes, it can help.

The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, first to publicly use the term “GEO”) found that adding statistics to content improved AI visibility by 41%. Yext’s analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.

An important caveat: No peer-reviewed academic studies exist yet on schema’s impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as indicators rather than guarantees.

W3Techs reports that approximately 53% of the top 10 million websites use JSON-LD as of early 2026. If your website isn’t among them, you’re missing signals that both traditional and AI search systems use to understand your content.

Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is only step one. As AI agents continue moving from simply interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a standard audit today, but auditing structured data completeness and accuracy is the foundation verified source packs build on.

Layer 4: Semantic HTML And The Accessibility Tree

The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and what signals help them discover and evaluate your content.

Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don’t parse pages the way Googlebot does. They read the accessibility tree instead.

The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.

And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

This is what an accessibility tree looks like in Google Chrome (Image from author, April 2026)

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JS) makes it look like. A <div> styled to look like a button doesn’t appear as a button in the accessibility tree. An image without alt text means nothing. A heading hierarchy that skips from H1 to H4 creates a broken structure that both screen readers and AI agents will struggle to navigate.
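
A small, hypothetical illustration of the difference (addToCart is a stand-in click handler):

<!-- Looks like a button on screen, but reads as generic text in the accessibility tree -->
<div class="btn" onclick="addToCart()">Add to cart</div>
<img src="hero.jpg">

<!-- Same UI, but the tree now exposes a button role and an image description -->
<button type="button" onclick="addToCart()">Add to cart</button>
<img src="hero.jpg" alt="Acme Widget Pro in matte black">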

Microsoft’s Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP’s browser_snapshot function returns an accessibility tree representation because it’s more compact and semantically meaningful for LLMs. OpenAI’s documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.

Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like <nav>, <main>, <article>, and <footer> map to named roles in the accessibility tree, while generic <div> and <span> wrappers do not.