AI Bot Access Tester — Check How 21 AI Crawlers See Your Site
Paste up to 20 URLs to test how ChatGPT, Claude, Perplexity, Gemini, and 16 more AI crawlers interact with your site — checking robots.txt rules and live HTTP access for each page.
What This Tool Tests
AI search is no longer a future consideration — it's the present. ChatGPT Browse, Perplexity, Google Gemini, and Claude are all actively crawling the web, and whether your site appears in their answers depends on three layers of access control that traditional SEO tools don't check.
- Layer 1 — robots.txt directives
- The robots.txt file at yourdomain.com/robots.txt is the first thing AI crawlers check before visiting any page. A single Disallow: / rule under User-agent: * blocks every bot that doesn't have an explicit exception, which means sites that haven't updated their robots.txt since 2020 may be blocking every AI crawler without knowing it.
- Layer 2 — Actual HTTP access
- Some CDNs (Cloudflare, Fastly, Akamai) and web application firewalls (WAFs) block specific bot user-agents at the network level, returning a 403 Forbidden regardless of what robots.txt says. This tool makes a live HEAD request as each bot so you can see the real response.
- Layer 3 — X-Robots-Tag HTTP header
- A page can allow crawling in robots.txt but then send an X-Robots-Tag: noindex header in the HTTP response. AI bots vary in how they handle this directive.
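The three layers can be sketched in Python using only the standard library. This is a minimal illustration, not this tool's actual implementation: the function names, the verdict strings, and the sample robots.txt are all assumptions made for the example, and the X-Robots-Tag check is deliberately simplified (it ignores bot-specific directives such as "googlebot: noindex").

```python
import urllib.error
import urllib.request
import urllib.robotparser


def robots_allows(robots_txt: str, bot_token: str, url: str) -> bool:
    """Layer 1: evaluate robots.txt text for one bot token."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_token, url)


def head_as_bot(url: str, user_agent: str, timeout: float = 10.0):
    """Layer 2: send a live HEAD request with the given User-Agent.

    Returns (status_code, headers). The caller supplies the full
    User-Agent string; this sketch does not hard-code any bot's string.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, so a WAF's 403 arrives here.
        return err.code, dict(err.headers.items())


def has_noindex(headers: dict) -> bool:
    """Layer 3: look for a noindex directive in X-Robots-Tag."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False


def verdict(robots_ok: bool, status: int, headers: dict) -> str:
    """Combine the three layers into one hypothetical verdict label."""
    if not robots_ok:
        return "blocked by robots.txt"
    if status in (401, 403):
        return "blocked at HTTP level"
    if has_noindex(headers):
        return "reachable but noindex"
    return "accessible"


# Hypothetical robots.txt: a blanket block plus one explicit exception.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/page"
    for token in ("GPTBot", "ChatGPT-User"):
        print(token, robots_allows(SAMPLE_ROBOTS, token, page))
```

In a real check you would feed the result of head_as_bot(url, user_agent) into verdict(); the demo above only exercises the robots.txt layer so it runs without network access.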
Understanding Your Results
Training Bots vs. Retrieval Bots — A Critical Distinction
The most important concept for anyone managing AI bot access is the difference between training crawlers and retrieval crawlers. These are separate bots from the same companies, controlled by separate robots.txt tokens, and blocking one does not necessarily block the other.
GPTBot, ClaudeBot, Google-Extended, CCBot
Crawl content to train future model versions. Blocking them prevents your content entering training datasets. Impact is gradual — affects future model behaviour.
ChatGPT-User, Claude-User, Perplexity-User, DuckAssistBot
Fetch content in real-time when a user asks an AI a question. Blocking these has immediate impact — AI cannot cite your pages in live answers today.
OAI-SearchBot, PerplexityBot, bingbot, Googlebot, Applebot
Build the search index the AI draws from. Blocking affects whether you appear in AI-powered search results.
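The grouping above can be captured as a small lookup table, which is handy when auditing which kind of access a given set of blocked tokens affects. The sketch below is illustrative only: the category keys ("training", "retrieval", "search_index") and the function names are labels chosen for this example, and the token lists simply mirror the ones listed above.

```python
# Bot tokens grouped by purpose, copied from the lists above.
BOT_CATEGORIES = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"],
    "retrieval": ["ChatGPT-User", "Claude-User", "Perplexity-User", "DuckAssistBot"],
    "search_index": ["OAI-SearchBot", "PerplexityBot", "bingbot", "Googlebot", "Applebot"],
}


def category_of(token: str):
    """Return the category for a bot token (case-insensitive), or None."""
    wanted = token.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if wanted in (t.lower() for t in tokens):
            return category
    return None


def impact_of_blocking(blocked_tokens):
    """Group blocked tokens by category so the affected areas are visible."""
    impact = {}
    for token in blocked_tokens:
        category = category_of(token)
        if category is not None:
            impact.setdefault(category, []).append(token)
    return {category: sorted(tokens) for category, tokens in impact.items()}


if __name__ == "__main__":
    print(impact_of_blocking(["GPTBot", "CCBot", "ChatGPT-User"]))
```

Blocking GPTBot and CCBot only shows up under training, while adding ChatGPT-User also touches retrieval, which is the case with immediate effect on live answers.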
How to Update Your robots.txt for AI Bots
The robots.txt file is the primary control layer for AI bot access. Here is the canonical structure for a site that wants to allow AI retrieval (real-time citations) while blocking AI training data collection. Use the snippet generator above to build your own customised version.
# Allow AI search and retrieval bots (real-time answers)
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: bingbot
User-agent: DuckAssistBot
Allow: /

# Block AI training data collection
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /
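Before deploying rules like these, it can help to confirm that they parse the way you expect. The sketch below runs the same rules through Python's standard urllib.robotparser. The check function, the example.com URL, and the chosen tokens are assumptions for illustration, and real crawlers may match user-agent tokens differently from Python's parser (which uses case-insensitive substring matching), so treat the result as a sanity check rather than a guarantee.

```python
import urllib.robotparser

# The same rules as the snippet above. Blank lines are kept out of the
# User-agent runs, since Python's parser treats a blank line as the end
# of a group.
ROBOTS_TXT = """\
# Allow AI search and retrieval bots (real-time answers)
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: bingbot
User-agent: DuckAssistBot
Allow: /

# Block AI training data collection
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /
"""


def check(robots_txt, tokens, url="https://example.com/any-page"):
    """Return {token: allowed?} for each bot token against robots_txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    tokens = ["ChatGPT-User", "PerplexityBot", "GPTBot", "ClaudeBot", "Googlebot"]
    for token, allowed in check(ROBOTS_TXT, tokens).items():
        print(f"{token}: {'allowed' if allowed else 'blocked'}")
```

Bots that appear in neither group, such as Googlebot here, fall through to the default of being allowed because the file has no User-agent: * group.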
Also see: Test your full robots.txt for syntax errors and make sure AI bots can find all your pages.
