SERP.tools

AI Bot Access Tester — Check How 21 AI Crawlers See Your Site

Paste up to 20 URLs to test how the crawlers behind ChatGPT, Claude, Perplexity, Gemini, and other AI products (21 bots in total) interact with your site. The tool checks robots.txt rules and live HTTP access for each page.


What This Tool Tests

AI search is no longer a future consideration — it's the present. ChatGPT Browse, Perplexity, Google Gemini, and Claude are all actively crawling the web, and whether your site appears in their answers depends on three layers of access control that traditional SEO tools don't check.

Layer 1 — robots.txt directives
The robots.txt file at yourdomain.com/robots.txt is the first thing a compliant AI crawler checks before visiting any page. A single Disallow: / rule under User-agent: * blocks every bot that has no group of its own, so a robots.txt written before these crawlers existed (for example, one last updated in 2020) may be blocking every AI crawler without the owner knowing it.
Layer 2 — Actual HTTP access
Some CDNs (Cloudflare, Fastly, Akamai) and web application firewalls (WAFs) block specific bot user-agents at the network level, returning a 403 Forbidden regardless of what robots.txt says. This tool makes a live HEAD request as each bot so you can see the real response.
Layer 3 — X-Robots-Tag HTTP header
A page can allow crawling in robots.txt but then send an X-Robots-Tag: noindex header in the HTTP response. AI bots vary in how they handle this directive.
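
The Layer 1 behaviour can be reproduced offline with Python's standard `urllib.robotparser`. This is a minimal sketch, not the tool's own code: the helper name `robots_allows`, the sample robots.txt, and the example.com URL are assumptions chosen for illustration.

```python
import urllib.robotparser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that names one search crawler and blocks everyone else.
LEGACY_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for bot in ("Googlebot", "GPTBot", "ClaudeBot"):
        allowed = robots_allows(LEGACY_ROBOTS, bot, "https://example.com/pricing")
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Googlebot has its own group and is allowed. GPTBot and ClaudeBot have none, so they fall through to the wildcard group and are blocked. Real crawlers may match user-agent tokens slightly differently from `urllib.robotparser`, so treat this as an approximation.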

Understanding Your Results

Fully Allowed. robots.txt permits the bot AND the live HTTP request returns a 200 OK. The AI crawler can access and index your content.
Mixed Signals. The robots.txt allows the bot, but the HTTP request returns a non-200 status (typically 403 Forbidden). This usually indicates CDN or WAF-level blocking — fix this in your CDN settings, not in robots.txt.
Blocked. Either robots.txt explicitly disallows the bot, or the HTTP request fails. Blocking training bots (GPTBot, ClaudeBot) affects future model training; blocking retrieval bots (ChatGPT-User, Claude-User) removes you from live AI answers today.
Timeout. The bot request timed out after 8 seconds. This may indicate rate limiting triggered by specific user-agents, or simply a slow server.
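
The four statuses above can be read as a small decision function. The sketch below is one plausible reading, not the tool's actual logic: the name `classify`, its parameters, and the order of precedence (timeout first, then robots.txt, then the HTTP status) are assumptions made for illustration.

```python
from typing import Optional


def classify(robots_allowed: bool, status: Optional[int], timed_out: bool = False) -> str:
    """Map one bot/URL check to a result label.

    robots_allowed: whether robots.txt permits the bot.
    status: HTTP status code of the live request, or None if no response arrived.
    timed_out: whether the live request exceeded the time limit.
    """
    if timed_out:
        return "Timeout"          # assumed to take precedence over other signals
    if not robots_allowed:
        return "Blocked"          # robots.txt disallows the bot
    if status is None:
        return "Blocked"          # request failed without any HTTP response
    if status == 200:
        return "Fully Allowed"    # robots.txt allows and the page answered 200 OK
    return "Mixed Signals"        # robots.txt allows but the server said otherwise


if __name__ == "__main__":
    print(classify(True, 200))
    print(classify(True, 403))
    print(classify(False, 200))
```

Under these assumptions, a 403 for a bot that robots.txt allows is reported as Mixed Signals, which points at the CDN or WAF and not at robots.txt.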

Training Bots vs. Retrieval Bots — A Critical Distinction

The most important concept for anyone managing AI bot access is the difference between training crawlers and retrieval crawlers. These are separate bots from the same companies, controlled by separate robots.txt tokens, and blocking one does not necessarily block the other.

Training

GPTBot, ClaudeBot, Google-Extended, CCBot

Crawl content to train future model versions. Blocking them prevents your content from entering training datasets. The impact is gradual: it affects future model behaviour.

Retrieval

ChatGPT-User, Claude-User, Perplexity-User, DuckAssistBot

Fetch content in real-time when a user asks an AI a question. Blocking these has immediate impact — AI cannot cite your pages in live answers today.

Search / Index

OAI-SearchBot, PerplexityBot, bingbot, Googlebot, Applebot

Build the search index the AI draws from. Blocking affects whether you appear in AI-powered search results.
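
As a rough illustration, the three groups above can be kept in a lookup table so that a script can report what blocking a given token would affect. The token lists are copied from the groups above. The names `BOT_CATEGORIES`, `BLOCKING_IMPACT`, and `category_of` are hypothetical.

```python
from typing import Optional

# Tokens grouped exactly as in the three categories described above.
BOT_CATEGORIES = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"],
    "retrieval": ["ChatGPT-User", "Claude-User", "Perplexity-User", "DuckAssistBot"],
    "search": ["OAI-SearchBot", "PerplexityBot", "bingbot", "Googlebot", "Applebot"],
}

# One-line summary of what blocking each category affects.
BLOCKING_IMPACT = {
    "training": "content stays out of future training datasets (gradual effect)",
    "retrieval": "AI cannot fetch or cite the page in live answers (immediate effect)",
    "search": "page may not appear in AI-powered search results",
}


def category_of(token: str) -> Optional[str]:
    """Return the category for a robots.txt token, or None if it is not listed."""
    wanted = token.lower()
    for category, tokens in BOT_CATEGORIES.items():
        if wanted in (t.lower() for t in tokens):
            return category
    return None


if __name__ == "__main__":
    for token in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
        category = category_of(token)
        print(f"{token}: {category} -> {BLOCKING_IMPACT[category]}")
```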

How to Update Your robots.txt for AI Bots

The robots.txt file is the primary control layer for AI bot access. Here is the canonical structure for a site that wants to allow AI retrieval (real-time citations) while blocking AI training data collection. Use the snippet generator above to build your own customised version.

# Allow AI search and retrieval bots (real-time answers)
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: bingbot
User-agent: DuckAssistBot
Allow: /

# Block AI training data collection
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Applebot-Extended
Disallow: /
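
A block like the one above can also be generated and sanity-checked in a few lines of Python. This is a minimal sketch and not the snippet generator's implementation: `build_group`, `build_ai_rules`, and the example.com URL are hypothetical, and the check relies on the standard library's `urllib.robotparser`, whose matching may differ in detail from real crawlers.

```python
import urllib.robotparser
from typing import Iterable, List


def build_group(comment: str, tokens: Iterable[str], rule: str) -> List[str]:
    """Return one robots.txt group: a comment, its User-agent lines, and one rule."""
    lines = [f"# {comment}"]
    lines += [f"User-agent: {token}" for token in tokens]
    lines.append(rule)
    return lines


def build_ai_rules(allow: Iterable[str], disallow: Iterable[str]) -> str:
    """Build an allow group and a disallow group separated by a blank line."""
    lines = build_group("Allow AI search and retrieval bots", allow, "Allow: /")
    lines.append("")
    lines += build_group("Block AI training data collection", disallow, "Disallow: /")
    return "\n".join(lines) + "\n"


rules = build_ai_rules(
    allow=["ChatGPT-User", "OAI-SearchBot", "Claude-User"],
    disallow=["GPTBot", "ClaudeBot", "CCBot"],
)

# Parse the generated text back to confirm each token gets the intended rule.
parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

if __name__ == "__main__":
    print(rules)
    for bot in ("ChatGPT-User", "GPTBot"):
        print(bot, parser.can_fetch(bot, "https://example.com/"))
```

Parsing the generated text back is a cheap way to catch a token that ended up in the wrong group before the file is deployed.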

Also see: Test your full robots.txt for syntax errors and make sure AI bots can find all your pages.

Frequently Asked Questions

Why is a bot blocked even though my robots.txt allows it?
This is a CDN or Web Application Firewall (WAF) blocking the bot at the network layer, regardless of your robots.txt settings. It is very common with Cloudflare's "Bot Fight Mode" and similar features. The CDN intercepts the request before it ever reaches your web server, returning a 403 or connection reset. Your robots.txt is irrelevant in this scenario — you need to configure an allow rule in your CDN/WAF for the specific bot user-agents you want to permit.

Does this tool behave exactly like the real AI bots?
It sends HTTP requests with the exact user-agent strings used by each AI bot. It does not, however, originate from the official IP ranges of those bots — requests come from SERP.tools' server. Some CDN-level bot blocking is IP-based rather than user-agent-based, so a request from our server IP might behave differently from a real GPTBot request from OpenAI's IP range. The robots.txt check is fully accurate; the HTTP access check is a proxy that catches most cases but not all.
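
A user-agent probe of this kind can be sketched with Python's standard `urllib`. This is an illustration under stated assumptions, not SERP.tools' implementation: `build_probe` and `probe` are hypothetical names, the user-agent string is whatever the caller supplies (real bots send longer strings), and the 8-second limit is taken from the Timeout description above.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

TIMEOUT_SECONDS = 8  # limit quoted in the Timeout result description


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request that identifies itself with the given user-agent."""
    return urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})


def probe(url: str, user_agent: str) -> Tuple[Optional[int], Optional[str]]:
    """Send the probe and return (status code, X-Robots-Tag header value).

    The status is None when the request times out or fails at the network level.
    """
    request = build_probe(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return response.status, response.headers.get("X-Robots-Tag")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still the result.
        return err.code, err.headers.get("X-Robots-Tag")
    except OSError:
        # Covers URLError, connection resets, and socket timeouts.
        return None, None


if __name__ == "__main__":
    # Placeholder URL and a bare token as user-agent; both are assumptions.
    print(probe("https://example.com/", "GPTBot"))
```

Because the request leaves from the machine running the script, it shares the limitation described above: IP-based bot rules may treat it differently from a request sent from the bot operator's own network.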

Should I block AI bots?
That depends on your goals. If you're concerned about your content being used as AI training data without compensation, blocking training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) via robots.txt is a reasonable choice. However, blocking retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) means AI systems cannot cite your content in live answers to users — this directly reduces your visibility in AI-powered search. Most publishers benefit from allowing retrieval bots while making an informed choice about training bots.

What is the difference between GPTBot and ChatGPT-User?
GPTBot is OpenAI's training crawler — it crawls the web to collect content for future GPT model training. Blocking it prevents your content from entering future training datasets. ChatGPT-User is a separate user-agent that fires when a ChatGPT user triggers a web search or browsing session and ChatGPT needs to fetch your page in real-time. Blocking ChatGPT-User means ChatGPT cannot read your page during live sessions and therefore cannot cite it in answers. Both are controlled separately in robots.txt.

What does "Mixed Signals" mean?
Mixed Signals (⚠) means robots.txt permits the bot but the live HTTP request returned a non-200 status — typically a 403 Forbidden. This usually indicates CDN or WAF-level blocking. Your robots.txt says "come in" but the network firewall is slamming the door. The fix is to add an allow rule for those specific user-agents in your CDN settings, not to change robots.txt.

How often should I re-run this test?
After any deployment or robots.txt change, and after any CDN configuration update. AI crawler policies and user-agents are also evolving rapidly — new bots appear regularly as AI companies update their products. Bookmark this tool and re-run quarterly at minimum.

What does the snippet generator do?
The snippet generator (available after running a check) lets you build a ready-to-paste robots.txt block for all 21 AI bots. Toggle each bot between Allow and Disallow, then copy the result. It's the fastest way to produce a correct, well-commented robots.txt section for AI crawlers without manually looking up every user-agent token.