Robots.txt Validator & Bulk URL Tester

Check If Pages Are Blocked

Fetch any site's robots.txt, then test up to 100 URLs against it for Googlebot, Bingbot, GPTBot and more. See the exact rule — and the line number — that blocks each page, or paste a custom robots.txt to validate changes before you publish them.


How robots.txt Controls Crawling

A robots.txt file lives at the root of a host (yourdomain.com/robots.txt), and well-behaved crawlers read it before requesting any page on that host. It uses a simple set of directives (User-agent, Allow, and Disallow) to tell each bot which paths it may and may not crawl. A single misplaced Disallow: / can block crawling of an entire site, so verifying the file against your real URLs matters far more than checking its syntax in isolation.

This tool does exactly that. It fetches the live robots.txt for each domain you test, parses it the same way Googlebot does, and reports — per URL and per user agent — whether the page is allowed, and if not, the precise rule and line number responsible.
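
As a rough illustration of the first step, the sketch below works out which robots.txt file governs each URL in a batch. It is a minimal Python sketch, not the tool's actual code: the function names robots_url_for and group_by_robots_url are hypothetical, and it assumes every input is an absolute http(s) URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the scheme, host, and port of page_url."""
    parts = urlsplit(page_url)
    # robots.txt always sits at the root path; query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def group_by_robots_url(page_urls: list[str]) -> dict[str, list[str]]:
    """Group page URLs under the robots.txt file that applies to each of them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in page_urls:
        groups[robots_url_for(url)].append(url)
    return dict(groups)
```

Grouping first means each robots.txt file only needs to be fetched once, however many URLs from that host are in the batch.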

How Conflicting Rules Are Resolved

When more than one rule could apply to a URL, crawlers that follow the robots.txt standard (RFC 9309) use the most specific match: the rule with the longest path pattern wins, regardless of whether it is an Allow or a Disallow. If an Allow and a Disallow of equal length both match, Google applies the less restrictive rule, the Allow. This is why Allow: /blog/ can override a broader Disallow: / for pages under /blog/. The tool surfaces the actual winning rule for each URL so you never have to trace the precedence by hand.

✅ Allowed

No matching Disallow rule, or a more specific Allow rule overrides a broader Disallow. The crawler may fetch the page.

❌ Blocked

A Disallow rule matches and is not overridden. Click any blocked cell to see the exact rule and jump to its line in the robots.txt viewer.
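
The precedence logic described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tool's implementation: Rule, pattern_matches, and winning_rule are hypothetical names, the rules are assumed to be already parsed for one user agent, and pattern length is measured as the raw character count of the path pattern.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    allow: bool    # True for Allow, False for Disallow
    pattern: str   # path pattern, e.g. "/blog/" or "/*.pdf$"
    line: int      # 1-based line number in the robots.txt file


def pattern_matches(pattern: str, path: str) -> bool:
    """Match from the start of path; '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def winning_rule(rules: list[Rule], path: str) -> Optional[Rule]:
    """Return the rule that decides path, or None when nothing matches (allowed)."""
    # An empty pattern (e.g. a bare "Disallow:") matches nothing.
    matches = [r for r in rules if r.pattern and pattern_matches(r.pattern, path)]
    if not matches:
        return None
    # Longest pattern wins; on equal length, Allow (True) beats Disallow (False).
    return max(matches, key=lambda r: (len(r.pattern), r.allow))
```

For example, with Rule(False, "/", 2) and Rule(True, "/blog/", 3), the path /blog/post is decided by the Allow on line 3, while /about is decided by the Disallow on line 2. Keeping the line number on each rule is what makes it possible to point back at the responsible line in the file.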

Testing Multiple User Agents

robots.txt rules can target specific crawlers. A site might welcome Googlebot while blocking AI training bots like GPTBot and Google-Extended — or accidentally block Bingbot with an overly broad rule. Select any combination of user agents and the tool evaluates each URL against the rule block that applies to that specific bot, including the catch-all User-agent: * fallback.

  • Googlebot — Google Search crawler
  • Bingbot — Microsoft Bing crawler
  • GPTBot — OpenAI training crawler
  • ClaudeBot — Anthropic training crawler
  • Google-Extended — Google AI (Gemini) training
  • All bots (*) — the catch-all rule block
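
The per-bot selection described above can be sketched as follows. This is a simplified Python sketch rather than the tool's parser: parse_groups and rules_for are hypothetical names, user-agent tokens are compared case-insensitively by exact match, and directives other than User-agent, Allow, and Disallow are ignored.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str, int]]]:
    """Map each lowercased user-agent token to its (directive, path, line_number) rules."""
    groups: dict[str, list[tuple[str, str, int]]] = {}
    current_agents: list[str] = []
    reading_agents = False  # True while consecutive User-agent lines open a group
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value, number))
    return groups


def rules_for(groups: dict[str, list[tuple[str, str, int]]], user_agent: str):
    """Use the bot's own group if one exists; otherwise fall back to the '*' group."""
    return groups.get(user_agent.lower(), groups.get("*", []))
```

Note that a named group replaces the catch-all group for that bot instead of being merged with it, so a bot with its own block never inherits rules from User-agent: *.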

Validate Changes Before You Publish

Editing robots.txt on a live site is risky: one wrong rule can cut crawlers off from thousands of pages. Switch to Custom robots.txt mode, paste your proposed file (or fetch the live one and edit it), and test it against the URLs that matter most. Confirm that key pages stay allowed and the paths you want hidden are actually blocked, all before the change ever reaches production.
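
The same pre-publish check can be scripted. The sketch below uses Python's standard urllib.robotparser module; check_expectations is a hypothetical helper name, and the URL lists are whatever pages you consider critical. One caveat to keep in mind: that module is understood to apply the first matching rule in file order rather than the longest match, so on files with conflicting Allow and Disallow rules its verdict can differ from Google's.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_txt: str, user_agent: str,
                       must_allow: list[str], must_block: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the draft behaves as expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the draft text, no network fetch
    problems = []
    for url in must_allow:
        if not parser.can_fetch(user_agent, url):
            problems.append(f"unexpectedly blocked: {url}")
    for url in must_block:
        if parser.can_fetch(user_agent, url):
            problems.append(f"unexpectedly allowed: {url}")
    return problems
```

Running a check like this against a draft before deployment turns "key pages stay allowed" into an explicit list that either passes or names the offending URL.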

Frequently Asked Questions

Why is a page showing as blocked even though I can see it in Google?
robots.txt controls whether Google can crawl a page, not whether it appears in search results. Google may already have indexed the page from a previous crawl before the block was added, or it may have learned about the URL from links on other sites and shows it without crawling the content (a URL-only result with no snippet). Blocking in robots.txt does not remove a page from the index. For that you need a noindex directive, and the page must stay crawlable so that Google can actually see the directive.
What is the difference between User-agent: * and User-agent: Googlebot?
User-agent: * applies to all crawlers that do not have their own named rule block. User-agent: Googlebot applies specifically to Googlebot and overrides the * rules for it. If you have Disallow: / under User-agent: * but Allow: / under User-agent: Googlebot, Google will crawl the site while all other bots are blocked.
Does blocking GPTBot in robots.txt also block ChatGPT from reading my pages?
No — GPTBot and ChatGPT-User are different user agents from OpenAI with different purposes. GPTBot is used for training-data collection; blocking it keeps your content out of future model training datasets. ChatGPT-User is the agent used when a ChatGPT user triggers a web search or browsing action and OpenAI fetches the page in real time. Blocking GPTBot does not block ChatGPT-User, and vice versa. This tool lets you check both simultaneously so you can verify each is configured correctly.
I see a Crawl-delay directive — does this tool check that?
This tool reports a Crawl-delay directive if present, but does not enforce it during checks. Crawl-delay is not part of the official robots.txt standard and is ignored by Google. Bing and some other crawlers do honour it. It is shown for informational purposes only.
How does the custom robots.txt mode work?
Switch to Custom robots.txt, paste (or fetch and edit) a robots.txt, then enter the URLs you want to verify. The custom file is tested against every URL regardless of its domain, so you can confirm that the pages you need crawled stay allowed, and that the pages you want blocked are actually blocked, before publishing any change to your live site.
