
Robots.txt Tester

Check if a URL path is allowed or blocked by a site's robots.txt rules.

robots.txt directives reference

# Apply rules to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /*.pdf$          # glob — no PDFs
Disallow: /search?         # query strings

# AI crawler opt-in/out
User-agent: GPTBot
Allow: /                    # OpenAI — allow

User-agent: ClaudeBot
Allow: /                    # Anthropic — allow

User-agent: Google-Extended
Disallow: /                 # Google AI training — opt OUT

# Specific bot rules
User-agent: MJ12bot
Crawl-delay: 10             # SEO spam crawler — rate-limit

# Tell crawlers where the sitemap is
Sitemap: https://example.com/sitemap.xml
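
To make the matching behaviour of these directives concrete, here is a minimal Python sketch of how a path could be checked against the rules of one user-agent group. It assumes the precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie) plus the `*` and `$` wildcards. The names `pattern_to_regex`, `is_allowed` and `STAR_GROUP` are illustrative, not this tool's actual implementation, and percent-encoding normalisation is left out for brevity.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed semantics).

    '*' matches any run of characters, a trailing '$' pins the end of the
    URL, everything else is literal and case-sensitive.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Evaluate (directive, pattern) pairs taken from ONE user-agent group."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), is_allow
    return allowed


# The "User-agent: *" group from the reference above.
STAR_GROUP = [
    ("Allow", "/"),
    ("Disallow", "/admin/"),
    ("Disallow", "/private/"),
    ("Disallow", "/*.pdf$"),
    ("Disallow", "/search?"),
]

for path in ["/", "/admin/users", "/docs/guide.pdf",
             "/docs/guide.pdf?v=2", "/search?q=x", "/search"]:
    print(path, "allowed" if is_allowed(STAR_GROUP, path) else "blocked")
```

Under these assumptions `/docs/guide.pdf` is blocked but `/docs/guide.pdf?v=2` is not, because `$` anchors the pattern to the very end of the URL. Likewise `/search?q=x` is blocked while plain `/search` stays allowed.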

Common robots.txt gotchas

  • robots.txt only prevents crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To prevent indexing, use <meta name="robots" content="noindex"> on the page itself or an X-Robots-Tag: noindex header, and leave the URL crawlable: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex.
  • robots.txt must be at the root. example.com/robots.txt, never example.com/site/robots.txt.
  • Paths are case-sensitive. Disallow: /Admin/ won't block /admin/. Keeping all URL paths lowercase avoids this kind of mismatch.
  • Don't block your own CSS and JS. Google needs to render the page to evaluate it properly; blocking CSS/JS can hurt how the page is rendered, indexed, and ranked.
  • Crawl-delay is honoured by Bing, Yandex, Seznam; Google ignores it. Search Console's crawl rate setting was retired in early 2024, so for Google the remaining short-term lever is to return 429, 500, or 503 responses temporarily (and only if crawling is actually overloading your server).
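
As a small illustration of the "must be at the root" rule above, the sketch below derives the robots.txt location that applies to any page URL. The helper name `robots_url` is hypothetical; it relies only on the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/site/page.html?x=1"))
# https://example.com/robots.txt
```

Because the host is part of the result, each subdomain (and each scheme or port) gets its own robots.txt under this approach.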

Frequently Asked Questions

How do I test a robots.txt rule against a URL?

Paste your robots.txt content, enter a URL path and user-agent (e.g. Googlebot, Bingbot, GPTBot). The tester simulates exactly how a crawler evaluates the rules, showing which directive matched and whether the path is allowed or blocked.
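
Before any rule is matched, a tester has to decide which group of rules applies to the chosen user-agent. The following Python sketch shows one way that step could work. It is an assumption-laden simplification, not this tool's code: it matches the user-agent token exactly (case-insensitively) and falls back to the `*` group, and the names `parse_groups` and `rules_for` are made up for illustration.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, value) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    collecting_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:  # a new group starts here
                current_agents = []
                collecting_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
        # Other fields (e.g. Sitemap) are not group-scoped and are skipped.
    return groups


def rules_for(groups: dict[str, list[tuple[str, str]]], user_agent: str):
    """Pick the group named after the user-agent, else fall back to '*'."""
    return groups.get(user_agent.lower(), groups.get("*", []))


SAMPLE = """\
User-agent: *
Allow: /
Disallow: /admin/

User-agent: Google-Extended
Disallow: /          # opt out

User-agent: MJ12bot
Crawl-delay: 10
"""

groups = parse_groups(SAMPLE)
print(rules_for(groups, "Google-Extended"))  # [('disallow', '/')]
print(rules_for(groups, "Bingbot"))          # [('allow', '/'), ('disallow', '/admin/')]
print(rules_for(groups, "MJ12bot"))          # [('crawl-delay', '10')]
```

Note the last line: in this model a bot with its own group no longer inherits the `*` rules, so `MJ12bot` is not subject to `Disallow: /admin/` unless that rule is repeated in its group.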

How do I allow AI bots (GPTBot, ClaudeBot, PerplexityBot)?

Add explicit Allow blocks per bot:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
```

PerplexityBot works the same way, with its own `User-agent: PerplexityBot` group. Our robots.txt tester validates that each bot sees `Allow: /` and isn't accidentally blocked by wildcard rules elsewhere in the file.

What's the difference between Disallow, Allow and Crawl-delay?

Disallow = don't crawl these paths. Allow = explicit permission (useful as an exception inside a Disallow block). Crawl-delay = seconds to wait between requests (honoured by Bing, Yandex; Google ignores it). The Sitemap: directive is separate from these three: it points crawlers to the XML sitemap.

Does robots.txt block indexing or just crawling?

Only crawling. A URL can still appear in search results if other sites link to it (Google knows it exists). To prevent indexing, use `<meta name="robots" content="noindex">` or an `X-Robots-Tag: noindex` header; robots.txt alone isn't enough to de-index.
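
A quick way to confirm that a page actually carries one of these signals is to inspect its response headers and HTML. The sketch below is a simplified, hypothetical checker (`has_noindex` is not part of any library): it takes headers and markup you have already fetched, and it ignores bot-specific forms such as `X-Robots-Tag: googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict[str, str], html: str) -> bool:
    """True if the header or a robots meta tag contains 'noindex'."""
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


print(has_noindex({}, '<meta name="robots" content="noindex">'))  # True
print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))     # True
print(has_noindex({}, "<p>No robots meta here</p>"))              # False
```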

How do I block a specific bot or scraper?

Add `User-agent: BadBot` then `Disallow: /`. Rate-limit polite bots with `Crawl-delay: 10`. For truly malicious scrapers that ignore robots.txt, use server-side rules (Cloudflare, nginx `deny`, WAF) — robots.txt is advisory, not enforced.
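
If the application itself has to enforce the block, the same idea can be expressed in code. The following is a minimal WSGI middleware sketch in Python, offered as one possible approach rather than a recommended setup: `block_user_agents` and `hello_app` are hypothetical names, `BadBot` is the placeholder from the answer above, and User-Agent headers are trivially spoofed, so treat this as a first filter only.

```python
def block_user_agents(app, blocked_tokens):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a token."""
    lowered = [token.lower() for token in blocked_tokens]

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


# Requests whose User-Agent mentions "BadBot" never reach hello_app.
application = block_user_agents(hello_app, ["BadBot"])
```

The wrapped `application` can be served by any WSGI server; a reverse proxy or WAF rule does the same job earlier in the request path and is usually cheaper.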

Copyright © 2026 BuildStudio. All rights reserved.
