Robots.txt Tester

Check if a URL path is allowed or blocked by a site's robots.txt rules.

robots.txt directives reference

# Apply rules to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /*.pdf$          # glob — no PDFs
Disallow: /search?         # query strings

# AI crawler opt-in/out
User-agent: GPTBot
Allow: /                    # OpenAI — allow

User-agent: ClaudeBot
Allow: /                    # Anthropic — allow

User-agent: Google-Extended
Disallow: /                 # Google AI training — opt OUT

# Specific bot rules
User-agent: MJ12bot
Crawl-delay: 10             # SEO spam crawler — rate-limit

# Tell crawlers where the sitemap is
Sitemap: https://example.com/sitemap.xml
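The `User-agent: *` rules above mix a broad `Allow: /` with narrower `Disallow` lines, including `*` and `$` wildcards. A minimal sketch of how such rules are commonly resolved, assuming the longest-match behaviour described in RFC 9309 (the most specific pattern wins, and `Allow` wins a tie). The names `pattern_to_regex`, `is_allowed` and `RULES` are illustrative only and are not part of this tool:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs from ONE user-agent group.
    Assumption: longest matching pattern wins; Allow wins ties.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# The wildcard group from the reference above.
RULES = [
    ("Allow", "/"),
    ("Disallow", "/admin/"),
    ("Disallow", "/private/"),
    ("Disallow", "/*.pdf$"),
    ("Disallow", "/search?"),
]

for path in ["/blog/post", "/admin/users", "/docs/guide.pdf", "/search?q=robots"]:
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions `/admin/users` is blocked even though `Allow: /` appears first, because `/admin/` is the longer match. Parsers that use first-match ordering instead would give a different answer for the same file.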

Common robots.txt gotchas

robots.txt only prevents crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To prevent indexing, use <meta name="robots" content="noindex"> on the page itself or an X-Robots-Tag: noindex header. The crawler has to fetch the page to see either signal, so don't also disallow that page in robots.txt.
robots.txt must be at the root. example.com/robots.txt, never example.com/site/robots.txt.
Case-sensitive paths. Disallow: /Admin/ won't block /admin/. Keeping URL paths consistently lowercase helps here.
Don't block your own CSS and JS. Google needs to render the page to evaluate it properly; blocking CSS/JS can hurt rankings.
Crawl-delay is honoured by Bing, Yandex, Seznam; Google ignores it. Search Console's crawl rate setting was retired in early 2024, so to slow Googlebot temporarily, return 500, 503, or 429 responses (and only if crawling is actually straining your server).
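The root-location rule can be sketched in a few lines: whatever page you start from, the robots.txt that governs it sits at the root of the same scheme and host. The helper name `robots_txt_url` is hypothetical, chosen for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to `page_url`.

    Keeps the scheme and host (including any port); drops path, query, fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/site/page.html?ref=home"))
```

Because the host is part of the result, a subdomain such as `blog.example.com` resolves to its own robots.txt rather than the one on `example.com`.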

Frequently Asked Questions

How do I test a robots.txt rule against a URL?

Paste your robots.txt content, then enter a URL path and a user-agent (e.g. Googlebot, Bingbot, GPTBot). The tester simulates how a crawler evaluates the rules and shows which directive matched and whether the path is allowed or blocked.
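For a quick offline check of the same kind, Python's standard library ships `urllib.robotparser`. This is a rough sketch and not how this tester works internally: the stdlib parser applies the first matching rule in file order and does not interpret `*` or `$` wildcards in paths, so the sample file below sticks to plain prefixes. `ROBOTS_TXT` and `check` are illustrative names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, limited to plain path prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Google-Extended
Disallow: /
"""

def check(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for agent, url in [
    ("Googlebot", "https://example.com/admin/settings"),
    ("Googlebot", "https://example.com/blog/"),
    ("Google-Extended", "https://example.com/blog/"),
]:
    print(agent, url, "allowed" if check(ROBOTS_TXT, agent, url) else "blocked")
```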

How do I allow AI bots (GPTBot, ClaudeBot, PerplexityBot)?

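Add an explicit group per bot:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
```

Repeat the same two lines for PerplexityBot. A crawler obeys only the most specific `User-agent` group that matches it, so a bot with its own group ignores the `User-agent: *` rules entirely, wherever they appear in the file. Our robots.txt tester validates that each bot sees `Allow: /` and isn't accidentally blocked by the wildcard group.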
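A small sketch of that check using Python's standard `urllib.robotparser`. The sample file and the helper `access_by_bot` are assumptions made for illustration. In this hypothetical file PerplexityBot has no group of its own, so it falls back to the wildcard rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: GPTBot and ClaudeBot have groups, PerplexityBot does not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
"""

def access_by_bot(robots_txt: str, bots, url: str) -> dict:
    """Map each bot name to True (may fetch `url`) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}

result = access_by_bot(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "PerplexityBot"],
    "https://example.com/private/report",
)
for bot, ok in result.items():
    print(bot, "allowed" if ok else "blocked")
```

Note the side effect: a bot with its own `Allow: /` group can also reach `/private/`, because the wildcard group no longer applies to it. Repeat any `Disallow` lines inside the bot's group if that is not what you want.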

What's the difference between Disallow, Allow and Crawl-delay?

Disallow = don't crawl these paths. Allow = explicit permission (useful as an exception to a broader Disallow rule). Crawl-delay = seconds to wait between requests (honoured by Bing and Yandex; Google ignores it). The Sitemap directive points crawlers to your XML sitemap.
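To see how the non-path directives surface in a parser, here is a brief sketch with Python's standard `urllib.robotparser`, which exposes `crawl_delay()` and, from Python 3.8, `site_maps()`. The sample file is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the directives described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: MJ12bot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawl-delay belongs to a user-agent group; agents without one get None.
print("MJ12bot delay:", parser.crawl_delay("MJ12bot"))
print("Googlebot delay:", parser.crawl_delay("Googlebot"))

# Sitemap lines are global and not tied to any group.
print("Sitemaps:", parser.site_maps())
```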

Does robots.txt block indexing or just crawling?

Only crawling. A URL can still appear in search results if other sites link to it (Google knows it exists). To prevent indexing, use `<meta name="robots" content="noindex">` or `X-Robots-Tag: noindex` header — robots.txt alone isn't enough to de-index.
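One way to confirm a page actually carries a noindex signal is to look at both places it can live. The sketch below assumes you have already fetched the response headers and HTML yourself; `has_noindex` is a hypothetical helper, and the substring check is a simplification that ignores crawler-specific forms of either signal:

```python
from html.parser import HTMLParser

class _RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True

def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the headers or the HTML carry a noindex signal."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    meta_parser = _RobotsMetaParser()
    meta_parser.feed(html)
    return meta_parser.noindex

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))
print(has_noindex({}, "<html><head><title>Hi</title></head></html>"))
```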

How do I block a specific bot or scraper?

Add `User-agent: BadBot` then `Disallow: /`. Rate-limit polite bots with `Crawl-delay: 10`. For truly malicious scrapers that ignore robots.txt, use server-side rules (Cloudflare, nginx `deny`, WAF) — robots.txt is advisory, not enforced.
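Since robots.txt is advisory, enforcement has to happen on the server. As an illustration of the idea only (production setups would more likely use the nginx, Cloudflare or WAF rules mentioned above), here is a minimal WSGI middleware sketch that refuses requests by user-agent substring. `block_bad_bots`, `BLOCKED_AGENTS` and `hello_app` are hypothetical names, and user-agent strings are trivially spoofed, so treat this as a first filter at best:

```python
# Lowercase substrings to refuse; "badbot" mirrors the BadBot example above.
BLOCKED_AGENTS = ("badbot",)

def block_bad_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so that blocked user-agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

application = block_bad_bots(hello_app)
```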

Get in touch

email contact@webority.com

Designed and Developed by Webority Technologies
