robots.txt directives reference
# Apply rules to all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /*.pdf$ # glob — no PDFs
Disallow: /search? # query strings
# AI crawler opt-in/out
User-agent: GPTBot
Allow: / # OpenAI — allow
User-agent: ClaudeBot
Allow: / # Anthropic — allow
User-agent: Google-Extended
Disallow: / # Google AI training — opt OUT
# Specific bot rules
User-agent: MJ12bot
Crawl-delay: 10 # SEO spam crawler — rate-limit
# Tell crawlers where the sitemap is
Sitemap: https://example.com/sitemap.xml
Common robots.txt gotchas
- robots.txt only prevents crawling, not indexing. A disallowed URL can still appear in search results if other sites link to it. To prevent indexing, use `<meta name="robots" content="noindex">` on the page itself or an `X-Robots-Tag: noindex` header. The page has to stay crawlable, otherwise the crawler never sees the tag.
- robots.txt must be at the root: `example.com/robots.txt`, never `example.com/site/robots.txt`.
- Paths are case-sensitive. `Disallow: /Admin/` won't block `/admin/`. A consistently lowercase URL scheme helps here.
- Don't block your own CSS and JS. Google needs to render the page to rank it properly; blocking CSS/JS can hurt rankings.
- Crawl-delay is honoured by Bing, Yandex and Seznam but ignored by Google. For Google use Search Console's crawl rate settings (and only if you actually have crawl budget issues).
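To make the "must be at the root" point concrete, here is a minimal Python sketch that derives the robots.txt location for any page URL. The function name `robots_txt_url` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt location that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/site/page.html?x=1"))
# https://example.com/robots.txt
```

Each scheme and host combination resolves to its own file, so a subdomain such as `blog.example.com` needs a robots.txt of its own.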
Frequently Asked Questions
How do I test a robots.txt rule against a URL?
Paste your robots.txt content, then enter a URL path and a user-agent (e.g. Googlebot, Bingbot, GPTBot). The tester simulates how a crawler evaluates the rules and shows which directive matched and whether the path is allowed or blocked.
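For a rough idea of what such an evaluation involves, here is a Python sketch of longest-match rule evaluation with `*` and `$` wildcards, which is the precedence Google documents. It is not this tester's actual code. The names `pattern_to_regex`, `evaluate` and `RULES` are invented for the example, and percent-encoding normalisation is left out.

```python
import re

def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern: '*' matches any characters,
    a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def evaluate(rules, path):
    """Return (allowed, matched_rule) for a list of (directive, pattern) pairs."""
    best_key, best_rule = None, None
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            # Longest pattern wins; on equal length, Allow beats Disallow.
            key = (len(pattern), directive == "allow")
            if best_key is None or key > best_key:
                best_key, best_rule = key, (directive, pattern)
    if best_rule is None:
        return True, None  # no rule matched: crawling is allowed by default
    return best_rule[0] == "allow", best_rule

# The rules of the "User-agent: *" group from the reference above.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("disallow", "/private/"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/search?"),
]

for path in ("/admin/users", "/Admin/users", "/docs/report.pdf", "/search?q=x"):
    print(path, evaluate(RULES, path))
```

Here `/admin/users` is blocked by `Disallow: /admin/`, while `/Admin/users` only matches `Allow: /` and stays crawlable. That is the case-sensitivity gotcha in action.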
How do I allow AI bots (GPTBot, ClaudeBot, PerplexityBot)?
Add an explicit Allow block per bot:
```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
```
Repeat the pattern for PerplexityBot. Our robots.txt tester validates that each bot sees `Allow: /` and isn't accidentally blocked by wildcards higher up.
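A bot with its own group typically uses that group instead of the `*` group, so it helps to check which rules each bot actually ends up with. The Python sketch below assumes a simplified parser: exact, case-insensitive user-agent matching with a fallback to `*`. `parse_groups` and `rules_for` are made-up names, not part of any library.

```python
def parse_groups(text):
    """Return {lowercased user-agent: [(directive, value), ...]}."""
    groups = {}
    current_agents = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups

def rules_for(groups, bot):
    """Rules of the bot's own group if it has one, else the '*' group."""
    return groups.get(bot.lower(), groups.get("*", []))

ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/

User-agent: GPTBot
Allow: /          # OpenAI

User-agent: Google-Extended
Disallow: /
"""

groups = parse_groups(ROBOTS_TXT)
print(rules_for(groups, "GPTBot"))   # [('allow', '/')]
print(rules_for(groups, "Bingbot"))  # [('allow', '/'), ('disallow', '/admin/')]
```

Note the side effect: in this sketch GPTBot no longer inherits `Disallow: /admin/` from the `*` group. If a bot should stay out of those paths, repeat the Disallow lines inside its own group.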
What's the difference between Disallow, Allow and Crawl-delay?
Disallow = don't crawl these paths. Allow = explicit permission (useful as an exception inside a Disallow block). Crawl-delay = seconds to wait between requests (honoured by Bing and Yandex; ignored by Google). Sitemap = the location of your XML sitemap.
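To see the directives side by side, the snippet below reads them back with Python's standard `urllib.robotparser`. The robots.txt content is a shortened, illustrative version of the reference file. One caveat: the standard-library parser applies rules in file order and does not expand `*` or `$` wildcards, so for overlapping Allow/Disallow rules its answers can differ from Google's.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: MJ12bot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: the wildcard group blocks /admin/ for bots without their own group.
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))  # False

# Crawl-delay: only set for MJ12bot; other bots get None.
print(parser.crawl_delay("MJ12bot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None

# Sitemap: independent of any user-agent group (site_maps() needs Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```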
Does robots.txt block indexing or just crawling?
Only crawling. A URL can still appear in search results if other sites link to it (Google knows it exists). To prevent indexing, use `<meta name="robots" content="noindex">` or `X-Robots-Tag: noindex` header — robots.txt alone isn't enough to de-index.
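To verify that a page really carries a noindex signal, you can check both places. The Python sketch below works on a response you have already fetched; `is_noindex` and its `headers`/`html` parameters are invented for this example. It only handles the plain forms shown above, not bot-specific variants such as `X-Robots-Tag: googlebot: noindex`.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def is_noindex(headers, html):
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}

    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives

print(is_noindex({}, '<meta name="robots" content="noindex">'))  # True
print(is_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))     # True
print(is_noindex({}, "<p>Hello</p>"))                             # False
```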
How do I block a specific bot or scraper?
Add `User-agent: BadBot` then `Disallow: /`. Rate-limit polite bots with `Crawl-delay: 10`. For truly malicious scrapers that ignore robots.txt, use server-side rules (Cloudflare, nginx `deny`, WAF) — robots.txt is advisory, not enforced.
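If the blocking has to happen in application code rather than in Cloudflare or nginx, the same idea can be sketched as a small WSGI middleware in Python. `block_bots` and the `BLOCKED_AGENTS` list are hypothetical names for this example. User-agent strings are trivially spoofed, so treat this as a first filter, not real protection.

```python
BLOCKED_AGENTS = ("badbot",)  # hypothetical list, lowercase substrings

def block_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app and answer 403 to blocked user-agents."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in application used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = block_bots(hello_app)

def show_status(status, headers):
    print(status)

app({"HTTP_USER_AGENT": "BadBot/1.0"}, show_status)   # prints "403 Forbidden"
app({"HTTP_USER_AGENT": "Mozilla/5.0"}, show_status)  # prints "200 OK"
```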
Copyright © 2026 BuildStudio. All rights reserved.
Designed and Developed by Webority Technologies