robots.txt Validator & Tester

Check syntax, validate crawler rules, analyze AI bot access and test any URL – free, no account needed

robots.txt Validator: Complete Rule Analysis

A robots.txt can be syntactically correct and still contain errors. Incorrectly set Disallow rules that block important pages, missing Sitemap entries or AI bots with unintended access are common problems that only a full rule analysis reveals. This validator checks everything – not just syntax.

What is checked:

  • Syntax and structure: Every line is checked for valid directives. Unknown or misspelled directives like Dissalow or user agent are flagged as warnings.
  • User-agent blocks: Complete analysis of all user-agent groups, showing which bots are allowed, which are blocked and which have partial restrictions.
  • AI bot control: Specific evaluation for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Amazonbot, Applebot-Extended and Bytespider, so you can see at a glance which AI crawlers have access.
  • Sitemap declaration: Missing Sitemap entries are detected. Declaring your sitemaps in robots.txt helps crawlers such as Googlebot discover them.
  • Critical rules: Disallow: / for Googlebot or User-agent: * is flagged as a critical error because it blocks your entire site from being crawled by search engines.
  • Duplicate blocks: Multiple User-agent blocks for the same bot are detected. Google merges them, but some crawlers evaluate only the first block.
  • URL tester: After validation, any URL can be tested against the loaded rules, separately for each user-agent including all major AI crawlers.
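Several of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual implementation: the function name lint_robots, the set of known directives and the severity labels are assumptions made for the example.

```python
# Directives treated as valid in this sketch (an assumption, not an official list).
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(text):
    """Return a list of (severity, line_number, message) findings."""
    findings = []
    seen_agents = set()    # user-agents whose group has already been closed
    current_agents = []    # user-agents of the group currently being read
    in_rules = False       # True once the current group has a rule line
    has_sitemap = False

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append(("warning", number, f"not a directive: {line!r}"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()                    # directive names are case-insensitive
        if key not in KNOWN:
            findings.append(("warning", number, f"unknown directive: {name!r}"))
            continue
        if key == "sitemap":
            has_sitemap = True
        elif key == "user-agent":
            if in_rules:                      # a rule line ended the previous group
                seen_agents.update(current_agents)
                current_agents, in_rules = [], False
            agent = value.lower()
            if agent in seen_agents:
                findings.append(("warning", number, f"duplicate block for {value!r}"))
            current_agents.append(agent)
        else:
            in_rules = True
            if key == "disallow" and value == "/":
                for agent in current_agents:
                    if agent in ("*", "googlebot"):
                        findings.append(("critical", number, f"Disallow: / for {agent}"))

    if not has_sitemap:
        findings.append(("warning", 0, "no Sitemap entry"))
    return findings
```

Called on a file that contains Dissalow: /tmp/, the sketch reports an unknown directive with its line number. A file with only User-agent: *, an empty Disallow: and a Sitemap line produces no findings.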

How to use this robots.txt checker

Enter your domain URL (e.g. https://example.com) and the tool fetches and validates your robots.txt automatically. Alternatively, paste the content directly into the text field or upload the file. The validation runs entirely in the browser – no data is sent to a server.

Correct robots.txt syntax

A valid robots.txt uses plain text with one directive per line. Each block starts with one or more User-agent: lines followed by Disallow: and/or Allow: rules. Directive names are case-insensitive, but values such as paths are case-sensitive.

# Allow all crawlers
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

# Block ChatGPT training crawler
User-agent: GPTBot
Disallow: /

# Block admin area from all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
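To see how such rules apply to a concrete URL, Python's standard urllib.robotparser module can be used as a rough stand-in for a URL tester. The sketch below is illustrative only: the helper name is_allowed and the sample rules are assumptions. Python's parser applies rules in file order, so results can differ from Google's most-specific-match behaviour when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for the sketch: GPTBot is blocked entirely,
# every other bot is only kept out of /admin/ and /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse from text, no network access
    return parser.can_fetch(user_agent, url)

# GPTBot matches its own group and is blocked everywhere.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/blog/"))
# Googlebot falls back to the * group: /blog/ is open, /admin/ is not.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/login"))
# Paths are case-sensitive, so /Admin/ is not covered by Disallow: /admin/.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/Admin/login"))
```

The last call illustrates why path case matters: a rule for /admin/ does not protect /Admin/.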

robots.txt and AI crawlers

Since 2023, major AI platforms have introduced dedicated crawlers for training and search. GPTBot (OpenAI), ClaudeBot (Anthropic) and PerplexityBot are crawlers of this kind, while Google-Extended is a control token honored by Google's existing crawlers rather than a separate crawler. All of them are documented as respecting robots.txt, but compliance is voluntary. Controlling which AI crawlers can access your content is now an important decision for any website owner, with direct implications for AI visibility, training data participation and referral traffic.

PerplexityBot is particularly valuable to allow: Perplexity cites sources with clickable links in its answers, which makes it one of the AI crawlers most likely to send direct traffic to your site. Google-Extended controls whether your content may be used to train and ground Google's Gemini models. It does not affect Google Search, and AI Overviews follow the normal Googlebot and snippet controls. GPTBot and ClaudeBot collect content for the models behind ChatGPT and Claude, whose answers rarely link back directly.
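An AI bot overview of this kind can be approximated as follows. This is a simplified sketch under stated assumptions: the function names parse_groups and ai_bot_overview and the three status labels are made up for the example, duplicate groups for the same bot are merged, and wildcard patterns in paths are not interpreted.

```python
# AI crawler tokens named on this page.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
           "Amazonbot", "Applebot-Extended", "Bytespider"]

def parse_groups(text):
    """Map each lower-cased user-agent token to its (directive, path) rules."""
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()
        if key == "user-agent":
            if in_rules:                          # rules ended the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])  # merges duplicate groups
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups

def ai_bot_overview(text):
    """Classify each AI bot as 'allowed', 'blocked' or 'partial'."""
    groups = parse_groups(text)
    overview = {}
    for bot in AI_BOTS:
        # A bot-specific group takes precedence over the * group.
        rules = groups.get(bot.lower(), groups.get("*", []))
        disallows = [path for kind, path in rules if kind == "disallow" and path]
        allows = [path for kind, path in rules if kind == "allow" and path]
        if not disallows:
            overview[bot] = "allowed"
        elif "/" in disallows and not allows:
            overview[bot] = "blocked"
        else:
            overview[bot] = "partial"
    return overview
```

For a file that blocks GPTBot and Bytespider with Disallow: / and restricts all other bots to everything except /private/, the sketch reports the first two as blocked and the rest as partial.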

Frequently Asked Questions

What is a robots.txt file?

A robots.txt is a plain text file placed in the root directory of a website (e.g. example.com/robots.txt). It tells search engine crawlers which areas of the site may or may not be crawled. It uses the Robots Exclusion Protocol with directives like User-agent, Disallow, Allow and Sitemap.

How do I check if my robots.txt is valid?

Enter your domain URL above and click Check. The tool fetches your robots.txt automatically and analyzes syntax, crawler rules and AI bot access. You can also paste content directly or upload the file. Results appear instantly with a quality score, errors, warnings and an AI bot overview table.

Can AI bots like ChatGPT be blocked via robots.txt?

Yes. OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot and Google's Google-Extended all respect robots.txt. Add a block like User-agent: GPTBot followed by Disallow: / to keep OpenAI's training crawler off your site. Compliance is voluntary, so robots.txt does not replace legal protection.

What does Disallow: / mean?

Disallow: / blocks a crawler from the entire website. If it is set for User-agent: * or specifically for Googlebot, Google cannot crawl any page, and the site will largely disappear from search results. Unless the block is intentional (for example on a staging site), this is a critical SEO error requiring immediate correction. The validator flags it as a critical error.

What is the difference between Disallow and noindex?

Disallow prevents crawling but does not guarantee non-indexing. Google can still index a URL if other pages link to it, even without crawling it. The noindex meta tag prevents indexing but requires the page to be crawled first. For reliable non-indexing, do not combine the two: allow crawling and set noindex on the page, because a crawler blocked by Disallow never sees the noindex tag.

Does robots.txt affect Google rankings?

Indirectly, yes. Blocking important pages prevents Google from crawling them, so they can drop out of search results or appear without a description. Blocking Google-Extended has no effect on Search rankings or on AI Overviews. It only governs the use of your content for Google's Gemini models. A missing Sitemap entry can reduce crawl efficiency for large sites.

What is the correct robots.txt syntax?

Plain text, one directive per line. Blocks start with User-agent: followed by Disallow: and optionally Allow: rules. Paths are case-sensitive. An empty Disallow: means allow all. Use Sitemap: to declare your sitemap URL. Comments start with #.

Where does robots.txt need to be placed?

The file must be in the root directory of the domain, accessible at https://yourdomain.com/robots.txt. It cannot be placed in a subdirectory. It must be served with a 200 HTTP status and text/plain content type. Subdomains require their own separate robots.txt.
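The placement rules can be illustrated with a short Python sketch. The helper names robots_url and check_response are hypothetical, and the check only covers the two conditions named above (status 200 and text/plain). It does not fetch anything itself.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    # Only scheme and host are kept: the file always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def check_response(status, content_type):
    """Return a list of problems with how a robots.txt response was served."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no content type'}")
    return problems

# A page on a subdomain is governed by that subdomain's own robots.txt.
print(robots_url("https://blog.example.com/posts/1?x=2"))
print(check_response(200, "text/plain; charset=utf-8"))
print(check_response(404, "text/html"))
```

The first call shows why subdomains need their own file: the robots.txt location is derived from the host of the page being crawled, not from the main domain.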

Should I block Bytespider in robots.txt?

For most websites, blocking Bytespider (ByteDance/TikTok's AI crawler) is a reasonable default. It has the lowest transparency of major AI crawlers, no published IP ranges, no consumer-facing AI product that cites sources with links, and has been reported to crawl aggressively. Add User-agent: Bytespider and Disallow: / to block it.
