robots.txt Validator & Tester

Check syntax, analyze rules and test URL accessibility


More than a syntax check β€” complete rule analysis

A robots.txt file can be syntactically correct and still contain errors: Disallow rules that unintentionally block important pages, missing Sitemap entries, or AI bots with unintended access. This validator analyzes the complete rule structure and gives concrete recommendations.

What is checked:

  • Syntax & Structure β€” Each line is checked for valid directives. Unknown or misspelled directives are flagged as warnings.
  • User-agent blocks β€” Complete analysis of all user-agent groups: which bots are allowed, which are blocked, which are partially restricted.
  • AI bot control β€” Specific evaluation for GPTBot, ClaudeBot, PerplexityBot and Google-Extended β€” see at a glance which AI crawlers have access.
  • Sitemap entries β€” A missing Sitemap declaration is detected and shown as a recommendation.
  • Critical rules β€” Disallow: / for Googlebot or other important crawlers is flagged as a critical error.
  • URL Tester β€” After validation, any URL can be tested against the loaded rules β€” separately for each user-agent.
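The URL Tester step above can be sketched with Python's standard-library urllib.robotparser; the rules and URLs below are illustrative, not taken from a real site:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (hypothetical site)
RULES = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

# Test a URL against the loaded rules, separately per user-agent
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))       # False

# Declared sitemaps (available since Python 3.8)
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

Note that the stdlib parser applies rules in file order; Google's own matcher prefers the most specific path, so results can differ in edge cases with overlapping Allow/Disallow rules.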

Frequently Asked Questions

What is a robots.txt file?

A robots.txt is a text file in the root directory of a website (e.g. example.com/robots.txt). It tells search engine crawlers which areas of the website may or may not be crawled. The file uses the Robots Exclusion Protocol (RFC 9309) with directives like User-agent, Disallow, Allow and Sitemap.
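A minimal, hypothetical example showing all four directives together, with Allow carving an exception out of a broader Disallow:

```
User-agent: *
Disallow: /private/
Allow: /private/press/

Sitemap: https://example.com/sitemap.xml
```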

Can AI bots like ChatGPT be blocked via robots.txt?

Yes. OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot and Google's Google-Extended can all be blocked via robots.txt. Important: These bots respect robots.txt voluntarily β€” it does not replace legal protection.
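One way to block all four crawlers at once (a hypothetical configuration) is a single group with several User-agent lines, which RFC 9309 permits:

```
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Disallow: /
```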

What does Disallow: / mean?

Disallow: / blocks a crawler completely from the entire website. If set for User-agent: * or specifically for Googlebot, Google cannot crawl and index the website β€” a critical SEO error requiring immediate attention.
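The effect can be demonstrated with Python's urllib.robotparser (the rules are a hypothetical worst case):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: Googlebot
Disallow: /
""".splitlines())

# Googlebot is blocked everywhere, including the homepage
print(rp.can_fetch("Googlebot", "https://example.com/"))  # False
# Other crawlers are unaffected, since no other group matches them
print(rp.can_fetch("Bingbot", "https://example.com/"))    # True
```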

What is the difference between Disallow and noindex?

Disallow prevents crawling but does not necessarily prevent indexing: a blocked URL can still appear in search results if other pages link to it. The noindex meta tag reliably removes a page from the index, but it only works if the page can be crawled. For reliable de-indexing, allow crawling and set noindex; do not Disallow the page, or the crawler will never see the tag.
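The noindex signal itself lives in the page, not in robots.txt; the standard form is the robots meta tag (the X-Robots-Tag HTTP response header is an equivalent for non-HTML resources):

```html
<!-- In the page's <head>: the page may be crawled, but must not be indexed -->
<meta name="robots" content="noindex">
```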

Related Tools

robots.txt Generator
Create robots.txt files easily
Sitemap Validator
Validate sitemap structure & errors
Sitemap Generator
Generate XML sitemaps automatically