Check syntax, validate crawler rules, analyze AI bot access and test any URL. Free, no account needed.
A robots.txt can be syntactically correct and still contain errors. Incorrectly set Disallow rules that block important pages, missing Sitemap entries or AI bots with unintended access are common problems that only a full rule analysis reveals. This validator checks everything, not just syntax.
Misspelled directives such as Dissalow or user agent are flagged as warnings.
Disallow: / for Googlebot or User-agent: * is flagged as a critical error, because it blocks your entire site from search engines.
Enter your domain URL (e.g. https://example.com) and the tool fetches and validates your robots.txt automatically. Alternatively, paste the content directly into the text field or upload the file. The validation runs entirely in the browser: no data is sent to a server.
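The two checks described above (typo warnings and the critical Disallow: / rule) can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual implementation: the function name lint_robots, the list of known directives and the message wording are all assumptions made for this example.

```python
import difflib

# Directive names this sketch treats as known (an illustrative subset).
KNOWN_DIRECTIVES = ["user-agent", "disallow", "allow", "sitemap", "crawl-delay"]


def lint_robots(text):
    """Return a list of (severity, line_number, message) findings."""
    findings = []
    current_agents = []  # user-agents of the block currently being read
    in_rules = False     # True once the current block has a rule line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append(("warning", number, f"missing ':' in {line!r}"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()  # directive names are case-insensitive
        if key not in KNOWN_DIRECTIVES:
            guess = difflib.get_close_matches(key, KNOWN_DIRECTIVES, n=1)
            hint = f", did you mean {guess[0]!r}?" if guess else ""
            findings.append(("warning", number, f"unknown directive {name!r}{hint}"))
            continue
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new block
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/":
                for agent in current_agents:
                    if agent in ("*", "googlebot"):
                        findings.append(
                            ("critical", number, f"Disallow: / blocks the whole site for {agent}")
                        )
    return findings


SAMPLE = "User-agent: *\nDissalow: /private/\nDisallow: /\n"
for severity, number, message in lint_robots(SAMPLE):
    print(f"line {number}: {severity}: {message}")
```

The sketch uses fuzzy matching from the standard library's difflib to suggest the intended directive; a real validator would likely cover more directives and more rule interactions.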
A valid robots.txt uses plain text with one directive per line. Each block starts with User-agent: followed by Disallow: and/or Allow: rules. Directive names are case-insensitive, but values such as paths are case-sensitive.
Since 2023, major AI platforms have introduced dedicated crawlers for training and search. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot and Google-Extended all respect robots.txt voluntarily. Controlling which AI crawlers can access your content is now an important decision for any website owner, with direct implications for AI visibility, training data participation and referral traffic.
PerplexityBot is particularly valuable to allow: Perplexity cites sources with clickable links in every answer, making it the AI crawler most likely to send direct traffic to your site. Google-Extended controls whether your content may be used to train and ground Google's Gemini models; according to Google's documentation it does not affect inclusion in Google Search, and AI Overviews are a Search feature that this token does not control. GPTBot and ClaudeBot affect your presence in ChatGPT and Claude answers but rarely link back directly.
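To see which of the AI crawlers named above a given robots.txt admits, a small overview can be built with Python's standard urllib.robotparser module. This is only a sketch: the helper name ai_bot_access, the sample rules and the example.com URL are assumptions for illustration, and the parser's matching may differ in detail from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the text above.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_bot_access(robots_text, url="https://example.com/"):
    """Map each AI crawler token to True (may fetch url) or False."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical file: GPTBot is blocked, everyone else only loses /admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

for bot, allowed in ai_bot_access(SAMPLE).items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Crawlers without a dedicated block fall back to the User-agent: * rules, which is why only GPTBot is reported as blocked for the site root in this sample.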
A robots.txt is a plain text file placed in the root directory of a website (e.g. example.com/robots.txt). It tells search engine crawlers which areas of the site may or may not be crawled. It uses the Robots Exclusion Protocol with directives like User-agent, Disallow, Allow and Sitemap.
Enter your domain URL above and click Check. The tool fetches your robots.txt automatically and analyzes syntax, crawler rules and AI bot access. You can also paste content directly or upload the file. Results appear instantly with a quality score, errors, warnings and an AI bot overview table.
Yes, AI crawlers can be blocked via robots.txt. OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot and Google's Google-Extended all respect robots.txt. Add a block like User-agent: GPTBot followed by Disallow: / to prevent OpenAI's crawler from accessing your site. These bots respect the file voluntarily, so it does not replace legal protection.
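As a rough illustration of the block described above, the following Python sketch generates one full-site Disallow block per crawler and then verifies the result with the standard urllib.robotparser module. The helper name block_rules and the choice of bots are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def block_rules(bots):
    """Return robots.txt text with one full-site Disallow block per bot."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(blocks) + "\n"


rules = block_rules(["GPTBot", "ClaudeBot"])
print(rules)

# Verify the generated rules: listed bots are blocked, others are not.
parser = RobotFileParser()
parser.parse(rules.splitlines())
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/")
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/")
print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

The generated text is meant to be appended to an existing robots.txt; crawlers not listed remain governed by whatever other blocks the file contains.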
Disallow: / blocks a crawler from the entire website. If set for User-agent: * or specifically for Googlebot, Google cannot crawl the website, so it cannot read or index the content of any page. Unless the block is intentional, this is a critical SEO error that requires immediate correction, and the validator flags it as such.
Disallow prevents crawling but does not guarantee non-indexing. Google can still index a URL if other pages link to it, even without crawling it. The noindex meta tag prevents indexing but requires the page to be crawled first. For reliable non-indexing, allow crawling and set noindex on the page; do not also Disallow the page, because the crawler would then never see the noindex tag.
Indirectly, yes: robots.txt affects SEO. Blocking important pages prevents Google from crawling them, so their content drops out of search results. Blocking Google-Extended has no effect on Google Search inclusion or rankings; it only opts your content out of Gemini training and grounding. A missing Sitemap entry can reduce crawl efficiency for large sites.
Plain text, one directive per line. Blocks start with User-agent: followed by Disallow: and optionally Allow: rules. Paths are case-sensitive. An empty Disallow: means allow all. Use Sitemap: to declare your sitemap URL. Comments start with #.
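The format rules above can be checked against a small sample using Python's standard urllib.robotparser module. The sample file, the bot name AnyBot and the example.com URLs are hypothetical, and the module is one parser among several, so treat this as a sketch of the rules rather than a statement of how every crawler behaves.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
# Hypothetical example file
User-agent: *
Disallow: /private/
Disallow: /Drafts/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Paths not matched by any Disallow rule stay crawlable.
public_ok = parser.can_fetch("AnyBot", "https://example.com/blog/")
# A matching Disallow prefix blocks the URL.
private_ok = parser.can_fetch("AnyBot", "https://example.com/private/page")
# Paths are case-sensitive: /drafts/ is not covered by Disallow: /Drafts/.
lowercase_ok = parser.can_fetch("AnyBot", "https://example.com/drafts/")
# Sitemap lines are collected separately (site_maps needs Python 3.8+).
sitemaps = parser.site_maps()

# An empty Disallow: value means everything is allowed.
open_parser = RobotFileParser()
open_parser.parse("User-agent: *\nDisallow:\n".splitlines())
open_ok = open_parser.can_fetch("AnyBot", "https://example.com/anything")

print(public_ok, private_ok, lowercase_ok, sitemaps, open_ok)
```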
The file must be in the root directory of the domain, accessible at https://yourdomain.com/robots.txt. It cannot be placed in a subdirectory. It must be served with a 200 HTTP status and text/plain content type. Subdomains require their own separate robots.txt.
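The location and serving rules above translate into two small checks, sketched here in Python. Both helper names (robots_url and looks_valid) are made up for this example, and the second function only inspects a status code and Content-Type header you have already obtained; it performs no network request.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); replace the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def looks_valid(status, content_type):
    """Check the two serving requirements: HTTP 200 and text/plain."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type == "text/plain"


print(robots_url("https://example.com/shop/cart?id=1"))
print(robots_url("https://blog.example.com/post"))
print(looks_valid(200, "text/plain; charset=utf-8"))
```

Because the host is kept as-is, a page on blog.example.com maps to that subdomain's own robots.txt rather than to the file on example.com.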
For most websites, blocking Bytespider (ByteDance/TikTok's AI crawler) is a reasonable default. It has the lowest transparency of major AI crawlers, no published IP ranges, no consumer-facing AI product that cites sources with links, and has been reported to crawl aggressively. Add User-agent: Bytespider and Disallow: / to block it.