What is GPTBot?
GPTBot is an automated program (a crawler, or bot) operated by OpenAI. It visits web pages, reads their content and collects it for training and updating the models behind ChatGPT. OpenAI officially introduced GPTBot in August 2023 and published technical details so that website operators can allow or block this crawler specifically.
Unlike a human visitor, GPTBot does not render graphics, does not execute JavaScript and does not interact with forms or buttons. It reads only the HTML source code of a page, as most simple web crawlers do. Googlebot, which can render JavaScript, is the notable exception.
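Important to understand: GPTBot collects data for two purposes, training future models and updating the knowledge of already trained models. Both processes influence whether and how ChatGPT mentions your website in answers. Note that OpenAI documents separate user agents for ChatGPT's search and live browsing (OAI-SearchBot and ChatGPT-User), which are controlled in robots.txt independently of GPTBot.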
Technical details
GPTBot identifies itself to web servers via its user agent string, which contains the identifier "GPTBot" and a version number. The IP addresses from which GPTBot operates come from the OpenAI network and can be verified via OpenAI's publicly available IP address list.
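Both checks described above can be sketched in a few lines of Python, assuming you have the User-Agent header and the client IP from your server logs. The sample user agent string is illustrative only, and the CIDR ranges are documentation placeholders (RFC 5737), not OpenAI's real ranges; in practice you would load the current ranges from OpenAI's published list.

```python
import ipaddress

# Illustrative user agent string; the exact format and version may differ.
SAMPLE_GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot"
)

# Placeholder ranges for this example only (RFC 5737 documentation blocks).
# Replace them with the ranges from OpenAI's published IP address list.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def claims_gptbot(user_agent: str) -> bool:
    """True if the User-Agent header contains the GPTBot identifier."""
    return "gptbot" in user_agent.lower()


def ip_in_ranges(ip: str, cidr_ranges: list[str]) -> bool:
    """True if the IP address falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def is_verified_gptbot(user_agent: str, ip: str, cidr_ranges: list[str]) -> bool:
    """A request counts as GPTBot only if both the name and the IP match."""
    return claims_gptbot(user_agent) and ip_in_ranges(ip, cidr_ranges)
```

Checking the IP as well as the name matters because any client can send a user agent string containing "GPTBot".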
What GPTBot reads and ignores
What GPTBot reads
- HTML text content — all visible text on a page
- Meta tags — title, description, Open Graph tags
- Structured data — Schema.org JSON-LD in the head or body
- Alt texts — image descriptions in the alt attribute
- Heading structure — H1 through H6 headings
- Internal and external links — for crawling decisions
What GPTBot does not read
- JavaScript-rendered content: anything that only becomes visible after JavaScript runs is invisible to GPTBot
- Images and videos: only the alt text is read, not the visual content
- PDF content: unless the same content is also published as HTML
- Login-protected areas: GPTBot does not log in
- Content behind paywalls: unless it is present in the HTML source
Important for JavaScript-heavy websites: single-page applications (SPAs) that rely heavily on JavaScript are often only partially readable for GPTBot, or not readable at all. If your most important content only appears in the DOM after JavaScript has run, GPTBot sees an empty or nearly empty page.
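To see roughly what a crawler that does not run JavaScript gets from a page, you can extract the text from the raw HTML and check whether a key phrase is in it. The following is a minimal sketch using Python's standard html.parser; the two sample pages and the phrase are invented for illustration, and real crawlers may treat markup differently.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_text(raw_html: str) -> str:
    """Return the text that is present in the HTML source without running JS."""
    extractor = RawTextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join(extractor.chunks)


# Invented example: an SPA shell that fills the page via JavaScript.
SPA_SHELL = """<html><head><title>Shop</title></head><body>
<div id="root"></div>
<script>document.getElementById("root").textContent = "Handmade oak tables";</script>
</body></html>"""

# Invented example: the same content delivered in the HTML source.
SERVER_RENDERED = """<html><head><title>Shop</title></head><body>
<h1>Handmade oak tables</h1><p>Built to order.</p>
</body></html>"""
```

Here `raw_text(SPA_SHELL)` contains only the title, while `raw_text(SERVER_RENDERED)` contains the heading and paragraph as well. Run the same check against the unrendered source of your own key pages.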
GPTBot vs. Googlebot
- Timeout tolerance: Googlebot waits significantly longer for server responses than GPTBot. Pages with a TTFB over 2–3 seconds are more frequently abandoned by GPTBot.
- JavaScript: Googlebot can render JavaScript (with delay). GPTBot does not render JavaScript — it only reads the initial HTML source.
- Crawl budget: Googlebot has a significantly higher crawl budget and visits pages much more frequently.
- Purpose: Googlebot crawls for search results, GPTBot for AI training and knowledge updates.
- Controllability: Both respect robots.txt, but Googlebot offers much more transparency via Search Console.
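Since response time is one of the differences listed above, it can help to measure it yourself. The sketch below times how long a server takes to return its response headers, using only Python's standard library. The thresholds are this article's figures (an 800 ms target and the 2-second lower bound of the range mentioned above), not values documented by OpenAI, and the "ttfb-check/0.1" user agent is a made-up name for this example.

```python
import http.client
import time
from urllib.parse import urlsplit

# Thresholds based on this article's guidance, not on OpenAI documentation.
TARGET_TTFB_SECONDS = 0.8
RISKY_TTFB_SECONDS = 2.0


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Seconds from opening the connection until the response headers arrive."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    start = time.perf_counter()
    try:
        # request() connects automatically, so the timing includes connection setup.
        connection.request("GET", target, headers={"User-Agent": "ttfb-check/0.1"})
        response = connection.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        connection.close()


def rate_ttfb(seconds: float) -> str:
    """Classify a measurement against the thresholds above."""
    if seconds <= TARGET_TTFB_SECONDS:
        return "ok"
    if seconds < RISKY_TTFB_SECONDS:
        return "slow"
    return "at risk"
```

For example, `rate_ttfb(measure_ttfb("https://example.com/"))` classifies a single request. One measurement is noisy, so repeat it a few times and from more than one location before drawing conclusions.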
Controlling GPTBot with robots.txt
Allow GPTBot completely (default)
Block GPTBot for specific areas
Block GPTBot completely
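The three policies named in the headings above can be written as robots.txt rules and checked with Python's standard urllib.robotparser. This is a minimal sketch: the rules follow the conventional robots.txt form for each case, while the blocked paths and the example.com URLs are placeholders you would replace with your own.

```python
from urllib.robotparser import RobotFileParser

# Allow GPTBot completely. An empty or missing robots.txt has the same effect.
ALLOW_ALL = """\
User-agent: GPTBot
Allow: /
"""

# Block GPTBot for specific areas (placeholder paths).
BLOCK_AREAS = """\
User-agent: GPTBot
Disallow: /members/
Disallow: /internal/
"""

# Block GPTBot completely.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""


def gptbot_may_fetch(robots_txt: str, url: str) -> bool:
    """Parse robots.txt text and report whether GPTBot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)
```

With BLOCK_AREAS, a URL under /members/ is refused while a blog URL stays reachable. With BLOCK_ALL, every URL is refused for GPTBot, but other crawlers are unaffected because the group names only GPTBot.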
Critical error: a "Disallow: /" rule under "User-agent: *" blocks not only spam bots but every crawler that has no group of its own, including GPTBot, ClaudeBot, PerplexityBot and even Googlebot. Your website will then not appear in Google results or in ChatGPT answers.
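The effect of the wildcard rule can be reproduced with Python's standard urllib.robotparser. In this sketch, "ExampleSpamBot" is a made-up crawler name used to show a targeted alternative. It is an illustration of the matching logic, not a recommendation for any specific bot.

```python
from urllib.robotparser import RobotFileParser

# The mistake: one wildcard group that blocks everything.
WILDCARD_BLOCK = """\
User-agent: *
Disallow: /
"""

# A targeted alternative that names only the unwanted crawler (hypothetical name).
TARGETED_BLOCK = """\
User-agent: ExampleSpamBot
Disallow: /
"""

CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]


def access_by_agent(robots_txt: str, agents: list[str]) -> dict[str, bool]:
    """Map each user agent to whether it may fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, "https://example.com/") for agent in agents}
```

With WILDCARD_BLOCK every crawler in the list is refused. With TARGETED_BLOCK all four stay allowed and only the named bot is refused, provided that bot honours robots.txt at all.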
When should you block GPTBot?
Reasons to block
- Copyrighted content — if you do not want your texts used for AI training
- Paid content / paywall — content that should only be accessible to paying customers
- Personal or sensitive data — pages with user data or confidential information
Reasons to allow
- Visibility in ChatGPT — allowing GPTBot increases the chance of being cited in ChatGPT answers
- Public information — content that is freely accessible anyway
- Marketing and brand building — presence in AI answers as a marketing channel
Optimising your website for GPTBot
- Check robots.txt — GPTBot must have access (no "Disallow: /" for GPTBot or User-agent *)
- Server response time (TTFB) under 800ms — GPTBot aborts earlier than Googlebot on slow servers
- Important content in the HTML source — not only loaded via JavaScript
- Implement Schema.org JSON-LD — helps GPTBot understand the context
- Alt texts for all relevant images
- Clear heading structure (H1, H2, H3)
- Fill in meta tags completely — title and description
- Link sitemap in robots.txt
- Structure internal linking — make important pages easily reachable
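Several items on this checklist can be checked automatically against the raw HTML of a page. The sketch below, built on Python's standard html.parser, looks for a title, a meta description, H1 headings, images without alt text and parseable JSON-LD. The sample page is invented, and the checks are deliberately simple; for instance, they count a decorative image with an empty alt attribute as missing alt text.

```python
import json
from html.parser import HTMLParser


class HtmlAudit(HTMLParser):
    """Collects a few checklist signals from raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.report = {
            "has_title": False,
            "has_meta_description": False,
            "h1_count": 0,
            "images_missing_alt": 0,
            "valid_json_ld_blocks": 0,
        }
        self._in_title = False
        self._json_ld_parts = None  # text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name", "").lower() == "description":
            self.report["has_meta_description"] = bool(attributes.get("content", "").strip())
        elif tag == "h1":
            self.report["h1_count"] += 1
        elif tag == "img" and not attributes.get("alt", "").strip():
            self.report["images_missing_alt"] += 1
        elif tag == "script" and attributes.get("type", "").lower() == "application/ld+json":
            self._json_ld_parts = []

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.report["has_title"] = True
        if self._json_ld_parts is not None:
            self._json_ld_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._json_ld_parts is not None:
            try:
                json.loads("".join(self._json_ld_parts))
                self.report["valid_json_ld_blocks"] += 1
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is not counted
            self._json_ld_parts = None


def audit_html(raw_html: str) -> dict:
    """Run the audit on an HTML string and return the report dictionary."""
    audit = HtmlAudit()
    audit.feed(raw_html)
    audit.close()
    return audit.report


# Invented sample page with one image that lacks alt text.
SAMPLE_PAGE = """<html><head>
<title>Oak tables</title>
<meta name="description" content="Handmade oak tables, built to order.">
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Product", "name": "Oak table"}</script>
</head><body>
<h1>Oak tables</h1>
<img src="table.jpg" alt="Oak dining table">
<img src="detail.jpg">
</body></html>"""
```

Calling `audit_html(SAMPLE_PAGE)` reports a title, a description, one H1, one valid JSON-LD block and one image without alt text. Because the audit works on the HTML source only, it also reflects the fact that content injected by JavaScript does not count.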
Other AI crawlers compared
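- ClaudeBot (Anthropic / Claude): user agent "ClaudeBot". Works on similar principles to GPTBot and respects robots.txt.
- PerplexityBot (Perplexity AI): user agent "PerplexityBot". Crawls for fact-based search with source references.
- Google-Extended (Google / Gemini): not a separate crawler but a robots.txt token. Crawling is done by Google's existing crawlers; the token controls whether your content may be used for Gemini and can be set independently of Googlebot. It does not affect inclusion in Google Search, which AI Overviews are part of.
- Amazonbot (Amazon / Alexa): crawler for Amazon AI products.
- Applebot-Extended (Apple / Siri): a robots.txt token rather than a crawler of its own. It controls whether content crawled by Applebot may be used for Apple's AI features.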
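Because each of these names can be addressed separately in robots.txt, one file can hold a different policy per crawler. The sketch below generates such a file from a small dictionary and checks it with Python's standard urllib.robotparser. The policy, the paths and the sitemap URL are invented examples, not recommendations.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def build_robots_txt(policy: dict[str, list[str]], sitemap_url: Optional[str] = None) -> str:
    """Build robots.txt text with one group per user agent token.

    An empty list of paths means the crawler is allowed everywhere.
    """
    lines = []
    for agent, disallowed_paths in policy.items():
        lines.append(f"User-agent: {agent}")
        if disallowed_paths:
            lines.extend(f"Disallow: {path}" for path in disallowed_paths)
        else:
            lines.append("Allow: /")
        lines.append("")  # a blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


# Invented example policy: allow GPTBot, keep ClaudeBot out of drafts,
# and opt out of Google-Extended entirely.
EXAMPLE_POLICY = {
    "GPTBot": [],
    "ClaudeBot": ["/drafts/"],
    "Google-Extended": ["/"],
}

robots_txt = build_robots_txt(EXAMPLE_POLICY, sitemap_url="https://example.com/sitemap.xml")

# Parse the generated text again to confirm it expresses the intended policy.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
```

Generating the file from one data structure and parsing it back makes it harder for a stray wildcard rule to slip in unnoticed.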