Comparison March 2026 · Updated May 2026

AI Crawler Comparison 2026: GPTBot, ClaudeBot, Perplexity, Google, Grok & More

Eight major AI platforms, eight different crawlers – and each works differently. This guide compares GPTBot, ClaudeBot, PerplexityBot, Google-Extended, xAI-Bot (Grok), Amazonbot, Applebot-Extended and Bytespider: what each crawler reads, how often it crawls, how to control it and what it means for your AI visibility.

Overview: All AI crawlers at a glance

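Every major AI platform operates its own web crawlers to collect content for training, knowledge updates or real-time search. These crawlers identify themselves via their user agent string and state that they respect robots.txt – but beyond that, they differ significantly in crawl frequency, purpose, transparency and what they actually do with your content. Google-Extended and Applebot-Extended are special cases: they are robots.txt tokens honoured by Google's and Apple's regular crawlers rather than crawlers with a user agent string of their own.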
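Because each crawler announces itself in the User-Agent header, a request or log line can be attributed by looking for its product token. The sketch below is a minimal Python example, assuming the token spellings used in this guide; the sample header strings are illustrative rather than copied from provider documentation, and the two -Extended tokens are left out because they never appear in request headers. The header is trivially spoofed, so a match only tells you what the client claims to be.

```python
import re
from typing import Optional

# Product tokens as named in this guide. Google-Extended and Applebot-Extended are
# omitted on purpose: they are robots.txt tokens and never appear in request headers.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "Perplexity-User",
    "xAI-Bot",
    "Amazonbot",
    "Bytespider",
]

# One case-insensitive pattern; \b stops a token from matching inside a longer word.
_TOKEN_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in AI_CRAWLER_TOKENS) + r")\b",
    re.IGNORECASE,
)


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the canonical token an AI crawler User-Agent claims, or None."""
    match = _TOKEN_RE.search(user_agent)
    if match is None:
        return None
    found = match.group(1).lower()
    # Map whatever casing was sent back to the spelling used in robots.txt.
    return next(t for t in AI_CRAWLER_TOKENS if t.lower() == found)


print(identify_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.1)"))       # GPTBot
print(identify_ai_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```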

Core principle for all crawlers: they are expected to respect robots.txt (though compliance is harder to verify for the less transparent providers), read only the HTML source without JavaScript rendering (with partial exceptions) and prefer technically clean, fast websites. The differences lie in crawl frequency, purpose, whether they link back to your site and the degree of transparency each provider offers.
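To approximate what a non-rendering crawler sees, you can check whether a key sentence is present in the raw HTML your server returns, with script and style contents excluded. A minimal sketch using Python's standard html.parser; the two sample pages are hypothetical, and the check deliberately ignores anything JavaScript would add after load.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collects the text a non-rendering crawler can read directly from the HTML source."""

    # Text inside these elements is code or styling, not readable page content.
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_without_js(html: str, phrase: str) -> bool:
    """True if `phrase` appears as text in the unrendered HTML, ignoring script bodies."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return phrase in " ".join(parser.chunks)


server_rendered = "<main><h1>Pricing</h1><p>Plans start at 10 EUR</p></main>"
client_rendered = (
    '<main id="app"></main>'
    "<script>document.getElementById('app').textContent = 'Plans start at 10 EUR';</script>"
)
print(visible_without_js(server_rendered, "Plans start at 10 EUR"))  # True
print(visible_without_js(client_rendered, "Plans start at 10 EUR"))  # False
```

In the second page the sentence exists only inside a script, so a crawler that reads the HTML source without executing JavaScript would not see it.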

As of 2026, eight major AI crawlers are actively indexing the web. Understanding each one lets you make informed decisions about who gets access to your content – and who doesn't.

GPTBot – ChatGPT (OpenAI)

Provider: OpenAI
User agent: GPTBot
Platform: ChatGPT
Purpose: Training + updates
IP ranges: Published
Transparency: High
Source links: No

GPTBot is the best-known and most thoroughly documented AI crawler. OpenAI officially introduced it in August 2023 and publishes both a documentation page and the current IP address ranges – making it one of the most verifiable AI crawlers.
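With published ranges, checking a visitor's IP is a plain CIDR membership test. A minimal Python sketch using the standard ipaddress module; downloading and parsing OpenAI's current list is left out, and the two ranges below are documentation-only placeholders (RFC 5737 and RFC 3849), not real GPTBot addresses.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges from the IPv4/IPv6 documentation blocks (RFC 5737, RFC 3849).
# Replace them with the list the provider currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_published_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """Return True if `ip` lies inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa); that test is simply False.
    return any(address in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


print(ip_in_published_ranges("192.0.2.17", EXAMPLE_RANGES))    # True
print(ip_in_published_ranges("198.51.100.5", EXAMPLE_RANGES))  # False
```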

GPTBot crawls for two purposes: training future GPT models and updating the knowledge of already deployed models. Crawl frequency is moderate compared to Googlebot – important pages are typically visited on a cycle of weeks to months rather than days.

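One key consideration: content collected by GPTBot is used for model training, and training data is not cited with links. Source links in ChatGPT come from its search feature, which OpenAI serves with a separate crawler, OAI-SearchBot, controlled through its own User-agent group. Allowing GPTBot therefore gains you AI visibility but not referral traffic. If traffic matters more to you than brand presence in AI answers, this trade-off is worth thinking about.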

robots.txt control: User-agent: GPTBot – fully supported and reliably respected by OpenAI.

ClaudeBot – Claude (Anthropic)

Provider: Anthropic
User agent: ClaudeBot
Platform: Claude
Purpose: Context + answers
IP ranges: Not published
Transparency: Medium
Source links: Partial

ClaudeBot is Anthropic's crawler for the Claude language model. It operates on the same basic principles as GPTBot: reading HTML source, respecting robots.txt, no JavaScript rendering. Anthropic does not publish IP ranges, making it harder to verify crawl activity independently.

Anthropic emphasises Claude's ability to understand relationships, nuance and long-form reasoning rather than just retrieve isolated facts. Anthropic does not document any content preferences for ClaudeBot itself, but well-structured, content-rich pages with a clear hierarchy are the natural fit for a model built this way.

Claude increasingly includes source citations in its answers, especially in its web search mode. This makes ClaudeBot more valuable for traffic than GPTBot in certain contexts.

robots.txt control: User-agent: ClaudeBot – fully supported.

PerplexityBot – Perplexity AI

Provider: Perplexity AI
User agent: PerplexityBot
Platform: Perplexity AI
Purpose: Real-time search
IP ranges: Published
Transparency: Medium
Source links: Yes

PerplexityBot differs from GPTBot and ClaudeBot in one critical respect: Perplexity is primarily a search engine with AI answers – not a pure chatbot. This means PerplexityBot crawls more actively and more frequently than training crawlers, often on a days-to-weeks cycle.
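You can check the crawl cycle a given bot actually follows on your own site by measuring the gaps between its visits. A minimal Python sketch, assuming access logs in the common Apache/nginx combined format (timestamp in square brackets, User-Agent as the last quoted field); which tokens to track is up to you.

```python
import re
import statistics
from collections import defaultdict
from datetime import datetime

# Combined log format: the timestamp sits in [...] and the User-Agent is the last quoted field.
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def median_revisit_days(lines, tokens):
    """Median gap in days between consecutive requests for each crawler token."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        user_agent = match.group("ua").lower()
        for token in tokens:
            if token.lower() in user_agent:
                visits[token].append(datetime.strptime(match.group("ts"), TS_FORMAT))
    medians = {}
    for token, stamps in visits.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(stamps, stamps[1:])]
        if gaps:  # at least two visits are needed to measure an interval
            medians[token] = statistics.median(gaps)
    return medians


# Usage (log path is an example; adjust to your server):
# with open("/var/log/access.log", encoding="utf-8", errors="replace") as log:
#     print(median_revisit_days(log, ["GPTBot", "PerplexityBot"]))
```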

Perplexity cites sources directly and visibly in every answer, with clickable links back to the original pages. A citation by Perplexity is therefore the most traffic-valuable AI citation available right now. Any website wanting to appear as a Perplexity source needs technically clean, citable pages and an allowed PerplexityBot.

For most websites with public informational content, PerplexityBot is the single most important AI crawler to prioritise. Even if you block training crawlers, keeping PerplexityBot allowed is usually worthwhile.

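robots.txt control: User-agent: PerplexityBot – respected. Perplexity also operates a secondary agent, Perplexity-User, for real-time lookups triggered by user queries; Perplexity's documentation states that because these fetches are user-initiated, Perplexity-User generally ignores robots.txt.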

Google-Extended – Gemini (Google)

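Provider: Google
User agent: Google-Extended (robots.txt token only; pages are fetched by Google's regular crawlers)
Platform: Gemini apps, Vertex AI
Purpose: Gemini training + grounding
IP ranges: Published
Transparency: High
Source links: No (AI Overviews fall under Googlebot)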

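Google-Extended, introduced in September 2023, is not a separate crawler but a robots.txt product token. Pages are fetched by Google's regular crawlers; the Google-Extended rules decide whether that content may be used to train Gemini models and to ground answers in the Gemini apps and Vertex AI. It is distinct from Googlebot's role in classic Google Search, and it does not govern AI Overviews, the AI answers appearing at the top of search results: those are part of Google Search and depend on Googlebot.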

The key advantage of Google-Extended: it can be controlled entirely independently from Googlebot. Blocking Google-Extended has no direct impact on your Google Search ranking – Googlebot continues crawling regardless. This gives website owners a genuine choice about AI training participation without SEO risk.

AI Overviews do include source citations, but eligibility for them depends on Googlebot access, not on Google-Extended. Blocking Google-Extended therefore opts your content out of Gemini training and grounding without removing it from AI Overviews.

robots.txt control: User-agent: Google-Extended – fully supported and independently controllable from Googlebot.
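Because the rules are evaluated per user agent, blocking Google-Extended while leaving everything else open can be sanity-checked locally. A sketch using Python's urllib.robotparser, assuming the two-group file below; this parser only approximates how Google evaluates robots.txt, so treat the result as a quick check rather than Google's verdict.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://yourdomain.com/guide"
print("Googlebot:", parser.can_fetch("Googlebot", url))              # Googlebot: True
print("Google-Extended:", parser.can_fetch("Google-Extended", url))  # Google-Extended: False
```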

xAI-Bot – Grok (xAI)

Provider: xAI
User agent: xAI-Bot
Platform: Grok (X / Twitter)
Purpose: Training + search
IP ranges: Not published
Transparency: Low
Source links: Partial

Grok is the AI model from xAI, Elon Musk's AI company, deeply integrated into the X (formerly Twitter) platform. The web crawler identifies itself as xAI-Bot and crawls for both training and knowledge updates.

xAI is the least transparent of the major AI providers: no published IP ranges, limited official documentation and less clear communication about crawling scope or data usage. This makes it harder to verify whether robots.txt rules are reliably respected.

A key differentiator: Grok has real-time access to X posts and can incorporate social media signals directly into answers – independently of web crawling. This means your X presence and web presence work together for Grok visibility in a way that doesn't apply to other AI platforms.

robots.txt control: User-agent: xAI-Bot – respected according to current knowledge, but less verifiable than other crawlers.

Amazonbot – Amazon / Alexa AI

Provider: Amazon
User agent: Amazonbot
Platform: Alexa, Amazon AI
Purpose: Alexa answers + training
IP ranges: Published
Transparency: Medium
Source links: No

Amazonbot is Amazon's web crawler, operating primarily for Alexa voice assistant answers and Amazon's broader AI initiatives. Amazon publishes its IP address ranges, which allows server-side verification of crawler authenticity – a notable plus for transparency.
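Published IP ranges are one way to authenticate a crawler. Another widely used technique is forward-confirmed reverse DNS: look up the hostname for the visiting IP, check that it belongs to the provider's domain, then resolve that hostname and confirm it points back to the same IP. Google and Apple, for example, document this for Googlebot and Applebot. The sketch below is generic Python; the hostname suffix is a hypothetical placeholder, so take the real one from the provider's documentation, and note that gethostbyname_ex only returns IPv4 addresses.

```python
import socket
from typing import Callable, List, Tuple


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    return socket.gethostbyname_ex(hostname)[2]  # IPv4 addresses only


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Tuple[str, ...],
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must end in an allowed suffix
    (use a leading dot) and must resolve back to the same IP address."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(allowed_suffixes):
            return False
        return ip in forward_lookup(hostname)
    except OSError:  # socket.herror / socket.gaierror: no PTR or no A record
        return False


# Usage with a hypothetical suffix:
# verify_crawler_ip(client_ip, (".crawler.example.com",))
```

The lookup functions are parameters so the check can be tested without network access.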

While Alexa's share of the smart speaker market has declined, Amazon is actively integrating AI into its shopping, AWS and Alexa ecosystems. Amazonbot's crawl scope appears to reflect this: it focuses on factual, structured content that can be used to answer voice queries and power AI features across Amazon's product line.

For most websites, Amazonbot has lower immediate traffic impact than PerplexityBot or Google-Extended. However, e-commerce sites, local businesses and informational content producers benefit from maintaining Alexa visibility, especially as Amazon expands its AI answer features.

robots.txt control: User-agent: Amazonbot – fully supported. Amazon publishes clear documentation and IP ranges for verification.

Applebot-Extended – Apple Intelligence / Siri

Provider: Apple
User agent: Applebot-Extended (robots.txt token only; pages are fetched by Applebot)
Platform: Apple Intelligence, Siri
Purpose: AI training (Apple)
IP ranges: Published
Transparency: High
Source links: No

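Applebot-Extended was introduced in 2024 alongside the rollout of Apple Intelligence. Like Google-Extended, it is a robots.txt token rather than a separate crawler: Apple states that Applebot-Extended does not crawl webpages. Pages are fetched by the regular Applebot, which crawls for Spotlight search, Siri and Safari Suggestions, and your Applebot-Extended rules decide whether that data may be used to train Apple's AI models. The two can be controlled independently.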

Apple introduced this separation deliberately to give website owners control over AI training participation without affecting their presence in Apple's regular search features. Blocking Applebot-Extended does not affect Spotlight indexing or Siri's ability to surface your website as a regular result.

Apple Intelligence is built into iOS, iPadOS and macOS on supported devices, part of an installed base of over a billion active Apple devices. As Apple Intelligence gains capabilities, Applebot-Extended's strategic importance is likely to grow. Websites targeting iOS users in particular should consider their Applebot-Extended policy carefully.

Apple publishes Applebot's IP ranges for verification; since Applebot-Extended rules are applied to Applebot's crawl, this makes it one of the more transparent options despite being relatively new.

robots.txt control: User-agent: Applebot-Extended – fully supported and independently controllable from regular Applebot.

Bytespider – ByteDance / TikTok AI

Provider: ByteDance
User agent: Bytespider
Platform: ByteDance / TikTok AI
Purpose: Training
IP ranges: Not published
Transparency: Low
Source links: No

Bytespider is the web crawler of ByteDance, the company behind TikTok and the large language model family used in its AI products. It has been observed crawling at very high volumes, in some cases more aggressively than other AI crawlers, which has led to concerns among webmasters and hosting providers.

ByteDance does not publish IP ranges or comprehensive documentation for Bytespider, making it the least transparent of all major AI crawlers. There is no clear public-facing AI product that surfaces web content with source citations – Bytespider appears to be primarily a training data crawler.

Given the lack of transparency, the absence of source links and reported aggressive crawl behaviour, many website owners choose to block Bytespider by default unless they have a specific reason to allow it. This is a reasonable precaution that has no known negative impact on any user-facing AI search product.
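If you want evidence of aggressive crawling on your own server before deciding, your access logs can show each bot's peak request rate. A minimal Python sketch, assuming combined-format logs; the 60-requests-per-minute threshold is a hypothetical value to tune for your own server, and lines should be passed as a list if you check several tokens.

```python
import re
from collections import Counter
from datetime import datetime

LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
# Hypothetical alert threshold; tune it to what your server comfortably handles.
MAX_REQUESTS_PER_MINUTE = 60


def peak_requests_per_minute(lines, token):
    """Highest number of requests from `token` within any single clock minute."""
    per_minute = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and token.lower() in match.group("ua").lower():
            stamp = datetime.strptime(match.group("ts"), TS_FORMAT)
            per_minute[stamp.replace(second=0)] += 1
    return max(per_minute.values(), default=0)


def is_aggressive(lines, token, limit=MAX_REQUESTS_PER_MINUTE):
    """True if the token's busiest minute exceeds the chosen limit."""
    return peak_requests_per_minute(lines, token) > limit
```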

robots.txt control: User-agent: Bytespider – listed as respected, but less verifiable due to limited documentation.

Direct comparison: all crawlers in one table

Crawler	User agent	JS rendering	Crawl freq.	Source links	IP ranges	Transparency	robots.txt	Recommendation
GPTBot	GPTBot	No	Weeks–months	No	Yes	High	Yes	Allow for AI presence
ClaudeBot	ClaudeBot	No	Weeks–months	Partial	No	Medium	Yes	Allow for AI presence
PerplexityBot	PerplexityBot	No	Days–weeks	Yes	Yes	Medium	Yes	Highest priority
Google-Extended	Google-Extended (token)	Partial	Regular	No (AI Overviews use Googlebot)	Yes	High	Yes	Allow for Gemini presence
xAI-Bot	xAI-Bot	No	Unknown	Partial	No	Low	Yes	Optional
Amazonbot	Amazonbot	No	Weeks	No	Yes	Medium	Yes	Allow if targeting Alexa
Applebot-Extended	Applebot-Extended	No	Weeks–months	No	Yes	High	Yes	Allow for iOS audience
Bytespider	Bytespider	No	High / aggressive	No	No	Low	Unclear	Block by default

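💡 Tip: Use the robots.txt Validator to check whether your current configuration correctly controls each of these crawlers. Spelling matters, but capitalisation of user agent names does not: under the Robots Exclusion Protocol (RFC 9309), crawlers match the User-agent line case-insensitively, so GPTBot and gptbot address the same crawler. Paths in Allow and Disallow rules, by contrast, are case-sensitive.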
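The difference between name matching and path matching is easy to confirm with Python's urllib.robotparser, which also lowercases user agent names before comparing them. A minimal sketch, assuming the one-group file below:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: gptbot",  # group name written in lower case
    "Disallow: /Private",  # path written with a capital P
])

# The group still applies to a crawler announcing itself as "GPTBot"...
print(parser.can_fetch("GPTBot", "https://yourdomain.com/Private/report"))  # False
# ...but the path rule does not cover a differently cased path.
print(parser.can_fetch("GPTBot", "https://yourdomain.com/private/report"))  # True
```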

Which strategy is right for you?

Maximum AI visibility – allow all except Bytespider

For public websites with informational content, service providers, blogs and tools: allow all crawlers except Bytespider. This is the most sensible default for anyone wanting to be cited in as many AI answers as possible. Bytespider's lack of transparency and aggressive crawl behaviour makes it the one exception worth blocking in most cases.

Selective – prioritise search-based crawlers

If your primary goal is clickable traffic from AI sources, prioritise PerplexityBot and keep Googlebot unrestricted, since AI Overviews draw on Google's regular search crawl rather than on Google-Extended. These are the two channels where a citation directly translates into a link back to your website. GPTBot, ClaudeBot and Amazonbot build AI presence without driving direct traffic.

Block training, allow search

Anyone who does not want their content used for AI model training – but wants to appear in real-time AI search results – can block GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, Amazonbot, xAI-Bot and Bytespider while keeping PerplexityBot and Googlebot allowed. This separates the training use case from the search visibility use case.

Block all AI crawlers

For websites with copyrighted, paid or sensitive content, blocking all AI crawlers is a reasonable choice. This is a conscious decision against AI visibility and carries the trade-off of not appearing in any AI-generated answers. For publishers concerned about content use without compensation, this may be the right call.
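Three of the strategies above boil down to a list of user agents to disallow, followed by an allow-all default group. A minimal Python sketch that renders such a file; the grouping of tokens mirrors the templates in this guide and is an editorial choice, not metadata published by the providers.

```python
# Grouping follows the templates in this guide; it is an editorial choice, not provider metadata.
TRAINING = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended",
            "Amazonbot", "xAI-Bot", "Bytespider"]
SEARCH = ["PerplexityBot", "Perplexity-User"]

STRATEGIES = {
    "maximum_visibility": ["Bytespider"],
    "block_training": TRAINING,
    "block_all": SEARCH + TRAINING,
}


def build_robots_txt(strategy: str, sitemap: str) -> str:
    """Render one Disallow group per blocked token, then an allow-all default group."""
    lines = []
    for token in STRATEGIES[strategy]:
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    # Every agent not named above (Googlebot and Applebot included) may crawl everything.
    lines += ["User-agent: *", "Disallow:", "", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt("maximum_visibility", "https://yourdomain.com/sitemap.xml"))
```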

Ready-made robots.txt templates

Allow all AI crawlers (maximum visibility)

# All crawlers allowed
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

Recommended default: allow all, block Bytespider

# Block Bytespider (low transparency, no source links)
User-agent: Bytespider
Disallow: /

# All other crawlers allowed
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

Block training crawlers only (allow search)

# Training crawlers blocked, search-based crawlers allowed
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: Bytespider
Disallow: /

# PerplexityBot & Googlebot still allowed
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
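Before deploying a template, you can check that it blocks and allows exactly the agents you intend. A sketch using Python's urllib.robotparser with the template above and a hypothetical expectation table; note that this parser matches group names by case-insensitive substring, which is looser than RFC 9309's exact token match, so it is a sanity check, not a replica of any provider's parser.

```python
from urllib.robotparser import RobotFileParser

BLOCK_TRAINING_TEMPLATE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml
"""

# True = should be allowed, False = should be blocked.
EXPECTED = {
    "GPTBot": False,
    "ClaudeBot": False,
    "Google-Extended": False,
    "Bytespider": False,
    "PerplexityBot": True,
    "Googlebot": True,
}


def robots_mismatches(robots_txt, expected, url="https://yourdomain.com/"):
    """Return {token: actual_result} for every token whose result differs from `expected`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    actual = {token: parser.can_fetch(token, url) for token in expected}
    return {token: result for token, result in actual.items() if result != expected[token]}


print(robots_mismatches(BLOCK_TRAINING_TEMPLATE, EXPECTED))  # {}
```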

Block all AI crawlers

# All AI crawlers blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Googlebot still allowed
User-agent: *
Disallow:

Sitemap: https://yourdomain.com/sitemap.xml

Make a conscious decision: which AI crawlers should have access to your content?
Check robots.txt for correctly spelled user agent names (GPTBot); name matching is case-insensitive, but Disallow paths are case-sensitive
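Add Perplexity-User alongside PerplexityBot to cover all Perplexity user agents; Perplexity says Perplexity-User generally ignores robots.txt, so fully blocking it may require server-side rules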
Blocking Google-Extended does not affect your Google Search ranking
Blocking Applebot-Extended does not affect Spotlight or regular Siri results
Test robots.txt with a validator after every change
TTFB under 800ms – so all crawlers can fully read your pages
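To check the TTFB item, you can time a single request from a script. A minimal Python sketch using http.client; it counts connection setup (including TLS for https) and stops the clock once the status line and headers have been parsed, so it slightly overstates true first-byte time. One request from your own location is only a rough indication of what a crawler experiences.

```python
import http.client
import time
from urllib.parse import urlsplit

TTFB_TARGET_SECONDS = 0.8  # the 800 ms target from the checklist above


def time_to_first_byte(url: str, timeout: float = 10.0) -> float:
    """Approximate TTFB in seconds: connect, send a GET, and stop the clock
    once the status line and headers have been parsed."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", path)  # connects lazily, so connection setup is included
        response = conn.getresponse()
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        conn.close()


# Usage (makes a real network request):
# ttfb = time_to_first_byte("https://yourdomain.com/")
# print(f"{ttfb * 1000:.0f} ms", "OK" if ttfb < TTFB_TARGET_SECONDS else "too slow")
```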

Which AI crawlers have access to your website?

AI-Ready Check analyses in seconds whether GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more are correctly configured – free, no account needed.


Frequently Asked Questions

Which AI crawler brings the most traffic?

PerplexityBot is currently the most traffic-valuable AI crawler because Perplexity embeds clickable source links directly in every answer. Google Search comes second: AI Overviews include source citations and can drive significant clicks, but they depend on Googlebot, not Google-Extended. GPTBot, ClaudeBot, Google-Extended, Amazonbot and Applebot-Extended generally do not generate direct referral traffic.

Can I treat different crawlers differently in robots.txt?

Yes – each crawler has its own user agent and can be controlled separately. You can block GPTBot while PerplexityBot is allowed, or block Applebot-Extended for AI training while regular Applebot continues indexing for Spotlight. Every combination is possible with individual User-agent blocks in robots.txt.

What is the difference between Applebot and Applebot-Extended?

Regular Applebot crawls for Apple's Spotlight search, Safari Suggestions and Siri's ability to surface web results. Applebot-Extended is a robots.txt token, not a separate crawler: it controls whether content Applebot has crawled may be used to train Apple Intelligence models. They can be controlled independently – blocking Applebot-Extended does not affect regular Apple search functionality.

Should I block Bytespider?

For most websites, blocking Bytespider is a reasonable default. It has the lowest transparency of all major AI crawlers – no published IP ranges, limited documentation and no consumer-facing AI product that cites sources with links. Reports of aggressive crawl volumes add to the case for blocking it. There is no known traffic or visibility benefit to allowing it currently.

Does blocking Google-Extended affect my Google ranking?

No. Blocking Google-Extended only affects whether your content is used for Gemini training and grounding. Googlebot – which is responsible for your Google Search ranking and for AI Overviews – is completely unaffected. Google states explicitly that Google-Extended does not affect inclusion or ranking in Google Search, so website owners can opt out of AI training without SEO consequences.

How do I verify an AI crawler visit in my server logs?

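AI crawler visits appear in server access logs under their user agent string. To filter visits in a Linux environment: grep GPTBot /var/log/access.log. Replace GPTBot with the relevant user agent for each crawler (add -i to match regardless of case). Google-Extended and Applebot-Extended will not appear: they are robots.txt tokens, and the fetches are logged as Googlebot and Applebot. In web analytics tools like Google Analytics, bot traffic is typically filtered out automatically.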
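For more than one crawler at a time, a short script can replace repeated grep calls. A minimal Python sketch that counts matching log lines per token case-insensitively, much like grep -ic; as with grep, it matches anywhere in the line, so a URL containing a token would also be counted.

```python
from collections import Counter

# Tokens that appear in request headers. Google-Extended and Applebot-Extended never do:
# those fetches are logged as Googlebot and Applebot.
TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Perplexity-User",
          "xAI-Bot", "Amazonbot", "Bytespider"]


def count_crawler_hits(lines, tokens=TOKENS):
    """Count log lines per token, case-insensitively, much like `grep -ic TOKEN`."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                counts[token] += 1
    return counts


# Usage, with the log path from the grep example (yours may differ):
# with open("/var/log/access.log", encoding="utf-8", errors="replace") as log:
#     for token, hits in count_crawler_hits(log).most_common():
#         print(f"{token}: {hits}")
```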

Does the language of my website affect AI crawler visibility?

Yes, but less than you might expect. All major AI systems support multiple languages. However, English-language content is more strongly represented in AI training data and tends to be cited more frequently in AI answers. For maximum AI visibility across all platforms, offering key content in English alongside other languages is worth considering.