GPTBot is the crawler OpenAI uses to read websites so ChatGPT can answer questions about them. If your site blocks it — and a surprising number do, usually without realizing it — then ChatGPT, and increasingly the people who ask it things, simply can't see you. The fix is often a one-line change. The hard part is that the block is frequently invisible in the place everyone looks.
Why this matters now
More buying research starts inside an AI assistant every month. Someone asks “what's the best tool for X?” or “is [your company] any good?” and the assistant answers from what it was able to read. If the assistant's crawler was blocked, you get one of three bad outcomes: it doesn't mention you, it describes you from stale third-party data, or it openly says it has no information about you. We've scanned sites where Perplexity responded, verbatim:
“I don't have reliable information about this company. It may be a small or new business with limited online presence.”
That site was neither small nor new. It was blocking GPTBot at the CDN.
The crawlers that matter
“Blocking GPTBot” is shorthand. There are several AI crawlers, and they do different jobs. You generally want to allow all of these:
- GPTBot: OpenAI's content crawler. Powers what ChatGPT knows.
- OAI-SearchBot: fetches pages to cite in ChatGPT's search results. Different from GPTBot; allow both.
- ClaudeBot / Claude-Web: Anthropic's crawlers for Claude.
- PerplexityBot / Perplexity-User: Perplexity's crawler and on-demand fetcher.
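- Google-Extended: a robots.txt token, not a separate crawler, that controls whether Google may use your content to train and ground its Gemini models. It is independent of normal Googlebot crawling and SEO, and it does not affect Google Search features such as AI Overviews, which follow Googlebot rules.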
How to check in 30 seconds
Open https://yourdomain.com/robots.txt in a browser. A blocking robots.txt looks like this:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
An allowing robots.txt looks like this (or simply omits them):
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
The catch: robots.txt is not the only place you can be blocked
This is the part that trips up most people. Your robots.txt can look perfect while your site still blocks every AI crawler — because the block lives one layer up, at your CDN or WAF (Cloudflare, Fastly, AWS, Vercel, etc.). Cloudflare in particular shipped a “block AI bots” toggle that many sites enabled without connecting it to AI visibility. When that's on, the crawler gets a 403 before robots.txt is ever consulted.
You can't see that by reading robots.txt. You have to request a page as the bot and check the status code. From a terminal:
curl -A "GPTBot" -I https://yourdomain.com
# 200 OK -> reachable
# 403 / 401 / challenge page -> blocked upstream
That's exactly the check AEOScan automates, for all of the crawlers above at once. Scan your site free and it tells you, in about 30 seconds, which assistants can reach you, what they actually say about you, and the precise fix for anything that's blocked. No signup.
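If you'd rather script the curl check yourself, here is a minimal Python sketch using only the standard library. It sends a HEAD request under each crawler name and maps the status code to a rough verdict. The function names (`classify_status`, `probe`, `report`) are ours, and sending the bare token as the User-Agent is an assumption that mirrors the curl command; real crawlers send longer strings, and a WAF may match on other signals such as IP ranges.

```python
import urllib.error
import urllib.request

# Tokens from the crawler list in this article. Google-Extended is left out
# because it is a robots.txt token, not a user-agent string sent in requests.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough verdict (hypothetical helper)."""
    if 200 <= status < 300:
        return "reachable"
    if status in (401, 403):
        return "blocked upstream"
    # 429, 5xx and similar could be rate limiting or a challenge page;
    # this sketch does not try to tell them apart.
    return "inconclusive"


def probe(url: str, agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with `agent` as the User-Agent; return the status."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def report(url: str) -> dict[str, str]:
    """Probe `url` once per crawler name and return a verdict for each."""
    return {agent: classify_status(probe(url, agent)) for agent in AI_AGENTS}
```

Calling `report("https://yourdomain.com")` returns one verdict per crawler name. A challenge page served with a 200 status would be misread as reachable, so treat the result as a first pass rather than proof.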
How to allow them safely
- In robots.txt, allow the crawlers listed above (or remove any Disallow: / rules targeting them).
- In your CDN/WAF, turn off any “block AI bots” / “block AI scrapers” rule, or add the AI user-agents to an allowlist. This is the step people miss.
- Re-test as the bot (the curl -A command above, or a re-scan) and confirm a 200.
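The robots.txt step can also be checked offline. The sketch below feeds robots.txt text to Python's standard `urllib.robotparser` and lists which of the tokens named in this article are disallowed for a URL. `blocked_agents` and `SAMPLE` are illustrative names, and the standard-library parser uses simple matching rules, so individual crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Tokens from the crawler list in this article.
AI_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the tokens that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_TOKENS if not parser.can_fetch(agent, url)]


# The blocking example from earlier in this article.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(blocked_agents(SAMPLE))  # prints ['GPTBot', 'ClaudeBot']
```

An empty result only means robots.txt is not the problem. It says nothing about a CDN or WAF block, which still needs the request-as-the-bot test.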
Worried about cost or scraping load? Allowing read access for answer engines is different from allowing bulk training scrapes — and the upside is being present where people now ask their questions. For most sites that want customers, visibility wins.
Being readable isn't the same as being citable
Letting the crawlers in is necessary, not sufficient. Once they can reach you, two things decide whether they actually understand and cite you:
- Content without JavaScript. AI crawlers generally don't run JS. If your key content only renders client-side, the bot sees an empty shell. Check the raw HTML, not the rendered page.
- Structured data & llms.txt. Clean JSON-LD and an llms.txt map help assistants identify what you are and which pages matter.
Those are separate checks (we cover llms.txt in its own guide). The point for today: start by making sure the door is open. You can't be cited if you can't be read.
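For a rough idea of what a crawler that doesn't run JavaScript gets, you can parse the raw HTML and see what text and JSON-LD survive. This is a minimal sketch with Python's standard `html.parser`. The class name `RawPageScan`, the `scan` helper, and the two sample pages (including the company "Example Co") are made up for illustration.

```python
import json
from html.parser import HTMLParser


class RawPageScan(HTMLParser):
    """Collect visible text and JSON-LD blocks from HTML without running JS."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.json_ld: list[object] = []
        self._skip = None          # "script" or "style" while inside that tag
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag
            script_type = dict(attrs).get("type")
            self._in_json_ld = tag == "script" and script_type == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._skip is None:
            if data.strip():
                self.text_parts.append(data.strip())
        elif self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._skip:
            if self._in_json_ld:
                try:
                    self.json_ld.append(json.loads("".join(self._buffer)))
                except json.JSONDecodeError:
                    pass  # malformed JSON-LD is simply ignored in this sketch
            self._skip = None
            self._in_json_ld = False


def scan(html: str) -> RawPageScan:
    """Parse raw HTML and return the scanner holding text and JSON-LD."""
    scanner = RawPageScan()
    scanner.feed(html)
    scanner.close()
    return scanner


# Hypothetical pages: a client-rendered shell and a server-rendered page.
SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
SERVER_RENDERED = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Organization", "name": "Example Co"}'
    "</script></head><body><h1>Example Co pricing</h1></body></html>"
)

print(scan(SHELL).text_parts)            # prints []
print(scan(SERVER_RENDERED).text_parts)  # prints ['Example Co pricing']
```

If `text_parts` comes back empty for a page that looks full in your browser, the content is most likely rendered client-side.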
Frequently asked questions
Should I block AI crawlers?
For most sites that want to be found, no. Blocking GPTBot, ClaudeBot and PerplexityBot means assistants answer questions about you from stale or second-hand data — or say they don't know you. Block only if you have a specific reason to keep content out of AI answers (paywalled or sensitive material).
Does blocking GPTBot affect my Google ranking?
No. GPTBot is OpenAI's crawler, separate from Googlebot. Blocking it does not change Google rankings, but it does remove you from ChatGPT's answers. Google's Gemini models are governed by a separate robots.txt token, Google-Extended, which you can control independently of Googlebot.
What's the difference between GPTBot and OAI-SearchBot?
GPTBot is used to crawl content; OAI-SearchBot fetches pages to cite in ChatGPT's search results. If you want to appear in ChatGPT answers and its citations, allow both. Allowing one and blocking the other is a common, accidental gap.
What is llms.txt?
A plain-text Markdown file at the root of your site (like robots.txt) that gives language models a curated map of your most important pages. It is a proposed convention rather than a formal standard. It doesn't replace allowing the crawlers, but it can help them find and prioritize the right content.