Is your site blocking GPTBot? (Most are, here's how to check)

GPTBot is the crawler OpenAI uses to read websites so ChatGPT can answer questions about them. If your site blocks it, ChatGPT, and the growing number of people who ask it things, simply can't see you. A surprising number of sites block it without realizing it. The fix is often a one-line change. The hard part is that the block is usually invisible in the place everyone looks.

Why this matters now

More buying research starts inside an AI assistant every month. Someone asks “what's the best tool for X?” or “is [your company] any good?” and the assistant answers from what it was able to read. If the assistant's crawler was blocked, you get one of three bad outcomes: it doesn't mention you, it describes you from stale third-party data, or it openly says it has no information about you. We've scanned sites where Perplexity responded, verbatim:

“I don't have reliable information about this company. It may be a small or new business with limited online presence.”

That site was neither small nor new. It was blocking GPTBot at the CDN.

The crawlers that matter

“Blocking GPTBot” is shorthand. There are several AI crawlers, and they do different jobs. You generally want to allow all of these:

GPTBot. OpenAI's content crawler. Powers what ChatGPT knows.
OAI-SearchBot. Fetches pages to cite in ChatGPT's search results. Different from GPTBot; allow both.
ClaudeBot and Claude-Web are Anthropic's crawlers for Claude.
PerplexityBot and Perplexity-User are Perplexity's crawler and on-demand fetcher.
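Google-Extended. A robots.txt token rather than a separate crawler. It controls whether Google's Gemini models can use your content; AI Overviews are part of Search and follow your normal Googlebot rules. Independent of normal Googlebot/SEO.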

How to check in 30 seconds

Open https://yourdomain.com/robots.txt in a browser. A blocking robots.txt looks like this:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

A permissive robots.txt looks like this (or simply omits these rules):

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
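
If you'd rather run this check from a terminal than read the file by eye, here is a minimal sketch. The function name blocked_agents is made up for illustration, and the parsing is deliberately simplified: it only flags an exact Disallow: / and ignores partial paths and any Allow overrides. It also prints * when a wildcard group blocks everything, since that applies to AI crawlers too.

```shell
#!/bin/sh
# Print every user-agent that a robots.txt (read from stdin) blocks site-wide.
# Simplified sketch: only an exact "Disallow: /" counts as a block.
blocked_agents() {
  awk '
    {
      sub(/#.*/, "")                      # drop comments
      sub(/\r$/, "")                      # tolerate CRLF line endings
      colon = index($0, ":")
      if (colon == 0) next                # blank or malformed line
      field = tolower(substr($0, 1, colon - 1))
      value = substr($0, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", field)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (field == "user-agent") {
        # A User-agent line that follows a rule line starts a new group.
        if (in_rules) { n = 0; in_rules = 0 }
        agents[++n] = value
      } else if (field == "disallow" || field == "allow") {
        in_rules = 1
        if (field == "disallow" && value == "/")
          for (i = 1; i <= n; i++) print agents[i]
      }
    }
  '
}
```

To use it on a live site, pipe the file in: curl -s https://yourdomain.com/robots.txt | blocked_agents. No output means no site-wide Disallow was found in robots.txt itself.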

The catch: robots.txt is not the only place you can be blocked

This is the part that trips up most people. Your robots.txt can look perfect while your site still blocks every AI crawler, because the block lives one layer up, at your CDN or WAF (Cloudflare, Fastly, AWS, Vercel, etc.). Cloudflare in particular shipped a “block AI bots” toggle that many sites enabled without connecting it to AI visibility. When that's on, the crawler gets a 403 before robots.txt is ever consulted.

You can't see that by reading robots.txt. You have to request a page as the bot and check the status code. From a terminal:

curl -A "GPTBot" -I https://yourdomain.com
# 200 OK  -> reachable
# 403 / 401 / challenge page -> blocked upstream
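
To repeat that request for several crawlers at once, a rough sketch along these lines may help. The function names classify_status and check_crawlers are hypothetical, and the short user-agent tokens are an assumption: real crawlers send longer user-agent strings, and some firewalls also verify the source IP, so treat the result as a strong hint rather than proof. Google-Extended is left out because it is a robots.txt token, not a user-agent you can request as. A challenge page may come back with a code other than 401 or 403, so anything unexpected is flagged for a manual look.

```shell
#!/bin/sh
# Map an HTTP status code to a rough verdict for a crawler check.
classify_status() {
  case "$1" in
    2??)     echo "reachable" ;;
    3??)     echo "redirect (follow it and re-check)" ;;
    401|403) echo "blocked upstream" ;;
    000)     echo "no response" ;;   # curl reports 000 when no HTTP reply arrived
    *)       echo "check manually" ;;
  esac
}

# Request a URL once per crawler token and print the status and verdict.
# Usage: check_crawlers https://yourdomain.com
check_crawlers() {
  for agent in GPTBot OAI-SearchBot ClaudeBot PerplexityBot; do
    # -o /dev/null discards the body; -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' -A "$agent" "$1")
    printf '%-15s %s  %s\n' "$agent" "$code" "$(classify_status "$code")"
  done
}
```

This version sends a normal GET instead of the -I HEAD request, since some servers answer HEAD differently from the request a crawler would actually make.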

That's exactly the check AEOScan automates, for all of the crawlers above at once. Scan your site free and it tells you, in about 30 seconds, which assistants can reach you, what they actually say about you, and the precise fix for anything that's blocked. No signup.

How to allow them safely

In robots.txt, allow the crawlers listed above (or remove any Disallow: / rules targeting them).
In your CDN/WAF, turn off any “block AI bots” / “block AI scrapers” rule, or add the AI user-agents to an allowlist. This is the step people miss.
Re-test as the bot (the curl -A command above, or a re-scan) and confirm a 200.

Worried about cost or scraping load? Allowing read access for answer engines is different from allowing bulk training scrapes, and the upside is being present where people now ask their questions. For most sites that want customers, visibility wins.

Being readable isn't the same as being citable

Letting the crawlers in is necessary, not sufficient. Once they can reach you, two things decide whether they actually understand and cite you:

Content without JavaScript. AI crawlers generally don't run JS. If your key content only renders client-side, the bot sees an empty shell. Check the raw HTML, not the rendered page. We cover the JS rendering trap here.
Structured data & llms.txt. Clean JSON-LD and an llms.txt map help assistants identify what you are and which pages matter.
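
Both checks can be approximated from a terminal by looking at the HTML the server sends before any JavaScript runs. This is a minimal sketch with hypothetical helper names (html_contains, has_json_ld). It only tests for the presence of text and of a JSON-LD script tag, and does not validate that the structured data is correct.

```shell
#!/bin/sh
# Succeeds if the HTML on stdin contains the given phrase
# (case-insensitive, matched as a fixed string rather than a regex).
html_contains() {
  grep -qiF -- "$1"
}

# Succeeds if the HTML on stdin includes at least one JSON-LD script tag.
has_json_ld() {
  grep -qiF 'application/ld+json'
}
```

For example, curl -s -A "GPTBot" https://yourdomain.com | html_contains "your main headline" && echo "visible without JS". If the phrase shows up in the browser but not here, the content is probably rendered client-side.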

Those are separate checks (we cover llms.txt in its own guide, the full path to citation in how to get cited by ChatGPT, and how being cited differs from ranking in AEO vs SEO). The point for today: start by making sure the door is open. You can't be cited if you can't be read.

Frequently asked questions

Should I block AI crawlers?

For most sites that want to be found, no. Blocking GPTBot, ClaudeBot and PerplexityBot means assistants answer questions about you from stale or second-hand data, or say they don't know you. Block only if you have a specific reason to keep content out of AI answers (paywalled or sensitive material).

Does blocking GPTBot affect my Google ranking?

No. GPTBot is OpenAI's crawler, separate from Googlebot. Blocking it does not change Google rankings, but it does remove you from ChatGPT's answers. Google's Gemini models use a separate robots.txt token, Google-Extended, which you can control independently.

What's the difference between GPTBot and OAI-SearchBot?

GPTBot is used to crawl content; OAI-SearchBot fetches pages to cite in ChatGPT's search results. If you want to appear in ChatGPT answers and its citations, allow both. Allowing one and blocking the other is a common, accidental gap.

What is llms.txt?

A plain-text file at the root of your site (like robots.txt) that gives language models a curated map of your most important pages. It doesn't replace allowing the crawlers, but it helps them find and prioritize the right content.
