In one sentence
GPTBot is OpenAI's web crawler: it visits your site, reads your pages, and collects content that OpenAI may use to train the models behind ChatGPT, which shapes what ChatGPT knows and can say about your business.
What does it look like in practice?
For example, when a user asks ChatGPT, "What B2B SaaS products would you recommend in Japan?", ChatGPT can only name your service if it knows about it, and part of that knowledge comes from web pages GPTBot crawled earlier.
If GPTBot has never crawled your site, ChatGPT is far less likely to mention your service from its own knowledge, however good the service is.
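To check whether GPTBot has actually visited your site, you can look for its User-Agent token in your web server's access logs. The sketch below assumes logs in the Apache/Nginx "combined" format; the sample log lines, the exact User-Agent string shown, and the `count_gptbot_hits` helper are illustrative assumptions. A User-Agent header can be spoofed, so treat matches as a rough signal; OpenAI publishes IP ranges for its crawlers if you need stronger verification.

```python
import re
from collections import Counter

# Matches one line of the "combined" log format and captures the request path and User-Agent.
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical sample lines; in practice, read these from your real access log.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/geo-guide HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [10/Oct/2024:13:56:02 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [10/Oct/2024:14:01:10 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/129.0 Safari/537.36"',
]


def count_gptbot_hits(lines):
    """Count requests per path whose User-Agent contains the token 'GPTBot'."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        # Skip lines that are not in combined format or come from other clients.
        if match and "GPTBot" in match.group("ua"):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    for path, count in sorted(count_gptbot_hits(SAMPLE_LOG).items()):
        print(path, count)
    # /blog/geo-guide 1
    # /pricing 1
```

If the counter stays empty over weeks of logs, GPTBot is either blocked or has not reached your pages yet.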
How to configure
Control GPTBot's access with the robots.txt file at the root of your domain, using one of these two groups:
# Option 1: allow GPTBot (recommended for GEO)
User-agent: GPTBot
Allow: /
# Option 2: block GPTBot entirely (use instead of Option 1, not alongside it)
User-agent: GPTBot
Disallow: /
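Before deploying a change, you can check what a robots.txt file means for GPTBot with Python's standard-library `urllib.robotparser`. This is a minimal sketch: the two sample files mirror the options above, while `gptbot_allowed` and the example.com URL are hypothetical placeholders. Note that `robotparser` applies the first matching rule in file order rather than the longest-match precedence of RFC 9309, so on complex files its answer can differ from how real crawlers read them.

```python
from urllib.robotparser import RobotFileParser

# Sample files mirroring the two options: allow vs. block GPTBot.
ALLOW_GPTBOT = """\
User-agent: GPTBot
Allow: /
"""

BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""


def gptbot_allowed(robots_txt, url="https://example.com/blog/post"):
    """Return True if robots_txt lets GPTBot fetch url (example.com is a placeholder)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so nothing is downloaded here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(ALLOW_GPTBOT))  # True
    print(gptbot_allowed(BLOCK_GPTBOT))  # False
```

To test a live site instead of a string, `RobotFileParser` also provides `set_url()` and `read()`, which download and parse the published file.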
Why you should Allow it
- ChatGPT is one of the most widely used generative AI services: blocking GPTBot means giving up opportunities to be mentioned in its answers
- Objections to your content being used as training data: understandable, but blocking GPTBot also limits what ChatGPT learns about you
- Practice varies by industry: some news publishers Disallow GPTBot, while most SaaS and other non-media companies Allow it
Related crawlers
OpenAI operates several bots besides GPTBot:
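- OAI-SearchBot: crawls pages so they can be surfaced and linked in ChatGPT search (introduced in 2024 with the SearchGPT prototype)
- ChatGPT-User: fetches a page on demand when a user's request in ChatGPT needs it, for example when a user pastes a URL into the chat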
For GEO, Allow all three crawlers as a baseline.
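A single robots.txt group can list several `User-agent` lines that share the same rules, so one way to allow all three OpenAI crawlers at once is sketched below. The helper names `build_allow_group` and `blocked_agents`, and the extra `Disallow: /admin/` rule for other bots, are illustrative assumptions. The check uses Python's `urllib.robotparser`, which only approximates how real crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens of the OpenAI crawlers listed above.
OPENAI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]


def build_allow_group(agents):
    """Return one robots.txt group whose User-agent lines share 'Allow: /'."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents that robots_txt does NOT allow to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical file: OpenAI crawlers allowed, other bots kept out of /admin/.
    robots_txt = build_allow_group(OPENAI_AGENTS) + "\nUser-agent: *\nDisallow: /admin/\n"
    print(robots_txt)
    print(blocked_agents(robots_txt, OPENAI_AGENTS))  # []
```

Running `blocked_agents` against your current file is a quick way to spot a broad `User-agent: *` rule that unintentionally shuts out one of these crawlers.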
See also robots.txt and OpenAI.