robots.txt: definition

In one sentence

robots.txt is a plain-text file served from the root of a website (/robots.txt) that tells crawlers (search-engine bots and AI crawlers) which parts of the site they may fetch.

What does this look like in practice?

For example, if your site's /robots.txt contains:

    User-agent: GPTBot
    Allow: /

    User-agent: Claude-Web
    Allow: /

    User-agent: Google-Extended
    Allow: /

    Sitemap: https://example.com/sitemap.xml

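then the crawlers behind ChatGPT, Claude, and Gemini read this as permission to crawl the entire site. Note that Google-Extended is a control token rather than a separate crawler: Google's regular crawlers do the fetching, and the token governs whether the content may be used for Gemini.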

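Conversely, writing Disallow: / under those User-agent lines tells these AI crawlers to stay out of the entire site. robots.txt is advisory, so this only works for crawlers that choose to honor it.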
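To see how a compliant crawler would read the two cases above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name may_fetch and the sample strings are assumptions made for illustration. Real crawlers may apply their own matching rules, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" example from the text above.
ALLOW_ALL = """\
User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# The opposite case: GPTBot is asked to stay out of the whole site.
BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` is permitted to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    print(may_fetch(ALLOW_ALL, "GPTBot", page))
    print(may_fetch(BLOCK_GPTBOT, "GPTBot", page))
    # An agent with no matching group and no "*" group is allowed by default.
    print(may_fetch(BLOCK_GPTBOT, "PerplexityBot", page))
```

Parsing from a string keeps the check offline. To test a live site, the same parser can load the file with set_url() and read().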

GEO best practices

| Bot | Treatment in GEO measures |
|---|---|
| GPTBot (OpenAI / ChatGPT) | Allow (required to be cited) |
| Claude-Web (Anthropic / Claude) | Allow |
| Google-Extended (Google / Gemini) | Allow |
| PerplexityBot (Perplexity) | Allow |
| CCBot (Common Crawl) | Optional (used for training data) |

Allowing everything by default is the basic rule of GEO. A Disallow rule keeps compliant AI crawlers from fetching your pages, which can leave you invisible in AI search.
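The policy in the table can be kept as data and rendered into a robots.txt file. The sketch below is one hypothetical way to do that. The names AI_BOT_POLICY and render_robots_txt are illustrative, and CCBot is left out because the table marks it as optional.

```python
# Hypothetical policy mirroring the table above; CCBot is optional and omitted.
AI_BOT_POLICY = {
    "GPTBot": "Allow",
    "Claude-Web": "Allow",
    "Google-Extended": "Allow",
    "PerplexityBot": "Allow",
}


def render_robots_txt(policy, sitemap_url=None):
    """Render one User-agent group per bot, each applying its rule to '/'."""
    blocks = []
    for agent, rule in policy.items():
        if rule not in ("Allow", "Disallow"):
            raise ValueError(f"unsupported rule for {agent}: {rule!r}")
        blocks.append(f"User-agent: {agent}\n{rule}: /")
    text = "\n\n".join(blocks)
    if sitemap_url:
        # The Sitemap line is independent of any User-agent group.
        text += f"\n\nSitemap: {sitemap_url}"
    return text + "\n"


if __name__ == "__main__":
    print(render_robots_txt(AI_BOT_POLICY, "https://example.com/sitemap.xml"))
```

Generating the file from a single dictionary makes a stray Disallow easier to spot in review than it would be in a hand-edited template.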

Common mistakes

  • Unintentional Disallow: many companies still block GPTBot through old robots.txt templates (review any template written before 2024)
  • No configuration at all: when there is no robots.txt, or no rule matches a crawler, everything is treated as allowed, so this is at least tolerable
  • Disallowing AI bots only: wanting to stay out of training data is understandable, but blocking these bots also costs you citation opportunities
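The mistakes listed above can be checked automatically. The following sketch, built on the standard urllib.robotparser module, reports which AI bots are blocked from the site root. The bot list, the OLD_TEMPLATE sample, and the function name blocked_ai_bots are assumptions for illustration, not an official tool.

```python
from urllib.robotparser import RobotFileParser

# Bots from the best-practices table; extend the list as needed.
AI_BOTS = ["GPTBot", "Claude-Web", "Google-Extended", "PerplexityBot"]

# Hypothetical old template that allows everyone except GPTBot.
OLD_TEMPLATE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""


def blocked_ai_bots(robots_txt, bots=AI_BOTS):
    """Return the bots that may not fetch '/' under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


if __name__ == "__main__":
    print(blocked_ai_bots(OLD_TEMPLATE))  # the old template blocks GPTBot only
    print(blocked_ai_bots(""))            # an empty file blocks nobody
```

This only checks the root path. A stricter audit would also test a few representative content URLs, since a Disallow can target a subdirectory.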

Related files

For GEO, robots.txt is typically maintained alongside:

  • sitemap.xml: explicitly tells crawlers which URLs to visit
  • llms.txt: conveys the main content to AI in Markdown
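As a rough illustration of how robots.txt and sitemap.xml fit together, the sketch below reads the Sitemap lines out of a robots.txt string and lists the URLs in a sitemap document. The sample strings and function names are hypothetical. RobotFileParser.site_maps() requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_ROBOTS = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/glossary/robots-txt</loc></url>
</urlset>
"""


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


def sitemap_urls(sitemap_xml):
    """Return every <loc> value listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


if __name__ == "__main__":
    print(declared_sitemaps(SAMPLE_ROBOTS))
    print(sitemap_urls(SAMPLE_SITEMAP))
```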
