In one sentence
sitemap.xml is an XML file that lists the URLs on your site, telling search engines and AI crawlers "these are the pages on our site."
What does this look like in practice?
You create /sitemap.xml and write something like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-24</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-05-20</lastmod>
  </url>
  ...
</urlset>
With this, Google and AI crawlers can see at a glance "this site has a total of N pages, and here is each one's last modification date."
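To make the crawler's view concrete, here is a minimal sketch of reading a sitemap with Python's standard library. The function name read_sitemap and the SAMPLE constant are hypothetical, chosen for illustration; real crawlers are considerably more involved.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace declared on <urlset> in the sample above.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample mirroring the two entries shown earlier.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-24</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-05-20</lastmod>
  </url>
</urlset>"""


def read_sitemap(xml_text: str) -> list[tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


if __name__ == "__main__":
    pages = read_sitemap(SAMPLE)
    print(f"{len(pages)} pages")
    for loc, lastmod in pages:
        print(loc, lastmod)
```

The namespace mapping matters: without it, findall("url") matches nothing, because every element in a sitemap lives in the sitemaps.org namespace.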
Why it matters (validated by GEO Meter data)
sitemap.xml is the most basic and essential measure for AI crawlers to efficiently grasp a site's structure. It is a standard practice deployed by nearly all of the companies we observe.
- Crawlers crawl efficiently: no pages get missed
- Update frequency can be conveyed to AI (changefreq, lastmod): fresh information is cited preferentially
- Dynamic URLs are covered too: deep pages that search engines struggle to find still get exposure
Major fields
| Field | Purpose |
|---|---|
| loc | Page URL (required) |
| lastmod | Last modification date (recommended, freshness signal) |
| changefreq | Expected update frequency (always / hourly / daily / weekly / monthly / yearly / never) |
| priority | Relative priority within the site (0.0-1.0, used as reference only) |
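The field rules can be checked before publishing. The following is a rough sketch, not a full validator: check_entry is a hypothetical helper, the changefreq value set is taken from the sitemaps.org protocol, and as a simplification lastmod is only accepted when it begins with a full YYYY-MM-DD date (the protocol also permits shorter forms such as YYYY-MM).

```python
from datetime import date
from typing import Optional

# Allowed <changefreq> values per the sitemaps.org protocol.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_entry(
    loc: str,
    lastmod: Optional[str] = None,
    changefreq: Optional[str] = None,
    priority: Optional[float] = None,
) -> list[str]:
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must be an absolute URL")
    if lastmod is not None:
        try:
            # Simplification: only the leading YYYY-MM-DD part is checked.
            date.fromisoformat(lastmod[:10])
        except ValueError:
            problems.append("lastmod must start with YYYY-MM-DD")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not a recognized value")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


if __name__ == "__main__":
    print(check_entry("https://example.com/", "2026-05-24", "weekly", 0.8))
    print(check_entry("/pricing", "24/05/2026", "sometimes", 1.5))
```

Only loc is mandatory here, matching the table: the other three arguments default to None and are skipped when absent.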
How to generate one
- Next.js, WordPress, and similar platforms: use the built-in feature or an auto-generation plugin
- Static sites: run a generation script at build time
- Hand-writing is not recommended (risk of missed updates)
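For the static-site case, a build-time generation script might look roughly like the sketch below. It assumes the build already knows each page's absolute URL and last-modified date. The build_sitemap function and the page list are hypothetical, and a real build step would write the result to the site's output directory rather than print it.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> bytes:
    """Serialize (absolute URL, last-modified date) pairs as sitemap XML."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical page list; a real script would collect this from the build.
    pages = [
        ("https://example.com/", date(2026, 5, 24)),
        ("https://example.com/pricing", date(2026, 5, 20)),
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

Building the tree with an XML library instead of string concatenation is a deliberate choice: URLs with query strings contain characters that must be escaped, and the library handles that automatically.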
Relationship with robots.txt
If you add the single line Sitemap: https://example.com/sitemap.xml to robots.txt (the URL must be absolute), crawlers will discover the sitemap automatically.
For details, see robots.txt as well.
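As a quick check that the Sitemap line is readable, Python's standard urllib.robotparser can extract it. This is a minimal sketch: the ROBOTS_TXT content other than the Sitemap line is an assumed example, and sitemaps_in_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file; only the Sitemap line comes from the text above.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_in_robots(robots_text: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line exists.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_in_robots(ROBOTS_TXT))
```

Running the same function against a robots.txt body with no Sitemap line returns an empty list, which makes it usable as a simple deployment check.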