XML Sitemap Guide: Structure, Tags, Limits, and Validation
A complete guide to XML sitemaps covering structure, required tags, sitemap index files, size limits, dynamic generation, and validation.
Last updated: 2026-02-17
What Is an XML Sitemap?
An XML sitemap is a structured file that lists the URLs on your website in a format search engines can parse directly. It follows the Sitemap Protocol, an open standard supported by Google, Bing, and other major search engines.
The file is typically served at /sitemap.xml and acts as a roadmap for crawlers. Instead of relying solely on link discovery, search engines can read your sitemap to find every page you want indexed.
This guide covers everything you need to know about XML sitemaps -- from the tag structure to size limits, dynamic generation, and validation.
XML Sitemap Structure
Every XML sitemap follows the same basic structure. The root element is <urlset>, which contains one or more <url> entries.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Key structural rules:
- The file must be UTF-8 encoded.
- The <urlset> element must include the Sitemap Protocol namespace (http://www.sitemaps.org/schemas/sitemap/0.9).
- All URLs must be absolute (full path including protocol and domain).
- Special characters in URLs must be entity-escaped (e.g., & becomes &amp;).
Required and Optional Tags
The Sitemap Protocol defines four child tags for each <url> entry. Only one is required.
loc (Required)
The <loc> tag contains the full URL of the page. This is the only mandatory tag.
<loc>https://example.com/pricing</loc>
Rules for <loc>:
- Must be an absolute URL including the protocol (https://).
- Must match the host of the sitemap file itself (a sitemap at example.com cannot list URLs from other.com).
- Must be URL-encoded and then XML-escaped: spaces become %20, and ampersands become &amp;.
- Should point to the canonical version of the URL.
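The rules for <loc> can be checked mechanically before a URL is written to the sitemap. The following is a minimal sketch, not a complete validator: checkLoc and parseAbsolute are hypothetical helper names, the input is assumed to be the raw URL before XML escaping, and sitemapUrl is assumed to be a valid absolute URL. It does not check canonical status, which requires fetching the page.

```typescript
// Hypothetical helper: returns a parsed URL, or null for relative or malformed input.
function parseAbsolute(value: string): URL | null {
  try {
    // new URL() without a base throws on relative paths such as "/pricing".
    return new URL(value);
  } catch {
    return null;
  }
}

// Hypothetical helper: lists the rule violations for one raw <loc> value.
// `sitemapUrl` is the address the sitemap itself is served from (assumed valid).
function checkLoc(loc: string, sitemapUrl: string): string[] {
  const parsed = parseAbsolute(loc);
  if (parsed === null) {
    return ["not an absolute URL"];
  }
  const problems: string[] = [];
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    problems.push("protocol must be http or https");
  }
  if (parsed.host !== new URL(sitemapUrl).host) {
    problems.push("host differs from the sitemap host");
  }
  // The URL parser silently encodes spaces, so test the raw string instead.
  if (/\s/.test(loc)) {
    problems.push("contains unencoded whitespace");
  }
  return problems;
}
```

An empty array means the value passed these checks. Any other result names the rules it broke, which makes the function easy to wire into a build step or a test.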
lastmod (Optional but Recommended)
The <lastmod> tag indicates when the page was last meaningfully modified.
<lastmod>2026-02-15</lastmod>
The date must follow W3C Datetime format. Acceptable formats:
- YYYY-MM-DD (e.g., 2026-02-15)
- YYYY-MM-DDThh:mm:ss+00:00 (full datetime with timezone)
Only update lastmod when the page content actually changes. Setting it to the current date on every build defeats the purpose and teaches search engines to ignore it. Google has explicitly warned against this practice.
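One way to keep lastmod honest is to store a content hash next to each date and only move the date when the hash changes. The sketch below assumes the hash is computed elsewhere (for example, from the rendered page body). PageRecord, toW3CDate, and nextRecord are hypothetical names for illustration, not part of any framework.

```typescript
// Hypothetical record kept per page between builds.
interface PageRecord {
  contentHash: string; // hash of the page content, computed elsewhere
  lastmod: string; // W3C date, e.g. "2026-02-15"
}

// Formats a Date as YYYY-MM-DD in UTC.
function toW3CDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Returns the record to store for this build.
// Unchanged content keeps its previous lastmod; new or changed content gets `now`.
function nextRecord(
  previous: PageRecord | undefined,
  contentHash: string,
  now: Date
): PageRecord {
  if (previous !== undefined && previous.contentHash === contentHash) {
    return previous;
  }
  return { contentHash, lastmod: toW3CDate(now) };
}
```

With this approach, a rebuild that changes nothing leaves every lastmod value untouched, which is the behavior search engines expect.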
changefreq (Optional -- Largely Ignored)
The <changefreq> tag hints at how often the page is likely to change.
<changefreq>weekly</changefreq>
Valid values: always, hourly, daily, weekly, monthly, yearly, never.
Google has confirmed they do not use changefreq for crawling decisions. Bing similarly gives it minimal weight. You can include it for completeness, but do not expect it to influence crawl behavior.
priority (Optional -- Largely Ignored)
The <priority> tag assigns a relative importance to each URL, from 0.0 (least important) to 1.0 (most important).
<priority>0.8</priority>
Like changefreq, Google largely ignores this tag. The priority is relative to your own site only -- it does not affect how your pages rank against other websites.
Focus your effort on loc and lastmod. These are the two tags search engines actually use. Skip changefreq and priority unless your tooling adds them automatically.
Sitemap Index Files
When your site has more than 50,000 URLs or your sitemap file would exceed 50 MB, you need a sitemap index file. This is a master file that references multiple individual sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2026-02-10</lastmod>
  </sitemap>
</sitemapindex>
The structure uses <sitemapindex> as the root element, with <sitemap> entries containing <loc> and optional <lastmod> tags.
No nesting
A sitemap index can only reference sitemaps, not other sitemap index files. Nesting is not allowed.
Same host required
All referenced sitemaps must be on the same host as the sitemap index file.
Up to 50,000 sitemaps
A sitemap index can reference up to 50,000 sitemaps and, like a regular sitemap, must not exceed 50 MB uncompressed. A site with millions of URLs might have hundreds of individual sitemaps, which is well within that limit.
Organizational benefit
Even below the 50K limit, splitting sitemaps by content type (products, blog, pages) makes monitoring easier.
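Splitting a large URL list and producing the matching index file can be sketched in a few lines. This is an illustration under stated assumptions: chunk and buildSitemapIndex are hypothetical helpers, the sitemap URLs passed in are assumed to be absolute and already XML-escaped, and the optional <lastmod> tag is left out for brevity.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// Splits a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Builds sitemap index XML from absolute, already-escaped sitemap URLs.
function buildSitemapIndex(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((url) => `  <sitemap><loc>${url}</loc></sitemap>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}
```

A typical flow would call chunk(allUrls, MAX_URLS_PER_SITEMAP), write one sitemap file per chunk, and pass the resulting file URLs to buildSitemapIndex. Chunking by count alone does not guarantee the 50 MB limit, so very long URLs may still call for smaller chunks.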
Size Limits
The Sitemap Protocol imposes two hard limits per sitemap file:
| Limit | Value | What Happens If Exceeded |
|---|---|---|
| Maximum URLs | 50,000 per file | Search engines will reject the sitemap or only process partial content |
| Maximum file size | 50 MB (uncompressed) | Same -- rejected or partially processed |
| Gzip compression | Allowed | Reduces transfer size but uncompressed size still counts against the 50 MB limit |
For most sites, these limits are never a concern. A typical sitemap with 1,000 URLs is well under 1 MB. But e-commerce sites with hundreds of thousands of product pages will need multiple sitemaps managed through a sitemap index.
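Both limits are easy to check against the generated XML before it is published. The sketch below is a rough check rather than a parser: checkSitemapLimits is a hypothetical helper, it counts opening <url> tags with a regular expression (so it assumes no such text appears inside comments or CDATA), and it measures the uncompressed UTF-8 size, which is the figure that counts even when the file is served gzipped.

```typescript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 52,428,800 bytes, uncompressed

interface LimitReport {
  urlCount: number;
  byteSize: number;
  withinLimits: boolean;
}

// Hypothetical helper: reports URL count and uncompressed size for one sitemap.
function checkSitemapLimits(xml: string): LimitReport {
  // Matches "<url>" and "<url " but not "<urlset".
  const urlCount = (xml.match(/<url[\s>]/g) ?? []).length;
  // Size in bytes once encoded as UTF-8 (not the JavaScript string length).
  const byteSize = new TextEncoder().encode(xml).length;
  return {
    urlCount,
    byteSize,
    withinLimits: urlCount <= MAX_URLS && byteSize <= MAX_BYTES,
  };
}
```

Running this in the same step that generates the sitemap lets a build fail early, or switch to multiple files, instead of shipping an oversized sitemap.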
Dynamic Sitemap Generation
Manually maintaining an XML sitemap is impractical for most sites. Dynamic generation is the standard approach.
How Dynamic Generation Works
Instead of a static XML file, your server generates the sitemap on demand by querying your database or content system for current URLs. The response is served with Content-Type: application/xml.
Framework-Specific Approaches
Next.js (App Router)
Next.js supports sitemaps natively through a sitemap.ts file in the app directory:
import type { MetadataRoute } from 'next'
export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: 'https://example.com',
      // Use each page's real modification date; new Date() would change on every build
      lastModified: new Date('2026-02-10'),
    },
    {
      url: 'https://example.com/blog',
      lastModified: new Date('2026-02-15'),
    },
  ]
}
For large sites, use generateSitemaps() to create multiple sitemaps automatically.
WordPress
WordPress has included automatic XML sitemap generation since version 5.5. The default sitemap is at /wp-sitemap.xml. Plugins like Yoast SEO or Rank Math provide more granular control.
Server-Side (Node.js/Express)
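const express = require('express');
const app = express();
// Escape the five XML special characters so URLs containing "&" stay well-formed
const escapeXml = (s) => s.replace(/[&<>"']/g, (c) =>
  ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' }[c]));
app.get('/sitemap.xml', async (req, res) => {
  // db.getPublishedPages() stands in for your application's own data-access call
  const pages = await db.getPublishedPages();
  let xml = '<?xml version="1.0" encoding="UTF-8"?>';
  xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
  for (const page of pages) {
    xml += `<url><loc>${escapeXml(page.url)}</loc>`;
    // toISOString() yields a W3C Datetime value; assumes updatedAt is a Date or a parseable date string
    xml += `<lastmod>${new Date(page.updatedAt).toISOString()}</lastmod></url>`;
  }
  xml += '</urlset>';
  res.header('Content-Type', 'application/xml');
  res.send(xml);
});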
Static Site Generators
Static site generators (Gatsby, Hugo, Astro, 11ty) typically generate sitemaps at build time. The sitemap is created as a static XML file during the build process and deployed with the rest of the site.
This approach means the sitemap is always in sync with the deployed content, but it will not reflect content changes until the next build.
Entity Escaping
URLs in XML sitemaps must follow XML entity escaping rules. The five characters that require escaping:
| Character | Escape Sequence | Example |
|---|---|---|
| Ampersand (&) | &amp; | page?a=1&amp;b=2 |
| Single quote (') | &apos; | O&apos;Brien |
| Double quote (") | &quot; | title=&quot;test&quot; |
| Less than (<) | &lt; | Rarely needed in URLs |
| Greater than (>) | &gt; | Rarely needed in URLs |
In practice, the ampersand is the only one you will encounter regularly in URLs. Query string parameters separated by & must always be escaped to &amp; in XML.
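The five replacements in the table fit in one small function. The following is a minimal sketch: escapeXml and locElement are hypothetical helper names, and the input is assumed to be a raw, not-yet-escaped URL. Running it on text that is already escaped would escape it twice, turning &amp; into &amp;amp;.

```typescript
// The five characters XML requires escaping, mapped to their entities.
const XML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "'": "&apos;",
  '"': "&quot;",
  "<": "&lt;",
  ">": "&gt;",
};

// Replaces every special character in a raw string with its XML entity.
function escapeXml(value: string): string {
  return value.replace(/[&'"<>]/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

// Wraps a raw URL in a <loc> element, escaping it exactly once.
function locElement(rawUrl: string): string {
  return `<loc>${escapeXml(rawUrl)}</loc>`;
}
```

For example, locElement("https://example.com/page?a=1&b=2") produces a <loc> element whose query string separator is written as &amp;.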
Validating Your XML Sitemap
An invalid sitemap can be silently ignored by search engines. Validation catches problems before they affect your indexing.
Check XML syntax
Use an XML validator to confirm your sitemap is well-formed. Missing closing tags, improper nesting, or encoding issues will cause parse failures.
Validate against the Sitemap Protocol schema
The official XSD schema is at sitemaps.org/schemas/sitemap/0.9/sitemap.xsd. Online validators can check your file against this schema.
Verify URLs return 200 status codes
Every URL in your sitemap should return a 200 OK response. Remove URLs that return 404 or 5xx errors, and replace redirecting URLs (301) with their final destination.
Check in Google Search Console
After submitting your sitemap, Google Search Console shows parsing errors, URL counts, and indexing status. This is the definitive check.
Confirm file size and URL count
Verify you are within the 50,000 URL and 50 MB limits. For sitemap index files, check each referenced sitemap individually.
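The status-code step is the easiest of these checks to automate. The sketch below keeps the network call out of the core logic: findNon200 and StatusLookup are hypothetical names, and the lookup function is injected so the same code can run against a real HTTP client or a stub in tests. URLs are checked one at a time, which is slow for large sitemaps but avoids flooding your own server.

```typescript
// Resolves a URL to its HTTP status code. Injected so callers choose the HTTP client.
type StatusLookup = (url: string) => Promise<number>;

interface StatusFailure {
  url: string;
  status: number;
}

// Hypothetical helper: returns every URL whose status is not exactly 200.
async function findNon200(
  urls: string[],
  lookup: StatusLookup
): Promise<StatusFailure[]> {
  const failures: StatusFailure[] = [];
  for (const url of urls) {
    // Sequential on purpose: one request at a time.
    const status = await lookup(url);
    if (status !== 200) {
      failures.push({ url, status });
    }
  }
  return failures;
}
```

A real lookup would need to issue requests without following redirects, so that a 301 is reported as 301 rather than as the status of its destination. How to do that depends on the HTTP client and runtime, so it is left out here.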
Common XML Sitemap Mistakes
Including non-canonical URLs. If a page has a canonical tag pointing elsewhere, the non-canonical version should not be in your sitemap. This sends conflicting signals to search engines.
Listing URLs blocked by robots.txt. If robots.txt disallows a URL, including it in your sitemap creates a contradiction. Crawlers cannot access the page, but your sitemap says it is important.
Stale lastmod dates. Setting lastmod to the current date on every build, or never updating it at all, makes the tag useless. It should reflect the actual last content modification.
Missing protocol in URLs. Every <loc> value must include https:// (or http://). Relative paths are not valid.
Wrong content type. The sitemap must be served with Content-Type: application/xml or text/xml. Serving it as text/html can cause parsing failures.
Mixing HTTP and HTTPS. If your site uses HTTPS, all URLs in the sitemap must use HTTPS. Do not list HTTP versions.
A well-structured XML sitemap is one of the lowest-effort, highest-impact technical SEO tasks you can do -- but only if the URLs are accurate and the format is valid.