XML Sitemap Guide: Structure, Tags, Limits, and Validation

A complete guide to XML sitemaps covering structure, required tags, sitemap index files, size limits, dynamic generation, and validation.

Last updated: 2026-02-17

What Is an XML Sitemap?

An XML sitemap is a structured file that lists the URLs on your website in a format search engines can parse directly. It follows the Sitemap Protocol, an open standard supported by Google, Bing, and other major search engines.

The file is typically served at /sitemap.xml and acts as a roadmap for crawlers. Instead of relying solely on link discovery, search engines can read your sitemap to find every page you want indexed.

This guide covers everything you need to know about XML sitemaps -- from the tag structure to size limits, dynamic generation, and validation.

XML Sitemap Structure

Every XML sitemap follows the same basic structure. The root element is <urlset>, which contains one or more <url> entries.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Key structural rules:

  • The file must be UTF-8 encoded.
  • The <urlset> element must include the Sitemap Protocol namespace (http://www.sitemaps.org/schemas/sitemap/0.9).
  • All URLs must be absolute (full path including protocol and domain).
  • Special characters in URLs must be entity-escaped (e.g., & becomes &amp;).

Required and Optional Tags

The Sitemap Protocol defines four tags that can appear inside each <url> entry. Only one of them, <loc>, is required.

loc (Required)

The <loc> tag contains the full URL of the page. This is the only mandatory tag.

<loc>https://example.com/pricing</loc>

Rules for <loc>:

  • Must be an absolute URL including the protocol (https://).
  • Must match the host of the sitemap file itself (a sitemap at example.com cannot list URLs from other.com).
  • Must be URL-encoded and then XML-escaped. Spaces become %20 (URL encoding), and ampersands become &amp; (XML entity escaping).
  • Should point to the canonical version of the URL.
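The first two rules are easy to check mechanically. The following is a minimal sketch, not a full validator: checkLoc and parseAbsolute are hypothetical helper names, and the sitemap address passed in is assumed to be the URL the sitemap is actually served from.

```typescript
// Returns a parsed URL, or null when the value is not an absolute URL.
function parseAbsolute(value: string): URL | null {
  try {
    // The URL constructor throws on relative paths such as "/pricing".
    return new URL(value);
  } catch {
    return null;
  }
}

// Hypothetical helper: lists the problems found in one <loc> value.
// Throws if sitemapUrl itself is not a valid absolute URL.
function checkLoc(loc: string, sitemapUrl: string): string[] {
  const parsed = parseAbsolute(loc);
  if (parsed === null) {
    return ["not an absolute URL"];
  }
  const problems: string[] = [];
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    problems.push("protocol must be http or https");
  }
  if (parsed.host !== new URL(sitemapUrl).host) {
    problems.push("host differs from the sitemap host");
  }
  if (/\s/.test(loc)) {
    problems.push("contains unencoded whitespace");
  }
  return problems;
}

const sitemapAddress = "https://example.com/sitemap.xml";
console.log(checkLoc("https://example.com/pricing", sitemapAddress));
console.log(checkLoc("/pricing", sitemapAddress));
console.log(checkLoc("https://other.com/page", sitemapAddress));
```

The canonical rule is not covered here, because checking it requires fetching each page and reading its canonical tag.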

lastmod (Optional but Recommended)

The <lastmod> tag indicates when the page was last meaningfully modified.

<lastmod>2026-02-15</lastmod>

The date must follow W3C Datetime format. Acceptable formats:

  • YYYY-MM-DD (e.g., 2026-02-15)
  • YYYY-MM-DDThh:mm:ss+00:00 (full datetime with timezone)

Only update lastmod when the page content actually changes. Setting it to the current date on every build defeats the purpose and teaches search engines to ignore it. Google has explicitly warned against this practice.
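As a rough illustration of the two formats listed above, the sketch below formats a modification date and checks an existing value. The names toLastmod and isValidLastmod are hypothetical, and the pattern deliberately covers only the two shapes shown in this guide, not every variant the W3C Datetime note allows.

```typescript
// Matches YYYY-MM-DD, optionally followed by Thh:mm:ss and either Z or a +hh:mm / -hh:mm offset.
const W3C_DATETIME =
  /^\d{4}-\d{2}-\d{2}(T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(Z|[+-]\d{2}:\d{2}))?$/;

// Formats the page's real modification time as a date-only lastmod value.
function toLastmod(modifiedAt: Date): string {
  // toISOString() always reports UTC, so the first 10 characters are the UTC calendar date.
  return modifiedAt.toISOString().slice(0, 10);
}

function isValidLastmod(value: string): boolean {
  if (!W3C_DATETIME.test(value)) {
    return false;
  }
  // Reject impossible calendar dates such as 2026-02-30 that still match the pattern.
  const year = Number(value.slice(0, 4));
  const month = Number(value.slice(5, 7));
  const day = Number(value.slice(8, 10));
  const probe = new Date(Date.UTC(year, month - 1, day));
  return probe.getUTCMonth() === month - 1 && probe.getUTCDate() === day;
}

console.log(toLastmod(new Date(Date.UTC(2026, 1, 15, 9, 30)))); // 2026-02-15
console.log(isValidLastmod("2026-02-15T09:30:00+00:00")); // true
console.log(isValidLastmod("02/15/2026")); // false
```

Pass toLastmod the timestamp stored with the content, not the time of the build, so the value only changes when the page does.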

changefreq (Optional -- Largely Ignored)

The <changefreq> tag hints at how often the page is likely to change.

<changefreq>weekly</changefreq>

Valid values: always, hourly, daily, weekly, monthly, yearly, never.

Google has confirmed they do not use changefreq for crawling decisions. Bing similarly gives it minimal weight. You can include it for completeness, but do not expect it to influence crawl behavior.

priority (Optional -- Largely Ignored)

The <priority> tag assigns a relative importance to each URL, from 0.0 (least important) to 1.0 (most important).

<priority>0.8</priority>

As with changefreq, Google largely ignores this tag. The priority is relative to your own site only -- it does not affect how your pages rank against other websites.

Focus your effort on loc and lastmod. These are the two tags search engines actually use. Skip changefreq and priority unless your tooling adds them automatically.

Sitemap Index Files

When your site has more than 50,000 URLs, or a single sitemap file would exceed 50 MB uncompressed, you need to split the URLs across several sitemaps and list them in a sitemap index file. This is a master file that references multiple individual sitemaps.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2026-02-10</lastmod>
  </sitemap>
</sitemapindex>

The structure uses <sitemapindex> as the root element, with <sitemap> entries containing <loc> and optional <lastmod> tags.

No nesting

A sitemap index can only reference sitemaps, not other sitemap index files. Nesting is not allowed.

Same host required

All referenced sitemaps must be on the same host as the sitemap index file.

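Up to 50,000 sitemaps

A sitemap index can reference up to 50,000 sitemaps and, like a regular sitemap, must not exceed 50 MB uncompressed. At 50,000 URLs per sitemap, a single index can cover up to 2.5 billion URLs, so a site with millions of URLs might have hundreds of individual sitemaps.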

Organizational benefit

Even below the 50K limit, splitting sitemaps by content type (products, blog, pages) makes monitoring easier.
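To make the index structure concrete, here is a minimal sketch that builds a sitemap index from a list of child sitemaps and enforces the same-host rule. SitemapRef and buildSitemapIndex are hypothetical names, and the index is assumed to be served from the URL passed as the first argument.

```typescript
// Hypothetical shape for one <sitemap> entry; the field names mirror the tags.
interface SitemapRef {
  loc: string; // absolute URL of a child sitemap
  lastmod?: string; // optional W3C Datetime string, e.g. "2026-02-15"
}

// Escapes the five XML special characters; & must be replaced first.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapIndex(indexUrl: string, refs: SitemapRef[]): string {
  const indexHost = new URL(indexUrl).host;
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const ref of refs) {
    // Referenced sitemaps must live on the same host as the index file.
    if (new URL(ref.loc).host !== indexHost) {
      throw new Error(`Sitemap on a different host: ${ref.loc}`);
    }
    lines.push("  <sitemap>");
    lines.push(`    <loc>${escapeXml(ref.loc)}</loc>`);
    if (ref.lastmod) {
      lines.push(`    <lastmod>${ref.lastmod}</lastmod>`);
    }
    lines.push("  </sitemap>");
  }
  lines.push("</sitemapindex>");
  return lines.join("\n");
}

const indexXml = buildSitemapIndex("https://example.com/sitemap.xml", [
  { loc: "https://example.com/sitemap-products.xml", lastmod: "2026-02-15" },
  { loc: "https://example.com/sitemap-blog.xml" },
]);
console.log(indexXml);
```

Failing loudly on a cross-host entry is a design choice for this sketch; you may prefer to log and skip the entry instead.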

Size Limits

The Sitemap Protocol imposes two hard limits per sitemap file:

| Limit | Value | What Happens If Exceeded |
|---|---|---|
| Maximum URLs | 50,000 per file | Search engines will reject the sitemap or only process partial content |
| Maximum file size | 50 MB (uncompressed) | Same -- rejected or partially processed |
| Gzip compression | Allowed | Reduces transfer size but uncompressed size still counts against the 50 MB limit |

For most sites, these limits are never a concern. A typical sitemap with 1,000 URLs is well under 1 MB. But e-commerce sites with hundreds of thousands of product pages will need multiple sitemaps managed through a sitemap index.
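Splitting a large URL list to respect the 50,000-URL limit can be sketched as follows. chunkUrls is a hypothetical helper and the product URLs are synthetic. The split is by URL count only, so a file with unusually long URLs would still need a separate check against the 50 MB limit.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// Splits a flat URL list into groups that each fit in one sitemap file.
function chunkUrls(
  urls: string[],
  maxPerFile: number = MAX_URLS_PER_SITEMAP
): string[][] {
  if (maxPerFile < 1) {
    throw new Error("maxPerFile must be at least 1");
  }
  const chunks: string[][] = [];
  for (let start = 0; start < urls.length; start += maxPerFile) {
    chunks.push(urls.slice(start, start + maxPerFile));
  }
  return chunks;
}

// Synthetic example: 120,000 product URLs need three sitemap files.
const productUrls: string[] = [];
for (let i = 0; i < 120000; i++) {
  productUrls.push(`https://example.com/product/${i}`);
}
const productChunks = chunkUrls(productUrls);
console.log(productChunks.map((chunk) => chunk.length)); // 50000, 50000, 20000
```

Each chunk then becomes one sitemap file, and the files are listed together in the sitemap index.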

Dynamic Sitemap Generation

Manually maintaining an XML sitemap is impractical for most sites. Dynamic generation is the standard approach.

How Dynamic Generation Works

Instead of a static XML file, your server generates the sitemap on demand by querying your database or content system for current URLs. The response is served with Content-Type: application/xml.

Framework-Specific Approaches

Next.js (App Router)

Next.js supports sitemaps natively through a sitemap.ts file in the app directory:

import type { MetadataRoute } from 'next'

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: 'https://example.com',
      // Use each page's real modification date, not the build time.
      lastModified: new Date('2026-02-10'),
    },
    {
      url: 'https://example.com/blog',
      lastModified: new Date('2026-01-15'),
    },
  ]
}

For large sites, use generateSitemaps() to create multiple sitemaps automatically.

WordPress

WordPress has included automatic XML sitemap generation since version 5.5. The default sitemap is at /wp-sitemap.xml. Plugins like Yoast SEO or Rank Math provide more granular control.

Server-Side (Node.js/Express)

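const express = require('express');
const app = express();

// Map each XML special character to its entity so URLs containing & stay well-formed.
const XML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', "'": '&apos;', '"': '&quot;' };
const escapeXml = (value) => value.replace(/[&<>'"]/g, (ch) => XML_ENTITIES[ch]);

app.get('/sitemap.xml', async (req, res) => {
  // db.getPublishedPages() is a placeholder for your own data layer.
  const pages = await db.getPublishedPages();

  let xml = '<?xml version="1.0" encoding="UTF-8"?>';
  xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

  for (const page of pages) {
    xml += `<url><loc>${escapeXml(page.url)}</loc>`;
    // Assumes updatedAt is a Date or ISO date string; toISOString() emits W3C Datetime.
    xml += `<lastmod>${new Date(page.updatedAt).toISOString()}</lastmod></url>`;
  }

  xml += '</urlset>';
  res.header('Content-Type', 'application/xml');
  res.send(xml);
});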

Static Site Generators

Static site generators (Gatsby, Hugo, Astro, 11ty) typically generate sitemaps at build time. The sitemap is created as a static XML file during the build process and deployed with the rest of the site.

This approach means the sitemap is always in sync with the deployed content, but it will not reflect content changes until the next build.

Entity Escaping

URLs in XML sitemaps must follow XML entity escaping rules. The five characters that require escaping:

| Character | Escape Sequence | Example |
|---|---|---|
| Ampersand (&) | &amp; | page?a=1&amp;b=2 |
| Single quote (') | &apos; | O&apos;Brien |
| Double quote (") | &quot; | title=&quot;test&quot; |
| Less than (<) | &lt; | Rarely needed in URLs |
| Greater than (>) | &gt; | Rarely needed in URLs |

In practice, the ampersand is the only one you will encounter regularly in URLs. Query string parameters separated by & must always be escaped to &amp; in XML.
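The five replacements can be applied in a single pass. This is a minimal sketch with a hypothetical escapeXml helper. It assumes the input is a raw, unescaped URL: running it on a value that already contains &amp; would escape the ampersand a second time.

```typescript
// The five characters that need escaping, mapped to their entities.
const XML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "'": "&apos;",
  '"': "&quot;",
  "<": "&lt;",
  ">": "&gt;",
};

// Replaces every special character in one pass, so an inserted "&amp;" is never re-escaped.
function escapeXml(value: string): string {
  return value.replace(/[&'"<>]/g, (ch) => XML_ENTITIES[ch]);
}

console.log(escapeXml("https://example.com/page?a=1&b=2"));
// https://example.com/page?a=1&amp;b=2
```

Apply the escaping once, at the point where the URL is written into the <loc> element, and keep the stored URL unescaped everywhere else.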

Validating Your XML Sitemap

An invalid sitemap can be silently ignored by search engines. Validation catches problems before they affect your indexing.

1. Check XML syntax

Use an XML validator to confirm your sitemap is well-formed. Missing closing tags, improper nesting, or encoding issues will cause parse failures.

2. Validate against the Sitemap Protocol schema

The official XSD schema is at sitemaps.org/schemas/sitemap/0.9/sitemap.xsd. Online validators can check your file against this schema.

3. Verify URLs return 200 status codes

Every URL in your sitemap should return a 200 OK response. Replace URLs that return a 301 with their final destination, and fix or remove URLs that return 404 or 5xx errors.

4. Check in Google Search Console

After submitting your sitemap, Google Search Console shows parsing errors, URL counts, and indexing status. This is the definitive check.

5. Confirm file size and URL count

Verify you are within the 50,000 URL and 50 MB limits. For sitemap index files, check each referenced sitemap individually.
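Some of these checks can be automated before you submit anything. The sketch below covers the URL count, the uncompressed size, and two URL-level problems. auditSitemap is a hypothetical helper. It pulls <loc> values out with a regular expression, so it assumes a well-formed file and is not a substitute for a real XML parser or schema validation.

```typescript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 50 MB, measured on the uncompressed file

// Returns a list of human-readable issues; an empty list means no issue was detected.
function auditSitemap(xml: string): string[] {
  const issues: string[] = [];

  // Collect every <loc> value, trimming surrounding whitespace.
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]*?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1]);
  }

  if (locs.length === 0) {
    issues.push("no <loc> entries found");
  }
  if (locs.length > MAX_URLS) {
    issues.push(`${locs.length} URLs exceeds the ${MAX_URLS} limit`);
  }

  // Count UTF-8 bytes, not characters, because the limit applies to file size.
  const bytes = new TextEncoder().encode(xml).length;
  if (bytes > MAX_BYTES) {
    issues.push(`${bytes} bytes exceeds the ${MAX_BYTES} byte limit`);
  }

  let httpCount = 0;
  let httpsCount = 0;
  for (const loc of locs) {
    if (loc.indexOf("https://") === 0) {
      httpsCount++;
    } else if (loc.indexOf("http://") === 0) {
      httpCount++;
    } else {
      issues.push(`not absolute: ${loc}`);
    }
  }
  if (httpCount > 0 && httpsCount > 0) {
    issues.push("mixes http and https URLs");
  }
  return issues;
}

console.log(
  auditSitemap("<urlset><url><loc>/about</loc></url></urlset>")
); // reports "not absolute: /about"
```

Status-code checks are left out because they need network requests. Run them as a separate step against the list of <loc> values.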

Common XML Sitemap Mistakes

Including non-canonical URLs. If a page has a canonical tag pointing elsewhere, the non-canonical version should not be in your sitemap. This sends conflicting signals to search engines.

Listing URLs blocked by robots.txt. If robots.txt disallows a URL, including it in your sitemap creates a contradiction. Crawlers cannot access the page, but your sitemap says it is important.

Stale lastmod dates. Setting lastmod to the current date on every build, or never updating it at all, makes the tag useless. It should reflect the actual last content modification.

Missing protocol in URLs. Every <loc> value must include https:// (or http://). Relative paths are not valid.

Wrong content type. The sitemap must be served with Content-Type: application/xml or text/xml. Serving it as text/html can cause parsing failures.

Mixing HTTP and HTTPS. If your site uses HTTPS, all URLs in the sitemap must use HTTPS. Do not list HTTP versions.

A well-structured XML sitemap is one of the lowest-effort, highest-impact technical SEO tasks you can do -- but only if the URLs are accurate and the format is valid.
