Sitemap Best Practices: 12 Rules for SEO-Friendly XML Sitemaps
Follow these 12 sitemap best practices to maximize crawl efficiency, avoid common mistakes, and keep search engines indexing the right pages.
Last updated: 2026-02-17
Why Sitemap Best Practices Matter
A sitemap is only useful if it is accurate, current, and well-structured. A poorly maintained sitemap can actively hurt your SEO by wasting crawl budget on dead pages, sending conflicting signals about canonical URLs, and eroding search engine trust in your site's technical quality.
These 12 best practices cover everything from URL selection to monitoring. Follow them and your sitemap becomes a reliable asset. Ignore them and it becomes noise.
1. Include Only Canonical, Indexable URLs
This is the most important rule. Every URL in your sitemap should be:
- The canonical version of the page (matching the rel="canonical" tag)
- Returning a 200 status code
- Not blocked by robots.txt
- Not tagged with noindex
If a URL fails any of these checks, it does not belong in your sitemap. Including non-canonical or non-indexable URLs sends contradictory signals -- your sitemap says "index this" while another signal says "do not index this."
Google treats conflicting signals as a site quality issue. If your sitemap consistently includes URLs that are noindexed, redirected, or blocked, Google may reduce its trust in your sitemap data over time.
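These four checks are easy to automate once you have fetched each sitemap URL. A minimal sketch in Python -- the function name and the pre-gathered inputs (status code, robots.txt verdict, robots meta content, declared canonical) are illustrative, not a standard API:

```python
def sitemap_eligible(url, status_code, robots_txt_allowed, robots_meta, canonical_url):
    """Return (eligible, reason) for a candidate sitemap URL.

    Inputs are assumed to be gathered beforehand: the HTTP status code,
    whether robots.txt allows the URL, the content of the robots meta tag
    (or X-Robots-Tag header), and the page's declared canonical URL.
    """
    if status_code != 200:
        return False, f"returns {status_code}, not 200"
    if not robots_txt_allowed:
        return False, "blocked by robots.txt"
    if "noindex" in (robots_meta or "").lower():
        return False, "tagged noindex"
    if canonical_url and canonical_url != url:
        return False, f"canonical points to {canonical_url}"
    return True, "ok"
```

Running this over every URL before the sitemap is written guarantees the file only ever contains canonical, indexable, live pages.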
2. Keep lastmod Accurate
The lastmod tag tells search engines when a page was last meaningfully updated. Used correctly, it helps crawlers prioritize re-crawling pages with fresh content. Used incorrectly, it teaches them to ignore the tag entirely.
Do:
- Update lastmod only when the page content actually changes.
- Use W3C Datetime format (YYYY-MM-DD or full ISO 8601).
- Automate lastmod based on your CMS's "last modified" field.
Do not:
- Set lastmod to the current date on every build or deploy.
- Set all pages to the same lastmod value.
- Leave lastmod static forever, even as content changes.
Google engineer John Mueller has stated that inaccurate lastmod values make the tag less useful across your entire sitemap. If Google learns that your lastmod dates are unreliable, it will start ignoring them -- even when they are correct.
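Deriving lastmod from a stored modification timestamp avoids both failure modes. A sketch, assuming your CMS exposes a "last modified" datetime per page:

```python
from datetime import datetime, timezone

def lastmod_value(modified_at: datetime) -> str:
    """Format a content-modification timestamp as full W3C Datetime (UTC)."""
    return modified_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")

def lastmod_date(modified_at: datetime) -> str:
    """The date-only form is also valid when you don't track time of day."""
    return modified_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
```

Because the value comes from the content's own modification record rather than the build time, redeploying the site without content changes leaves every lastmod untouched.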
3. Use a Sitemap Index for Large Sites
If your site has more than a few thousand URLs, organize your sitemaps using a sitemap index file. Even if you are below the 50,000-URL hard limit, splitting sitemaps by content type has real benefits.
Easier monitoring
When sitemaps are split by type (products, blog, categories), you can quickly identify which section has indexing issues.
Faster processing
Smaller sitemaps are fetched and processed faster by search engines. A 500-URL sitemap is processed in seconds.
Targeted updates
When content in one section changes, only that sitemap's lastmod updates. Search engines can skip re-processing unchanged sitemaps.
Clearer analytics
Google Search Console reports sitemap-level data. Separate sitemaps give you separate metrics for each content type.
A common split strategy for an e-commerce site:
/sitemap.xml (sitemap index)
/sitemap-pages.xml (static pages: home, about, contact)
/sitemap-products.xml (product pages)
/sitemap-categories.xml (category/collection pages)
/sitemap-blog.xml (blog posts)
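The index file that ties these together is itself a small XML document: a list of <sitemap> entries, each with its own location and lastmod (the URLs and dates below are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-17</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-10</lastmod>
  </sitemap>
</sitemapindex>
```

You submit only /sitemap.xml to search engines; they discover and fetch the child sitemaps from it.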
4. Remove Dead URLs Immediately
When a page is deleted, unpublished, or starts returning a 4xx or 5xx error, remove it from your sitemap right away. Do not wait for the next scheduled regeneration.
Dead URLs in your sitemap:
- Waste crawl budget on pages that no longer exist
- Generate errors in Google Search Console
- Signal poor site maintenance
If your sitemap is dynamically generated, this should happen automatically. If it is static or cached, set up a process to check for and remove dead URLs regularly.
5. Match Your Sitemap to Your Canonical Tags
Every URL in your sitemap should match the canonical URL declared on that page. Mismatches create conflicting signals.
| Scenario | Sitemap URL | Page Canonical | Result |
|---|---|---|---|
| Correct | https://example.com/shoes | https://example.com/shoes | No conflict -- search engines index the page |
| Wrong | https://example.com/shoes?color=red | https://example.com/shoes | Conflict -- sitemap says index variant, canonical says ignore it |
| Wrong | http://example.com/shoes | https://example.com/shoes | Conflict -- protocol mismatch |
Audit your sitemap against your canonical tags periodically. Any mismatch should be resolved in favor of the canonical URL.
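Such an audit can be scripted with the standard-library HTML parser. A sketch, assuming you have already fetched each page's HTML (the class and function names are ours, not a standard API):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Extract the href of the first <link rel="canonical"> in a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical" \
                and self.canonical is None:
            self.canonical = a.get("href")

def canonical_mismatch(sitemap_url: str, page_html: str):
    """Return the page's canonical URL if it differs from the sitemap URL, else None."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    if finder.canonical and finder.canonical != sitemap_url:
        return finder.canonical
    return None
```

Every non-None result is a conflict to resolve -- usually by replacing the sitemap entry with the canonical URL.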
6. Use HTTPS URLs Consistently
If your site uses HTTPS (it should), every URL in your sitemap must use the https:// protocol. Do not mix HTTP and HTTPS URLs.
Also ensure consistency with:
- www vs. non-www -- Pick one and use it everywhere.
- Trailing slashes -- Decide whether URLs end with / or not, and be consistent.
- Lowercase -- URL paths are case-sensitive. Use lowercase consistently to avoid duplicates.
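All four consistency rules can be checked in one pass over the sitemap's URL list. A minimal sketch using the standard library (the report format is our own invention):

```python
from urllib.parse import urlparse

def consistency_report(urls):
    """Flag mixed protocols, mixed hosts, uppercase paths, and inconsistent slashes."""
    schemes, hosts, issues = set(), set(), []
    for u in urls:
        p = urlparse(u)
        schemes.add(p.scheme)
        hosts.add(p.netloc)
        if p.path != p.path.lower():
            issues.append(f"uppercase path: {u}")
    if len(schemes) > 1:
        issues.append(f"mixed protocols: {sorted(schemes)}")
    if len(hosts) > 1:
        issues.append(f"mixed hosts (www vs. non-www?): {sorted(hosts)}")
    # Only compare trailing slashes on non-root paths, where the choice matters.
    slash_styles = {u.endswith("/") for u in urls if urlparse(u).path not in ("", "/")}
    if len(slash_styles) > 1:
        issues.append("inconsistent trailing slashes")
    return issues
```

An empty report means the sitemap is internally consistent; each issue string points at the rule being violated.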
Catch Sitemap Inconsistencies Automatically
Site Watcher monitors your sitemap for protocol mismatches, dead URLs, and canonical conflicts. Get alerts before they affect your rankings.
7. Exclude noindex Pages
If a page has a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex header, it must not appear in your sitemap.
Including noindex pages in your sitemap is one of the most common mistakes. It happens frequently when:
- A CMS generates the sitemap from all pages regardless of their index status.
- A developer adds noindex to a page but forgets to exclude it from the sitemap.
- A staging environment's noindex tags accidentally persist in production.
Set up automated checks to ensure noindex pages are never included in your sitemap output.
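One such check needs to look in both places a noindex directive can live: the robots meta tag and the X-Robots-Tag response header. A sketch (a production crawler should use a real HTML parser; the regex here just illustrates the check):

```python
import re

def has_noindex(page_html: str, headers: dict) -> bool:
    """Detect noindex via the robots meta tag or the X-Robots-Tag header."""
    # Header names are case-insensitive, so scan for any casing.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    if "noindex" in header_value.lower():
        return True
    # Look for <meta name="robots" content="...noindex...">.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        page_html, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(1).lower())
```

Run it against every URL your sitemap generator emits, and fail the build if any URL comes back True.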
8. Declare Your Sitemap in robots.txt
Always include a Sitemap: directive in your robots.txt file pointing to your sitemap (or sitemap index).
Sitemap: https://example.com/sitemap.xml
This ensures every crawler that reads your robots.txt -- not just Google -- can discover your sitemap. It also serves as documentation: anyone reviewing your technical SEO can immediately find your sitemap location.
9. Compress Large Sitemaps
If your sitemap files are large, serve them gzip-compressed. The usual convention is a .xml.gz extension (serving the plain .xml path with a Content-Encoding: gzip header also works); update all references in robots.txt and your sitemap index accordingly.
Benefits:
- Reduces bandwidth for both your server and search engine crawlers.
- Faster transfer times, especially for sitemaps with tens of thousands of URLs.
- No change in functionality -- all major search engines support gzipped sitemaps.
The 50 MB size limit applies to the uncompressed file size. A gzipped sitemap can be much smaller on the wire, but the decompressed content still cannot exceed 50 MB.
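A generator can enforce the uncompressed limit at write time. A minimal sketch using Python's standard gzip module:

```python
import gzip

# The 50 MB limit applies to the file before compression.
MAX_UNCOMPRESSED = 50 * 1024 * 1024

def write_gzipped_sitemap(xml_text: str, path: str) -> None:
    """Write a sitemap as .xml.gz, rejecting files over the uncompressed limit."""
    data = xml_text.encode("utf-8")
    if len(data) > MAX_UNCOMPRESSED:
        raise ValueError(f"uncompressed sitemap is {len(data)} bytes, over 50 MB")
    with gzip.open(path, "wb") as f:
        f.write(data)
```

Failing loudly at generation time is better than shipping an oversized file that search engines silently refuse to process.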
10. Consider Image, Video, and News Sitemaps
If your site contains significant media or news content, specialized sitemaps help search engines discover and index that content.
Image sitemaps extend the standard format with <image:image> tags containing the image URL in <image:loc>. This is especially useful for images loaded via JavaScript or CSS that crawlers might miss. (The protocol also defines caption, title, geo-location, and license fields, but Google announced in 2022 that it no longer uses them.)
Video sitemaps add <video:video> tags with title, description, thumbnail URL, duration, and content location. Google uses this data for video search results and rich snippets.
News sitemaps use <news:news> tags with publication name, language, title, and publication date. These are required for Google News inclusion and must only contain articles published within the last 48 hours.
11. Avoid Common Mistakes
Beyond the specific rules above, these general mistakes cause frequent problems:
Setting all priorities to 1.0. If every page is "highest priority," none of them are. Priority is relative. More importantly, Google ignores the priority tag entirely, so do not spend time optimizing it.
Including URL parameters. URLs like /products?sort=price&page=2 usually should not be indexed. Use canonical tags to point parameter URLs to their base version, and only include the base version in your sitemap.
Forgetting trailing slash consistency. /about and /about/ are technically different URLs. If your site uses one format, make sure your sitemap matches.
Submitting but never checking. Submitting a sitemap to Google Search Console is not the finish line. Review the Sitemaps report and Pages report regularly to catch issues.
12. Monitor Your Sitemap Continuously
A sitemap is a living document. It needs ongoing monitoring to stay healthy.
Set up automated checks
Monitor your sitemap URL for availability. If it returns a 500 error or goes offline, search engines cannot access it. Uptime monitoring should cover your sitemap just like any other critical page.
Track URL count over time
A sudden change in the number of URLs in your sitemap indicates something changed -- a deployment, a content migration, or a bug. Track this metric to catch unexpected changes early.
Review GSC reports weekly
Check the Sitemaps report for processing errors and the Pages report for changes in index coverage. Look for trends, not just absolute numbers.
Audit sitemap URLs quarterly
Run a full audit comparing your sitemap URLs against your live site. Check for dead links, missing pages, non-canonical URLs, and noindex pages that should not be listed.
Validate after deployments
Every significant deployment should include a sitemap validation check. Build it into your CI/CD pipeline if possible.
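A post-deploy validation step can be a few lines of standard-library Python: parse the file, count the URLs, and check the protocol rule from earlier sections. This sketch covers well-formedness, the 50,000-URL limit, and HTTPS consistency (the function and its report format are ours):

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(xml_bytes: bytes):
    """Return a list of problems: malformed XML, too many URLs, or non-HTTPS locs."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as e:
        return [f"malformed XML: {e}"]
    problems = []
    locs = [el.text or "" for el in root.iter(f"{NS}loc")]
    if len(locs) > 50000:
        problems.append(f"{len(locs)} URLs exceeds the 50,000-per-file limit")
    for loc in locs:
        if not loc.startswith("https://"):
            problems.append(f"non-HTTPS URL: {loc}")
    return problems
```

Wire it into CI so a non-empty problem list fails the pipeline; logging the URL count on each run also gives you the trend data described above for free.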
The most common sitemap problems are not formatting errors -- they are content drift. Pages get deleted, URLs change, noindex tags get added, and the sitemap does not update. Continuous monitoring catches this drift before it compounds.
The Sitemap Best Practices Checklist
A quick reference for auditing your sitemap:
| Check | Status Criteria |
|---|---|
| All URLs return 200 | No 3xx, 4xx, or 5xx responses |
| All URLs are canonical | Sitemap URL matches page canonical tag |
| No noindex pages | No URL has a noindex directive |
| lastmod is accurate | Reflects actual content modification dates |
| HTTPS used consistently | No mixed protocols or www inconsistencies |
| Under size limits | Under 50K URLs and 50 MB per file |
| Referenced in robots.txt | Sitemap: directive present |
| Submitted to GSC | Status shows Success in Search Console |
A sitemap that follows these best practices is not just a file -- it is an active, reliable channel between your site and search engines.
Automate Sitemap Best Practices
Site Watcher monitors your sitemap for dead URLs, format errors, and indexing issues alongside SSL, DNS, and uptime monitoring. Free for 3 targets. $39/mo for unlimited.