Sitemaps help search engines discover URLs. They don’t guarantee indexing — Google still decides what to crawl and index — but for large sites, new pages, or sites with weak internal linking, they matter. Here’s how to structure them correctly.
What Sitemaps Do (and Don’t Do)
- Discovery — Tell Google which URLs exist. Especially useful when pages aren’t well linked.
- Priority signal — lastmod (last modified) helps Google prioritize fresh content.
- No guarantee — Inclusion in a sitemap does not mean indexing. Quality and crawl budget still apply.
What to Include
All Important Pages
- Services — Every service page.
- Products — Every product (or category if products are infinite).
- Blog — Every post you want indexed.
- Industry/landing pages — Key commercial pages.
- Contact, about — Core site pages.
Canonical URLs Only
One URL per page. No duplicates, no parameter variants, and never both the www and non-www versions of the same page. Pick one canonical form and use it consistently.
lastmod (Optional but Helpful)
<lastmod>2025-03-10</lastmod>
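The date the page was last updated. Google may use this to prioritize recrawling, but only when the value is consistently accurate. If lastmod doesn’t match real content changes, Google can stop trusting it for your site.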
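To make the pieces above concrete, here is a minimal Python sketch that builds a sitemap with one loc and an optional lastmod per page. The function name build_sitemap and the example URLs are assumptions for illustration; the urlset, url, loc, and lastmod element names and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for an iterable of (url, last_modified) pairs.

    last_modified is a datetime.date, or None to omit lastmod for that URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified is not None:
            # YYYY-MM-DD, the same format as the lastmod example above
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical pages; in practice the dates would come from your CMS.
    example_pages = [
        ("https://yoursite.com/services/", date(2025, 3, 10)),
        ("https://yoursite.com/about/", None),
    ]
    print(build_sitemap(example_pages).decode("utf-8"))
```

Passing the real modification date from your CMS, rather than the build date, is what keeps lastmod trustworthy.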
What to Exclude
Noindex Pages
Don’t include pages you’ve told Google not to index. Sitemap + noindex sends mixed signals.
Duplicate Content
Only the canonical URL. No:
- Parameter variants (?sort=, ?filter=)
- Session IDs
- Print or PDF versions (unless they’re the canonical)
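One way to enforce the canonical-only rule before writing a sitemap is to normalize every candidate URL and drop the duplicates. The sketch below is an illustration under stated assumptions: the non-www HTTPS host is the canonical one, no canonical page depends on a query string, and the names canonicalize and dedupe_canonical are hypothetical helpers, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the non-www host over HTTPS is the canonical form.
CANONICAL_HOST = "yoursite.com"


def canonicalize(url):
    """Map a URL variant to its canonical form.

    Forces HTTPS, collapses www/non-www to CANONICAL_HOST, and drops the
    query string and fragment (this also drops any port number).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    return urlunsplit(("https", host, path, "", ""))


def dedupe_canonical(urls):
    """Return canonical URLs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        canonical = canonicalize(url)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

If some of your canonical pages do rely on a query parameter, keep an allowlist of those parameters instead of stripping the whole query string.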
Thank-You Pages
No value for search. Exclude.
Pagination (Usually)
- Page 2, 3, 4 — Often exclude. Or include only if each has unique value.
- Infinite scroll — Don’t create URLs for every “load more.” Sitemap isn’t for that.
Staging or Dev
Never. Only production URLs.
Sitemap Structure
Small Sites (Under 50 URLs)
Single sitemap: sitemap.xml. Include all important pages.
Large Sites (50+ URLs)
Split by section:
- sitemap-pages.xml — Services, about, contact.
- sitemap-blog.xml — Blog posts.
- sitemap-products.xml — Products (if applicable).
Create sitemap-index.xml that references each:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2025-03-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-blog.xml</loc>
    <lastmod>2025-03-10</lastmod>
  </sitemap>
</sitemapindex>
50,000 URL Limit
The limit applies per sitemap file: 50,000 URLs, and 50 MB uncompressed. If you exceed either, split further (e.g., sitemap-blog-1.xml, sitemap-blog-2.xml).
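Splitting at the limit and regenerating the index is easy to automate. The following Python sketch shows one possible approach; chunk_urls and build_index are hypothetical helper names, and the numbered file naming simply follows the sitemap-blog-1.xml pattern mentioned above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, prefix, chunk_count, lastmod):
    """Return sitemap index XML (bytes) listing prefix-1.xml .. prefix-N.xml.

    lastmod is a YYYY-MM-DD string applied to every child sitemap; a real
    setup would track the last change of each file separately.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{prefix}-{n}.xml"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

Each chunk would then be written to its own file, and only the index URL needs to be submitted.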
Submitting Your Sitemap
Google Search Console
- Open the Sitemaps section.
- Enter the sitemap URL: https://yoursite.com/sitemap-index.xml (or sitemap.xml for small sites).
- Submit.
Google will crawl and process. Check back for errors (invalid URLs, redirects, etc.).
robots.txt
Add a line at the end:
Sitemap: https://yoursite.com/sitemap-index.xml
Backup discovery. Most major crawlers, including Google, read this line. Submitting in Search Console is still worthwhile because it reports processing errors, but the robots.txt entry costs nothing to add.
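A quick way to confirm the Sitemap line is picked up is to parse robots.txt with the Python standard library. This is a small sketch: sitemaps_declared is a hypothetical helper name, and it relies on RobotFileParser.site_maps(), which is available from Python 3.8.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_declared(robots_txt):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present
    return parser.site_maps() or []


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow: /thank-you/\n"
        "\n"
        "Sitemap: https://yoursite.com/sitemap-index.xml\n"
    )
    print(sitemaps_declared(example))
```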
Common Mistakes
- Including noindex pages — Mixed signals. Exclude.
- Including redirects — URLs that 301 elsewhere. Remove or fix.
- Wrong canonical — Sitemap has www, site uses non-www (or vice versa). Match your canonical.
- Inaccurate lastmod — Every page shows today’s date. Update lastmod only when the content actually changes.
- Missing important pages — Key commercial pages not in sitemap. Add them.
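Several of these mistakes can be caught with a simple audit before you publish the sitemap. The Python sketch below assumes you already have crawl data for each sitemap URL; the PageInfo record, its fields, and the audit function are hypothetical and would be filled in by your own crawler or SEO tool export.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Crawl result for one sitemap URL (hypothetical structure)."""
    url: str        # URL as listed in the sitemap
    status: int     # HTTP status code returned for that URL
    noindex: bool   # True if the page carries a noindex directive
    canonical: str  # canonical URL the page declares


def audit(pages):
    """Return (url, problem) pairs for entries that should not be listed."""
    problems = []
    for page in pages:
        if page.noindex:
            problems.append((page.url, "noindex page in sitemap"))
        if 300 <= page.status < 400:
            problems.append((page.url, "redirects elsewhere"))
        if page.canonical != page.url:
            problems.append((page.url, "not the canonical URL"))
    return problems
```

An empty result does not prove the sitemap is complete; missing pages still have to be checked against your list of key commercial URLs.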
B2B Considerations
- Service + industry combos — If you have /services/web-design/manufacturing, include them.
- Case studies — Include. They’re linkable, rankable assets.
- Resources/blog — Include all posts you want indexed. lastmod helps for fresh content.
We configure sitemaps for every project. Start a project and we’ll set up your sitemap structure.