As smart as Google’s search engine spiders are, even they can miss pages on your site while indexing for search results. Maybe you have moved a link to content so that it’s not easily accessible. Or your site may be too big for Google to crawl without eating up all of your server’s resources. Not pretty!
The solution is simple: a sitemap. There are three main types of sitemaps you can use on your Drupal site, but we will cover the most important: the XML sitemap. XML sitemaps are designed to be used by search engines for indexing your pages. Here is a refined definition according to www.sitemaps.org:
Sitemaps are an easy way for Webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site. Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata.
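To make that definition concrete, here is a minimal Python sketch of what such a file contains. It is only an illustration, not how the Drupal module builds its sitemap: the `build_sitemap` helper and the example.com URLs are made up for this post, while the element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string for a list of entry dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = entry["loc"]
        # The three optional metadata fields mentioned in the definition.
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


# Made-up example data: a front page and one inner page.
example = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2010-01-01",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about", "priority": 0.5},
])
print(example)
```

Each `<url>` element carries one page address plus whatever metadata you choose to supply; the crawler treats the metadata as hints, not commands.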
For everything you possibly need to know about Drupal XML sitemaps, please join me after the jump...
Please Note: Using a sitemap does not guarantee that every page on your site will be included in the search engines. Rather, it helps the search engines find more of your pages. Submitting an XML sitemap to Google can noticeably increase the number of pages returned when you do a site: search. A site: search (for example, site:www.yourDrupalsite.com) shows you how many pages of your site are included in the search engine index, as shown in the following screenshot:
The XML Sitemap Module
The XML Sitemap module for Drupal creates a sitemap for your site that conforms to the sitemaps.org specification. You can download it from the following link: http://drupal.org/project/xmlsitemap and install it just like any normal Drupal module. When you go to turn on the module, you’ll see a list that looks similar to this:
Step 1: Before you turn on any of the included modules, consider what content on your site you want to appear in the search engines and only turn on the modules you need.
- The XML sitemap is required. Turn it on.
- XML sitemap custom allows you to add your own customized links to the sitemap. I highly recommend turning this one on as well.
- XML sitemap engines will automatically submit your sitemap to the search engines each time it changes. This module is not strictly necessary, and there are better ways to submit your sitemap. However, it does a great job of helping you verify your site with each search engine. Turn this one on.
- XML sitemap menu adds your menu items to the sitemap. This is a good idea. Turn it on.
- XML sitemap node adds all your nodes, which are the bulk of your content. Turn it on.
- XML sitemap taxonomy adds all your taxonomy term pages to the sitemap. Generally, this is a good idea but some folks might not want this listed. Term pages are good category pages so I recommend turning this one on as well.
- Don’t forget to Save configuration.
Step 2: Go to your Administer | Configuration | XML Sitemap and you should be able to see a screen like this:
Step 3: Click on Settings and you should see a few options:
- Minimum sitemap lifetime determines the minimum amount of time that the module will wait before regenerating the sitemap. Use this feature if you have an enormous sitemap that is taking up too much of your server’s resources. Most sites should keep this setting on No minimum.
- Include a stylesheet in the sitemaps will generate a simple stylesheet to include with the sitemap. This is not necessary, but it is very helpful for troubleshooting or whenever human eyes view the sitemap. Leave it checked.
- Generate sitemaps for the following languages: In the near future, this option will allow you to specify sitemaps for different languages. This is very important for international sites that want to rank in localized search engines. For now, however, English is the only option.
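If the Minimum sitemap lifetime setting feels abstract, the idea can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not the module's actual code; the function name `should_regenerate` and the dates are invented for the example.

```python
from datetime import datetime, timedelta


def should_regenerate(last_generated, now, minimum_lifetime):
    """Return True when the sitemap is old enough to be rebuilt.

    Hypothetical helper: a minimum_lifetime of timedelta(0) stands in
    for the "No minimum" setting.
    """
    return now - last_generated >= minimum_lifetime


# Made-up timestamps: the sitemap was last built six hours ago.
last_built = datetime(2024, 1, 1, 0, 0)
current_time = datetime(2024, 1, 1, 6, 0)

print(should_regenerate(last_built, current_time, timedelta(0)))       # True
print(should_regenerate(last_built, current_time, timedelta(days=1)))  # False
```

With No minimum, every qualifying cron run can rebuild the sitemap; with a one-day minimum, the rebuild is skipped until a full day has passed.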
Step 4: Click the Advanced settings drop-down and you should see this:
- Number of links in each sitemap page allows you to specify how many links to pages on your website will be in each sitemap. Unless you are having trouble with search engines accepting your sitemap, leave this on Automatic.
- Maximum number of sitemap links to process at once sets the number of additional links that the module will add to your sitemap each time cron runs. Leave this setting alone unless you notice that cron is timing out.
- Sitemap cache directory allows you to set where the sitemap data will be stored. This is data not seen by search engines or human visitors; it’s only used by the module.
- Base URL is the base URL of your site and generally should be left as is.
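Why would a sitemap ever be split into pages? The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so larger sites need several files. The Python sketch below shows the general idea of chunking a list of links; `split_into_pages` is a hypothetical helper and the URLs are generated for illustration, so treat it as a rough picture rather than what the module does internally.

```python
# Upper limit on URLs per sitemap file, per the sitemaps.org protocol.
MAX_LINKS_PER_SITEMAP = 50000


def split_into_pages(urls, links_per_page=MAX_LINKS_PER_SITEMAP):
    """Split a list of URLs into sitemap-sized chunks (hypothetical helper)."""
    if not 1 <= links_per_page <= MAX_LINKS_PER_SITEMAP:
        raise ValueError("links_per_page must be between 1 and 50000")
    return [urls[i:i + links_per_page]
            for i in range(0, len(urls), links_per_page)]


# Made-up site with 120,000 node pages.
all_urls = [f"http://www.example.com/node/{n}" for n in range(1, 120001)]
pages = split_into_pages(all_urls)

print(len(pages))               # 3
print([len(p) for p in pages])  # [50000, 50000, 20000]
```

Lowering the links-per-page number simply produces more, smaller files, which is the lever to reach for if a search engine has trouble accepting a very large sitemap.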
Step 5: Click on the front page drop-down and set the following options:
- Front page priority: 1.0 is the highest setting you can give a page in the XML sitemap. For most websites, the front page is the single most important part of your site. This setting should be left at 1.0.
- Front page change frequency: Tells the search engines how often they should revisit your front page. Adjust this setting to reflect how often the front page of your site changes.
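For reference, the sitemaps.org protocol allows priorities from 0.0 to 1.0 and a fixed set of change frequency values. The small Python check below lists them; `validate_entry` is a name invented for this example, and the module's own settings form already restricts you to valid choices, so this is purely illustrative.

```python
# Change frequency values defined by the sitemaps.org protocol.
VALID_CHANGEFREQ = ("always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never")


def validate_entry(priority, changefreq):
    """Return a list of problems with a priority/changefreq pair (hypothetical helper)."""
    errors = []
    if not 0.0 <= priority <= 1.0:
        errors.append(f"priority {priority} is outside the range 0.0 to 1.0")
    if changefreq not in VALID_CHANGEFREQ:
        errors.append(f"changefreq {changefreq!r} is not a valid value")
    return errors


print(validate_entry(1.0, "daily"))      # []
print(validate_entry(1.5, "sometimes"))  # two error messages
```

Keep in mind that priority only ranks pages relative to other pages on your own site; it does not compare your site against anyone else's.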
Step 6: Open the Content types drop-down and you should see this:
- You should see each Content type listed separately. You will want to leave these settings alone so that all your content shows up in the sitemap.
- If you do want to adjust these settings, you will need to go to the content type screen. Click on the name of the content type to go to its screen.
- On the content type screen, open the XML sitemap drop-down and you’ll get two options:
- Include in sitemap sets the default action for that content type - if you check this box, it will be included in the sitemap.
- Default priority allows you to set the default priority for each node that you create of that content type. The default is usually 0.5, but you can adjust it if you want certain pages to have a higher or lower priority.
- Click on Save content type. Repeat this process for each content type you want to change.
Step 7: Click Save configuration.
Step 8: Now it’s time to run cron. Cron is a recurring script that takes care of many maintenance issues in Drupal, including populating the XML sitemap. To run cron, go to http://www.yourDrupalsite.com/cron.php and wait until the page is finished loading. You will receive no indication that it’s complete except that the page will stop loading.
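If you would rather trigger cron from a script than from your browser, a small Python sketch like the one below requests the same cron.php address. The helper names are invented for this example, and the optional `cron_key` parameter is an assumption: some Drupal versions require a key in the cron URL, so check your site's status report for the exact address to use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cron_url(base_url, cron_key=None):
    """Build the cron.php address for a site (hypothetical helper)."""
    url = base_url.rstrip("/") + "/cron.php"
    if cron_key:
        # Assumption: only needed on Drupal versions that protect cron with a key.
        url += "?" + urlencode({"cron_key": cron_key})
    return url


def run_cron(base_url, cron_key=None, timeout=300):
    """Request cron.php and return the HTTP status code once the page finishes loading."""
    with urlopen(cron_url(base_url, cron_key), timeout=timeout) as response:
        return response.status


print(cron_url("http://www.yourDrupalsite.com"))
# To actually trigger cron on a live site:
# status = run_cron("http://www.yourDrupalsite.com")
```

The generous timeout is deliberate, since the first cron run after enabling the module may take a while on a large site.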
Step 9: Go to http://www.yourDrupalsite.com/sitemap.xml. If you see something like this:
or a screen similar to this:
Then you’ve done it right! Congrats! Another round of espresso shots to all! Keep in mind that the XML sitemap will only update when cron runs. On a normal Drupal installation, you should set cron to run periodically – nightly for most sites or more often for high-traffic sites.
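If you want a quick sanity check beyond eyeballing the page, you can save the contents of sitemap.xml and count the links in it. The Python sketch below parses a sitemap and lists its URLs; the sample XML and the `list_sitemap_urls` helper are made up for illustration, and the sketch assumes a plain sitemap rather than a sitemap index that points to several pages.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_bytes):
    """Return every <loc> value found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Made-up sample standing in for the contents of your sitemap.xml.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourDrupalsite.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.yourDrupalsite.com/node/1</loc>
    <priority>0.5</priority>
  </url>
</urlset>
"""

urls = list_sitemap_urls(sample)
print(len(urls))  # 2
for url in urls:
    print(url)
```

If the count is far lower than the number of pages you expect, run cron again and recheck, since the sitemap fills in over successive cron runs.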