Crawl errors are the bane of every digital marketer: they seemingly pop up overnight and their numbers grow exponentially. Luckily for Drupal marketers, there are a number of techniques you can employ to minimize the number of crawl errors that occur and to fix new crawl errors on your website. Before we get started, let's review a few common crawl errors that you're likely to run into.
Page Not Found - Hard 404 Errors
The hard 404 is usually the most common error you'll find when reviewing your crawl errors. These errors generally occur when a previously published piece of content is deleted, or when content is moved to another location without a search-engine-friendly redirect (a 301 redirect) being created.
Page Not Found - Soft 404 Errors
Soft 404 errors are not quite as common as hard 404s. A soft 404 occurs when a published page contains very little content or duplicate content. The page returns a 200 code, indicating that it is accessible, but because there is so little content on it, search engines classify it as a soft 404 and will not index it.
Access Denied - 403 Errors
This is one of the most frequent errors I run into on Drupal websites. 403 errors typically occur when a previously published page is unpublished but search engines still have its URL linked or indexed.
Internal Server Errors - 500 Errors
Internal server errors, also known as 500 errors, occur when the server encounters an unexpected condition and cannot report a more specific cause.
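To keep these error types straight, here is a minimal Python sketch (our own illustration, not part of Drupal or any crawler tool) that maps an HTTP status code onto the crawl-error categories above:

```python
def classify_status(code: int) -> str:
    """Return a rough crawl-error category for an HTTP status code.

    Category names are informal labels matching the sections above;
    note that a "soft 404" cannot be detected from the status code
    alone, since the page returns a 200.
    """
    if code == 404:
        return "page not found (hard 404)"
    if code == 403:
        return "access denied"
    if 500 <= code <= 599:
        return "internal server error"
    if 300 <= code <= 399:
        return "redirect"
    if 200 <= code <= 299:
        return "ok"
    return "other"

print(classify_status(404))  # page not found (hard 404)
print(classify_status(503))  # internal server error
```

Note that the soft 404 is the odd one out: the server answers 200, so you only catch it by looking at the page content, which is exactly why search engines flag it separately.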
How to Fix Crawl Errors Using Drupal
The best way to stop those pesky crawl errors from occurring is by configuring your Drupal website to defend against them. To do so, we recommend installing and configuring the following modules.
- The first thing you'll want to set up is the Pathauto module, which "provides a mechanism for modules to automatically generate aliases for the content they manage." In other words, you can configure Pathauto so that when you publish content, the content's address won't look like "/node/231." Instead, it will use a logical, human- and search-engine-friendly syntax based on the patterns that you set.
- Perhaps the most important module you'll install is the Redirect module, which automatically creates a redirect when you change a piece of content's path. You can also use it to manually fix any crawl errors that were not handled automatically.
- Next up is the Global Redirect module. The Global Redirect module "searches for an alias of the current URL and 301 redirects if found. Stops duplicate content arising when path module is enabled." While we're on the subject, the Path module, which has been part of Drupal core since Drupal 4.3, is what allows you to rename URL paths, so most of us won't need to download anything extra.
- Finally, the Search 404 module is another important tool for preventing 404 errors. When a visitor lands on a page that would generate a 404 error, the Search 404 module instead presents an internal site search based on the requested URL. The hope is that the visitor finds the content they were after, or something similar, within the search results.
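Once the Redirect module (or your own redirects) are in place, it's worth spot-checking that an old path really answers with a 301 rather than a 302 or a plain 404. Here's a small standard-library sketch; the URLs are placeholders to swap for your own, and the helper names are ours:

```python
import urllib.error
import urllib.request


def is_seo_friendly(status: int, location: str, expected_target: str) -> bool:
    """A 301 (or permanent 308) pointing at the expected alias is what search engines want."""
    return status in (301, 308) and location == expected_target


def first_hop(url: str):
    """Return (status, Location header) of the first response, without following redirects."""
    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # stop here: we only want to inspect the first hop

    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(urllib.request.Request(url, method="HEAD"))
        return resp.status, None  # a 200 here means no redirect was issued
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


# Example (placeholder URLs -- substitute your own old and new paths):
# status, location = first_hop("https://example.com/node/231")
# print(is_seo_friendly(status, location, "https://example.com/blog/my-post"))
```

Disabling redirect following is the key trick: by default `urllib` chases the redirect chain silently, so you'd never see whether the first hop was a 301 or a temporary 302.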
How to Monitor Crawl Errors
While these modules will help to prevent crawl errors from occurring, they are not a guarantee against crawl errors. We recommend regularly monitoring your crawl errors using the following techniques.
At least once a month, check Drupal's built-in reports for the "Top 'page not found' errors" and "Top 'access denied' errors". You'll find them at admin/reports/page-not-found and admin/reports/access-denied on your site.
The Link Checker module extends Drupal's built-in reports with a report of all of the crawl errors on your website, and it can be configured to check only the content types you care about.
Although not specific to Drupal, Google Webmaster Tools is one of the easiest ways to find your 404, 403, 500, and other errors. Beyond surfacing crawl errors, you should submit your XML sitemaps to Google through this tool to increase the number of pages indexed and to identify possible indexing issues. Google Webmaster Tools also lets you run an on-demand crawl of your website to uncover hidden issues. The Site Verify module can help you verify your site with Google Webmaster Tools.
One final tool for monitoring errors is the Screaming Frog SEO Spider, which attempts to replicate the behavior of Google's search engine spiders. Run a crawl of your site with it and it will often surface hidden crawl errors and issues.
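A toy version of what a spider like Screaming Frog does can fit in a few lines: extract every link from a page's HTML, then (in a real run) fetch each one and record any non-200 responses. This sketch uses only the standard library; the crawl loop itself is left as a comment because it needs a live site:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# In a real crawl you would then fetch each link and log its status,
# e.g. with urllib.request, catching HTTPError for 4xx/5xx responses.

print(extract_links('<a href="/blog/post-1">Post</a> <a href="/node/231">Old</a>'))
# ['/blog/post-1', '/node/231']
```

Dedicated crawlers add queueing, deduplication, politeness delays, and robots.txt handling on top of this, which is why a tool like Screaming Frog is still the right choice for anything beyond a quick check.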
There are a lot of moving parts that keep your website's search engine optimization in shape, which is why I always recommend using a calendar to stay organized. Set up daily, weekly, monthly, quarterly, and annual reminders so that you can address crawl errors and keep your website at the top of the search engine rankings. Tell us: what techniques do you use to prevent crawl errors?