Fixing Duplicate Content in Drupal
The Global Redirect module can painlessly eliminate duplicate content on your Drupal website
A while back, Ben Finklea wrote about how easy it is to accidentally create duplicate content, and how to fix it. Though it has been some time since the post went up, it remains popular and useful advice, so we wanted to share it with you again.
Duplicate content isn’t just annoying and inconvenient: it’s bad for your SEO, and it’s very easy to create by accident on a Drupal site. But what is duplicate content? According to Google, duplicate content is “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”
(In layman’s terms, duplicate content is content on your website that’s really, really similar, if not identical, to other content on your website or even elsewhere on the web.)
Some duplicate content is malicious, such as spammers stealing your website’s content and posting it as their own. However, most of the duplicate content that we find (by far) is non-malicious, such as:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
Unfortunately, when this happens, it can hurt your rankings: Google may not be able to tell which copy is the original source, and your content winds up competing against itself. For websites that have multiple pages with very similar content, there are ways to indicate your preferred URL to Google, and for Drupal websites, one of the best ways to fix the problem is to install and use the Global Redirect module.
The Global Redirect Module
The Global Redirect module will take care of some housekeeping issues that come up when clean URLs are enabled in Drupal. In short, it’ll eliminate some of the duplicate content issues that you may not have known you had.
Here’s an example from Ben.
"Let’s say, for example, that you create a new website and create the first node that you call the About Us page. Later, because you want the front page of your site to be the content of that node, you go into site settings and make node/1 the front page of the site. Sounds pretty harmless, right? Well, right at this moment, all of these URLs on your site would show the exact same content:
The search engines will think that you have six pages of the exact same content. That's never good. Global Redirect fixes that by redirecting all the URLs you don't want to the one URL that you do."
The Global Redirect module does a few neat tricks to make this happen. According to the module’s homepage on Drupal.org, it:
- Checks whether the current URL has an alias and, if the alias is not being used, does a 301 redirect to it.
- Checks the current URL for a trailing slash, removes it if present, and repeats check 1 with the new request.
- Checks if the current URL is the same as the site_frontpage and redirects to the frontpage if there is a match.
- Checks if the Clean URLs feature is enabled and then checks that the current URL is being accessed using the clean method rather than the unclean method.
- Checks access to the URL. If the user does not have access to the path, then no redirects are done. This helps avoid exposing private aliased nodes.
- Makes sure the case of the URL being accessed is the same as the one set by the author/administrator. For example, if you set the alias "articles/cake-making" to node/123, then the user can access the alias with any combination of case and is redirected to the alias as it was written.
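The checks listed above can be pictured as a small URL-normalization routine. The following Python sketch is only an illustration of the idea, not the module's actual code (the module itself is written in PHP). The alias table, the front page setting, and the function names are all hypothetical, and the access check is left out for brevity.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-ins for Drupal's alias table and site_frontpage setting.
ALIASES = {"node/1": "about-us", "node/123": "articles/cake-making"}
FRONT_PAGE = "node/1"


def canonical_path(raw_path, aliases, front_page):
    """Map a requested path to the single path it should be served from.

    The front page is represented by the empty string.
    """
    # Deslash: drop leading and trailing slashes so paths look like "node/1".
    path = raw_path.strip("/")

    # Case check: aliases are matched without regard to case.
    lowered = path.lower()
    alias_to_system = {alias.lower(): system for system, alias in aliases.items()}

    if lowered in alias_to_system:
        system = alias_to_system[lowered]
    elif lowered in aliases:
        system = lowered
    else:
        # Unknown path: nothing to normalize beyond the slash handling.
        return path

    # Front page check: the front page node is served from the site root.
    if system == front_page:
        return ""

    # Alias check: prefer the alias, in the case the author gave it.
    return aliases.get(system, system)


def redirect_target(url, aliases=ALIASES, front_page=FRONT_PAGE):
    """Return the path to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # Non-clean to clean: treat ?q=node/1 as a request for node/1.
    used_unclean = "q" in query
    raw_path = query["q"][0] if used_unclean else parts.path

    canonical = "/" + canonical_path(raw_path, aliases, front_page)
    requested = parts.path or "/"
    if used_unclean or requested != canonical:
        return canonical
    return None


if __name__ == "__main__":
    for url in (
        "http://www.yourDrupalsite.com/",
        "http://www.yourDrupalsite.com/node/1",
        "http://www.yourDrupalsite.com/about-us/",
        "http://www.yourDrupalsite.com/?q=node/123",
    ):
        print(url, "->", redirect_target(url))
```

In this sketch, every variant of a page resolves to one path, which is the property that keeps search engines from seeing several copies of the same content.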
For those who want to install it but aren’t that technical, it’s pretty easy: download the module from https://www.drupal.org/project/globalredirect, enable it, and configure it by navigating to http://www.yourDrupalsite.com/admin/settings/globalredirect or clicking Admin | Site configuration | Global Redirect.
Some of your options will include the following:
- Deslash: Set to On. If enabled, this option will remove the trailing slash from requests. If you require certain requests to have a trailing slash, this feature can cause problems and may need to be disabled; otherwise, leave it on.
- Non-clean to Clean: Set to On. If enabled, this option will redirect from non-clean to clean URLs (if Clean URLs are enabled). This will stop, for example, node 1 existing on both yourDrupalsite.com/node/1 and yourDrupalsite.com?q=node/1.
- Remove Trailing Zero Argument: Set to Disabled. If enabled, any instance of /0 will be trimmed from the right of the URL. This stops duplicate pages such as taxonomy/term/1 and taxonomy/term/1/0 where 0 is the default depth. There is an option of limiting this feature to taxonomy term pages only or allowing it to affect any page. By default this feature is disabled to avoid any unexpected behavior.
- Menu Access Checking: Set to Disabled. If enabled, the module will check that the user has access to the page before redirecting. This helps to stop redirection on protected pages and avoids giving away secret URLs. By default this feature is disabled to avoid any unexpected behavior.
- Case Sensitive URL Checking: Set to Enabled. If enabled, the module will compare the current URL to the alias stored in the system. If there are any differences in case, the user will be redirected to the correct URL.
Click Save configuration. Your site is now protected from these common sources of duplicate content.
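To make the Remove Trailing Zero Argument option from the list above more concrete, here is a minimal Python sketch of the behavior it describes. It is an illustration under assumptions, not the module's code: the function name and the mode values ("disabled", "taxonomy", "all") are hypothetical labels for the three choices the option offers.

```python
def trim_trailing_zero(path, mode="disabled"):
    """Trim trailing /0 arguments from a Drupal-style path.

    mode is a hypothetical setting:
      "disabled" - leave the path alone (the default described above)
      "taxonomy" - trim only on taxonomy term pages
      "all"      - trim on any page
    """
    if mode == "disabled":
        return path
    if mode == "taxonomy" and not path.startswith("taxonomy/term/"):
        return path
    # Trim /0 from the right of the path, repeating if more than one is present.
    while path.endswith("/0"):
        path = path[: -len("/0")]
    return path


if __name__ == "__main__":
    for mode in ("disabled", "taxonomy", "all"):
        print(mode, trim_trailing_zero("taxonomy/term/1/0", mode))
```

Note that a path such as taxonomy/term/10 is left untouched in every mode, because only a whole /0 argument at the end of the path is trimmed.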
Best of luck to you in eliminating all of the duplicate content from your website. If you have problems with duplicate content, installing the Global Redirect module should help you rank higher in the SERPs relatively quickly; in the meantime, make sure you’re following SEO best practices. For advice or help getting your website sorted out, feel free to contact us. We’re always happy to lend a hand!