Posted to Ben Finklea's blog on April 23rd, 2009

The Duplicate Content Myth: How To Be A Proactive Webmaster

Duplicate content can get you in a heap of trouble. Once in grade school, I repeated, out loud, what my friend whispered in my ear. The next thing I knew, I was in the principal’s office and my mother was “leaving work early”. Having duplicate content isn’t going to bring down the wrath of your mother or her wooden spoon, but it will land you in Google’s office if you are doing it deliberately.

Duplicating content can also affect your site in other ways, like wasting its bandwidth. Duplicated content leads to inefficient crawling by the spiders. When a spider discovers ten URLs on your site, it has to crawl each of those URLs before it knows whether they carry the same content or not. The more time and resources it spends crawling duplicate content across multiple URLs, the less time it has for the rest of your content. That eats up your bandwidth and possibly upsets the Google gods.

If you innocently duplicated some content, you typically don’t need to submit a reconsideration request; just clean it up. If you are a beginner-to-intermediate webmaster, you most likely won’t have to invest much energy into policing your own content. Most search engines have ways of handling duplicate content without weaving you into a web of penalties.


Rather than falling victim to a scraper of your beautifully unique content, or wasting your bandwidth on excessive crawling, you can be proactive and address the issues before they undress you.

Use 301s: If you just restructured your site, use 301 redirects in your .htaccess file to redirect users and the spiders.
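
A minimal sketch of what that could look like, assuming an Apache server with mod_alias and hypothetical old and new paths:

    # Permanently send visitors and spiders from the old URL to the new one
    Redirect 301 /old-page.html http://www.example.com/new-page.html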

Use the preferred domain feature of Webmaster Tools: If other sites link to yours using both the www and the non-www versions of your URLs, you can let the spiders know which way you prefer your site to be indexed.
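
The preferred domain setting only tells Google which version you favor; many webmasters back it up with a server-side redirect. A minimal sketch, assuming Apache with mod_rewrite and a hypothetical example.com:

    # Funnel every non-www request to the www version with a 301
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]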

Be consistent: Try to keep your internal links consistent. Don’t link to /clownshoes and /clownshoes/ and /clownshoes/index.htm.
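
In practice, that means picking one form of each URL and using it everywhere. For example, settle on the trailing-slash version and link to it the same way every time:

    <!-- One URL form, used in every internal link -->
    <a href="/clownshoes/">Clown Shoes</a>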

Block appropriately: Rather than letting Google’s algorithms pick their favorite version, you may want to specify which version you prefer. If you don’t want the spiders to index printer versions of your articles, for instance, disallow those pages or make use of wildcard patterns in your robots.txt file.
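
A minimal robots.txt sketch, assuming your printer-friendly pages live under a hypothetical /print/ directory (the * wildcard is an extension honored by Google’s spiders, not part of the original robots.txt standard):

    User-agent: *
    # Keep spiders out of the printer-friendly duplicates
    Disallow: /print/
    Disallow: /*?print=1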

Avoid publishing stubs: Internet users don’t like seeing “empty” or “coming soon” pages, so avoid placeholders, where possible.

Use TLDs: When handling country-specific content, use top-level domains such as .de to indicate Germany-focused content, rather than a subdirectory like /de or a subdomain like de.example.com.

Syndicate carefully: When syndicating, make sure there is a link back to the original article from each syndicated copy. Even so, the search results will still show the unblocked version the engines believe is most appropriate, whether it’s your preferred version or not.
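
The link back can be as simple as a line of HTML at the bottom of each syndicated copy; the URL here is hypothetical:

    <p>This article originally appeared at
    <a href="http://www.example.com/original-article">www.example.com</a>.</p>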

Understand your CMS: Make absolutely sure that you know how content is displayed on your web site, especially if it includes a blog, forum, or related system that often shows the same content in multiple formats.

If you keep these tips in mind and stay a proactive webmaster, you shouldn’t have anything to worry about. Remember, most internet users enjoy the diverse selection of unique content while browsing to their heart’s content. Help your fellow webmasters and SEO-friendly folks by not passing on the duplicate content myth. Incurring a ‘penalty’ is entirely within your control; you just need to know what angers the spiders. Just like in any aspect of life, people don’t like to be copied. We despise it so much that we have managed to teach our computers to tell the difference.

Volacci.® Your Profit. Our Passion.