How to Fix the Problems with Drupal’s Default Robots.txt File

No one is perfect. And neither is Drupal’s default robots.txt file. In fact, there are several problems with the file. If you test out your default robots.txt file line by line using Google Webmaster Tools’ robots.txt testing utility, you will find that a lot of paths which look like they are being blocked will actually be crawled.
The reason is that Drupal does not require the trailing slash (/) after the path to show you the content. A Disallow rule matches any URL path that begins with the rule's text, so a rule ending in a slash matches the page with the slash but not the page without it. Googlebot will avoid the first and crawl the second.
For example: /admin/ is listed as disallowed. As you would expect, the testing utility shows that http://www.yourDrupalsite.com/admin/ is disallowed. Not so fast. Put in http://www.yourDrupalsite.com/admin (without the slash) and you’ll see that it is allowed. “It’s a trap!” Not really, and fortunately it is relatively easy to fix.
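If you would rather check this from a script than from the testing utility, here is a minimal sketch using Python's standard-library urllib.robotparser. It assumes that Python's parser treats these simple rules the same way Googlebot does (plain prefix matching), which agrees with what the testing utility reports above. The helper name is_allowed and the two sample rule sets are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A cut-down version of the default rule (illustrative, not the full file).
DEFAULT_RULES = """\
User-agent: *
Disallow: /admin/
"""

# The same rule plus a copy without the trailing slash.
FIXED_RULES = DEFAULT_RULES + "Disallow: /admin\n"

SITE = "http://www.yourDrupalsite.com"

print(is_allowed(DEFAULT_RULES, SITE + "/admin/"))  # False: blocked
print(is_allowed(DEFAULT_RULES, SITE + "/admin"))   # True: slips through
print(is_allowed(FIXED_RULES, SITE + "/admin"))     # False: blocked after the fix
```

The second line is the trap described above: the rule with the slash never matches the URL without it.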
Do you want to know how to fix the problems with Drupal’s default robots.txt file in eight easy steps? Please read on.
What in Tarnation is a Googlebot?

Huh? Google what?! Googlebot! Google and other search engines use server systems–commonly referred to as spiders, crawlers, or robots–to travel the expanse of the Internet and find each and every website. Google’s system is also referred to as Googlebot to distinguish it from all the other search engine robots. While this is not a reported number, it is estimated that the Googlebot crawls 10 billion websites each week! I’d like to see it race R2D2!
Fixing the Drupal Robots.txt File

Like I said earlier, fixing Drupal’s default robots.txt file is relatively easy. Carry out the following steps in order to fix the file:
1. Make a backup of the robots.txt file.
2. Open the robots.txt file for editing. If necessary, download the file and open it in a local text editor.
3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Note that both sections appear whether you've turned on clean URLs or not. Drupal covers you either way. They look like this:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
4. Duplicate the two sections (simply copy and paste them) so that you have four sections—two of the # Paths (clean URLs) sections and two of # Paths (no clean URLs) sections.
5. Add 'fixed!' to the comment of the new sections so that you can tell them apart.
6. Delete the trailing / after each Disallow line in the fixed! sections. You should end up with four sections that look like this:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
# Paths (clean URLs) – fixed!
Disallow: /admin
Disallow: /comment/reply
Disallow: /contact
Disallow: /logout
Disallow: /node/add
Disallow: /search
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
# Paths (no clean URLs) – fixed!
Disallow: /?q=admin
Disallow: /?q=comment/reply
Disallow: /?q=contact
Disallow: /?q=logout
Disallow: /?q=node/add
Disallow: /?q=search
Disallow: /?q=user/password
Disallow: /?q=user/register
Disallow: /?q=user/login
7. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn't you?).
8. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to do a refresh on your browser to see the changes.
Now your robots.txt file is working as you would expect it to.
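If you maintain several sites, steps 4 through 6 can also be scripted. The following is a minimal Python sketch and not part of Drupal; the function name add_fixed_sections and the sample text are made up for illustration. It assumes the sections start with a "# Paths" comment, as in the default file, and that it is run once on a file that has not been fixed yet. It writes a plain hyphen in the "fixed!" comment.

```python
def add_fixed_sections(robots_txt: str) -> str:
    """Append slash-less copies of every '# Paths ...' section."""
    fixed = []
    in_paths = False
    for line in robots_txt.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # A comment starts a new section; copy only unfixed '# Paths' ones.
            in_paths = stripped.startswith("# Paths") and "fixed!" not in stripped
            if in_paths:
                fixed.append(stripped + " - fixed!")
        elif in_paths and stripped.lower().startswith("disallow:"):
            path = stripped.split(":", 1)[1].strip()
            # Drop only a trailing slash, and never reduce "/" to nothing.
            if len(path) > 1 and path.endswith("/"):
                fixed.append("Disallow: " + path[:-1])
        elif not stripped:
            # A blank line ends the current section.
            in_paths = False
    if not fixed:
        return robots_txt
    return robots_txt.rstrip("\n") + "\n" + "\n".join(fixed) + "\n"


# Shortened sample; the real file has more Disallow lines per section.
SAMPLE = """\
User-agent: *
Crawl-delay: 10
# Paths (clean URLs)
Disallow: /admin/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
"""

print(add_fixed_sections(SAMPLE))
```

The original sections are left untouched and the new ones are appended at the end, so the result still needs the same check as in step 8.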
Additional Changes You Can Make for SEO

Now that you have fixed your default robots.txt file, there are a few additional changes you can make. Using directives and pattern matching commands, the robots.txt file can exclude entire sections of the site from the crawlers, such as the admin pages, certain individual files like cron.php, and some directories like /scripts and /modules.
In many cases, though, you should tweak your robots.txt file for optimal SEO results. Here are several changes you can make to the file to meet your needs in certain situations:
• You are developing a new site and you don’t want it to show up in any search engine until you’re ready to launch it. Add Disallow: / just after the User-agent: line. A single slash matches every path on the site and is understood by all crawlers, which is not true of a bare *.
• The server you are running is very slow and you don’t want the crawlers to slow your site down for visitors. Adjust the Crawl-delay by changing it from 10 to 20.
• If you're on a super-fast server (and you should be, right?) you can tell the bots to bring it on! Change the Crawl-delay to 5 or even 1 second. Monitor your server closely for a few days to make sure it can handle the extra load.
• You're running a site which allows people to upload their own images but you don't necessarily want those images to show up in Google. Add these lines at the bottom of your robots.txt file:
User-agent: Googlebot-Image
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /*.png$
If all of the files were in the /files/users/images/ directory, you could do this:
User-agent: Googlebot-Image
Disallow: /files/users/images/
• Say you noticed in your server logs that there was a bad robot out there that was scraping all your content. You can try to prevent this by adding this to the bottom of your robots.txt file:
User-agent: Bad-Robot
Disallow: /
• If you have installed the XML Sitemap module, then you've got a great tool that you should send out to all of the search engines. However, it's tedious to go to each engine's site and upload your URL. Instead, you can add a couple of simple lines to the robots.txt file.
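The Googlebot-Image rules in the list above rely on two pattern characters that Google documents for robots.txt: * matches any run of characters, and a trailing $ anchors the end of the URL. As a rough illustration of how such a rule is evaluated, here is a small Python sketch. The function name rule_matches is made up for this example, and it is a simplified model, not Google's actual matcher. Crawlers that follow only the original robots.txt convention may not honor these characters.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Google-style robots.txt pattern matches a URL path.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the path. Without '$', the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


print(rule_matches("/*.jpg$", "/files/cat.jpg"))                  # True
print(rule_matches("/*.jpg$", "/files/cat.jpg?size=2"))           # False
print(rule_matches("/files/users/images/", "/files/users/images/a.gif"))  # True
```

The second call shows why the $ matters: with it, the rule applies only to URLs that actually end in .jpg.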
For more information on robots.txt and Drupal SEO, check out my book: Drupal 6 Search Engine Optimization.
Thank You For Reading!

No one likes people who don’t share, especially giant flying cats. So if you liked what you read, please share my post with any of our socially-labeled buttons, or we’ll sic Fluffy on you! Please subscribe to our RSS feed as well so you can receive daily fodder from our blog.
About the author

Ben Finklea
Ben and the Volacci® team provide Search Engine Optimization, Paid Search, and Conversions Consulting to a varied client base - ranging from local real estate agents to Fortune 500 companies. Ben's book Drupal 6 Search Engine Optimization was released in September, 2009 and is available from Amazon.com.

Comments
Flying cat picture
Hi Ben, I'm reading through your book. A couple of weeks ago while my wife was at class I got to the Google Reader section, checked out your blog and about fell out of my chair laughing over the little girl picture.
I showed my wife when she got home; she almost fell off the couch and promptly asked me for the link so she could post it to her friends on FaceBook.
Most likely a bunch of Okies are now checking out your blog, or talking about the picture at Starbucks.
Best regards,
Dave Brooks