Using a robots.txt file

Using a robots.txt file?

Update! There's a PLUGIN FOR THAT!

Using a robots.txt file can affect your web site's traffic and its ranking in the search engines. WordPress publishes the same content in more than one place, so you'll want to keep the search engines away from those duplicates. Optimizing your WordPress robots.txt file helps prevent Google from penalizing you for duplicate content.

When the search engines visit your site, they check for a robots.txt file. The robots.txt file gives the search engine "robots" instructions to follow when they crawl your site. It is used to stop the search engines from reaching your content in more than one location, such as your monthly archives, your category folders, your XML feed and your front page.
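For example, a few rules along these lines (the paths are purely illustrative and depend on your own permalink settings) would keep every robot out of the feed, trackback and category copies of your posts:

# Illustrative only - adjust to your own permalink structure
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /category/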

Creating the ultimate WordPress robots.txt file is a lot simpler than you might think. With the Robots.txt WordPress plugin by Peter Coughlin, you can create and edit your robots.txt file from within WordPress. The official download page is at http://wordpress.org/extend/plugins/pc-robotstxt/

This WordPress plugin is for people who want the ultimate SEO out of their WordPress blog. You simply install the plugin and activate it, and it will automatically create a virtual robots.txt file for your WordPress blog. Out of the box, this plugin gives you a well-optimized WordPress robots.txt file without you having to be an expert!

If you get into trouble with your WordPress robots.txt file, you can simply deactivate and reactivate the plugin and it will go back to the default settings.
However, before you start, I strongly advise that you make a backup of your current robots.txt file if you have one. This plugin will write over any robots.txt file that you already have in place!

Example of a finished robots.txt file using a quick trick!

# robots.txt for http://www.yourdomainname.com/

# Begin sitemap
sitemap: http://www.yourdomainname.com/sitemap.xml
# End sitemap

# PARTIAL access (Googlebot)
User-agent: Googlebot
Disallow: /*?
Disallow: /*.cgi$
Disallow: /*.css$
Disallow: /*.inc$
Disallow: /*.js$
Disallow: /*.gz$
Disallow: /*.php$
Disallow: /*rurl=*
Disallow: /*.txt$
Disallow: /*/trackback/$
Disallow: /*.wmv$
Disallow: /*.xhtml$

User-agent: Googlebot-Image
Disallow: /wp-includes/

User-agent: Mediapartners-Google
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

# ia_archiver
User-agent: ia_archiver
Disallow: /

# PARTIAL access (All Spiders)
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

Allow: /sitemap.xml.gz$
Allow: /wp-content/uploads/
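A quick note on the syntax above: the * wildcard matches any sequence of characters and the trailing $ anchors a pattern to the end of the URL, so a line such as Disallow: /*.php$ blocks any URL that ends in .php. These pattern extensions are understood by Googlebot and most major crawlers, but they are not part of the original robots.txt standard, so smaller bots may simply ignore them. Once the file is live, you can sanity-check it by visiting http://www.yourdomainname.com/robots.txt in your browser or by running it through the robots.txt analysis tool in Google Webmaster Tools.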

Additional thoughts:

The Robots.txt WordPress plugin is a great plugin for beginners and advanced WordPress users alike. Once you get your file in order, I strongly suggest that you copy and paste it into a text document as a backup.

I hope this information was helpful to you.

Continued… (Update)

Robots.txt WordPress Plugin

The solution is to add the names of bad bots to your robots.txt file and disallow them from going anywhere, and to add the names of common search engine spiders and specify which locations or files they are allowed to visit.
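In robots.txt terms that looks something like this (BadBot is just a placeholder name; the real file lists actual spam bots):

# A known bad bot - shut it out completely
User-agent: BadBot
Disallow: /

# A legitimate spider - only keep it out of the sensitive areas
User-agent: Googlebot
Disallow: /wp-admin/
Allow: /wp-content/uploads/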

This plugin will do that in a completely hands-free way by setting up a virtual robots.txt file for your blog as soon as it's activated. Whenever a request for a robots.txt file comes in, WordPress will display the contents of your virtual robots.txt file. No physical file is created on your site, but one is served to the search engine bot.
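If you're wondering how that works under the hood, here is a rough sketch of the mechanism (my own illustration, not the plugin's actual source): when no robots.txt file exists on disk, WordPress answers the request itself and passes the output through its robots_txt filter, which a plugin can hook.

<?php
/*
Plugin Name: Virtual Robots.txt (sketch)
*/

// Sketch only - not the PC Robots.txt plugin's actual code.
// WordPress builds a default robots.txt for requests when no physical file
// exists and runs it through the 'robots_txt' filter before sending it out.
add_filter( 'robots_txt', 'vrt_virtual_robots', 10, 2 );

function vrt_virtual_robots( $output, $public ) {
	// $public is '0' when the blog is set to discourage search engines,
	// so leave WordPress's own blocking rules untouched in that case.
	if ( '0' == $public ) {
		return $output;
	}

	$rules  = "User-agent: *\n";
	$rules .= "Disallow: /cgi-bin\n";
	$rules .= "Disallow: /wp-admin/\n";
	$rules .= "Disallow: /wp-includes/\n";
	$rules .= "Allow: /wp-content/uploads/\n";
	$rules .= "Sitemap: " . home_url( '/sitemap.xml' ) . "\n";

	return $rules;
}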

By default, your virtual robots.txt file will have Google’s Mediabot allowed, a bunch of spam-bots disallowed, and a few of the standard WordPress folders and files disallowed. The default collection of bad bots is borrowed from http://www.clickability.co.uk/robotstxt.html.

OK, even though it's completely automated and hands free, I admit there are times when I want to tweak what's contained in the virtual robots.txt file. There's now a handy options page which lets you edit the contents.

Oh yeah, and if you mess up your robots.txt file you can just deactivate and reactivate the plugin and it will revert back to the default list of rules.

Also, if the plugin detects an existing sitemap.xml file (or if you are using my XML Sitemap plugin) it will add a reference to your sitemap.xml to the end of the robots.txt file. I’m told this helps with the discovery of your sitemap.xml and indexing of your pages. That’s got to be a good idea.

How to use the plugin

With the plugin now being hosted on WordPress.org, the easiest way to install this baby is to visit your blog admin pages, click the Plugins menu, and then click the Add New menu. In the search box, type something like "robots.txt" and with a bit of luck you should see PC Robots.txt in the list that appears. To the right of it you'll see a link to install the plugin. Click that.

If you happen to be using the version that was hosted on this site, please delete it and install a new version using the instructions above. That way you'll always have the latest version and you'll get notified of updates and such by WordPress.

The official download page is at http://wordpress.org/extend/plugins/pc-robotstxt/
