As a practicing SEO consultant, I often hear webmasters ask why they should put a robots.txt file in the root directory of their site. Their point is that they don't need to restrict spiders from any part of their website, so why upload a file that simply gives spiders full permission? From a developer's point of view they are right: we cannot tell or force search engine spiders to crawl a website and index it fully. If you really want to encourage that, the closest you can get is submitting a sitemap in Google Webmaster Tools, but that is a different story. So why upload a text file that tells spiders "you are allowed to access my website" when crawling is their normal job anyway?
The answer is very simple. When a spider requests a page that does not exist on your website, the normal result is a 404 error. robots.txt is a well-known file name for search engine spiders, and they will look for it to check whether the site has set any restrictions for them. If no robots.txt file exists, that request ends in a 404 error. Spiders will see the error and may report it as a broken link, and this broken-link report may reduce the importance of your website in the search engine's view. To avoid this situation, SEO consultants advise their clients to upload this simple text file to their server.
So what is this robots file? robots.txt is a text file uploaded to the root directory of your website (for example, https://www.example.com/robots.txt) that contains a set of rules for search engine spiders. It is mainly used to tell spiders not to crawl the listed pages or folders. Keep in mind that a robots.txt file cannot tell a spider to crawl and index a page, because crawling and indexing are what a spider does by default. I think you get the point: no one can force spiders to crawl their website, as that depends entirely on the spiders, but anyone can block spiders from part or even all of their website.
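To see how a crawler reads these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. The rules, the `example.com` URLs and the `MyCrawler` user-agent are made-up assumptions for illustration, and real search engines use their own parsers, which may differ in small details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler downloads it
# from the root of the site, e.g. https://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly instead of fetching over the network.
parser.parse(ROBOTS_TXT.splitlines())

# Disallow only tells the crawler where NOT to go; everything else stays open.
for url in ("https://www.example.com/index.html",
            "https://www.example.com/private/report.html"):
    allowed = parser.can_fetch("MyCrawler", url)
    print(url, "->", "allowed" if allowed else "blocked")
```

Note that nothing in the file says "crawl this page": the home page is allowed only because no rule blocks it.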
Writing a robots file is an easy task, and anyone can write one for their own website. You only need to know a few basic formats. Here are four formats, or examples, for writing a robots file:
1. Block spiders from crawling your entire website
To stop all spiders from crawling your entire website, the format should be:
User-agent: *
Disallow: /
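If you want to double-check what this format does before uploading it, the sketch below feeds it to Python's `urllib.robotparser`. The `AnyBot` user-agent and the sample paths on `example.com` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Format 1: block every spider from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Every URL path starts with "/", so every page is refused.
for path in ("/", "/about.html", "/blog/2012/post.html"):
    allowed = parser.can_fetch("AnyBot", "https://www.example.com" + path)
    print(path, "->", "allowed" if allowed else "blocked")
```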
2. Give spiders access to your entire website
To do the reverse, change the file to either:
User-agent: *
Disallow:
Or
User-agent: *
Allow: /
Please note that a robots.txt file that allows every spider makes no practical difference, other than avoiding the 404 error when spiders look for robots.txt on your website.
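Both versions of format 2 should behave the same way. Here is a small sketch that checks this with Python's `urllib.robotparser`; the helper `allows_everything`, the `AnyBot` user-agent and the sample paths are hypothetical names made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Format 2, written both ways.
EMPTY_DISALLOW = """\
User-agent: *
Disallow:
"""
ALLOW_ROOT = """\
User-agent: *
Allow: /
"""

def allows_everything(robots_txt: str, paths: list[str]) -> bool:
    """Hypothetical helper: True if a generic crawler may fetch every given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch("AnyBot", "https://www.example.com" + p)
               for p in paths)

sample_paths = ["/", "/about.html", "/images/logo.png"]
print(allows_everything(EMPTY_DISALLOW, sample_paths))  # True
print(allows_everything(ALLOW_ROOT, sample_paths))      # True
```

An empty `Disallow:` value means "nothing is disallowed", which is why it grants the same access as `Allow: /`.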
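3. Block spiders from accessing certain files or folders on your site
To block spiders from accessing certain files or folders on your website, create a robots.txt file like the one below. Each Disallow line blocks every URL whose path begins with the given value, so "Disallow: /cgi-bin/" covers everything inside the /cgi-bin/ folder.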
User-agent: *
Disallow: /cgi-bin/
Disallow: /wusage/
Disallow: /textures/
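Because each rule works as a path prefix, it is worth testing a few URLs against it. This sketch uses Python's `urllib.robotparser`; the file names and the `AnyBot` user-agent are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Format 3: keep spiders out of three folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wusage/
Disallow: /textures/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on the site, checked one by one.
for path in ("/cgi-bin/search.cgi",   # inside a blocked folder
             "/textures/wood.jpg",    # inside a blocked folder
             "/index.html",           # not covered by any rule
             "/cgi-binary.html"):     # does not start with "/cgi-bin/"
    allowed = parser.can_fetch("AnyBot", "https://www.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

The last URL stays allowed because of the trailing slash in `/cgi-bin/`; a rule written as `Disallow: /cgi-bin` would block it too.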
4. Block certain spiders from accessing your website
To block a particular spider from accessing your website, write its user-agent name (without quotes) in place of SpiderName below:
User-agent: SpiderName
Disallow: /
For example, to keep Google's image spider out of your whole site:
User-agent: Googlebot-Image
Disallow: /
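A group that names one spider leaves all other spiders unaffected. The sketch below shows this with Python's `urllib.robotparser`; the image URL and the `Bingbot` comparison are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Format 4: block only Google's image spider.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/images/logo.png"
for agent in ("Googlebot-Image", "Googlebot", "Bingbot"):
    allowed = parser.can_fetch(agent, url)
    print(agent, "->", "allowed" if allowed else "blocked")
```

Python's parser matches user-agent names in a simplified way, so treat this as a rough check; each search engine documents its own matching rules for its spiders.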
I hope this article gives readers a basic idea of the robots.txt file.