Happy Birthday Robots.txt

Barry

Today is the 20th birthday of robots.txt, the standard that lets webmasters ask search engines not to crawl their pages. It was created by Martijn Koster in 1994.

What is robots.txt?


Robots.txt is a plain-text file (not an HTML file) placed on a site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but well-behaved search engines generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of enforcing that search engines stay off your site (it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a note reading "Please do not enter" on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is naive to rely on robots.txt to keep it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the root directory of the site, because otherwise user agents (search engine crawlers) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the root directory, and if they don't find it there, they simply assume that the site has no robots.txt file and index everything they find along the way.

Structure of a Robots.txt File
The structure of a robots.txt file is pretty simple: it is a list of user agents and the files and directories they are disallowed from crawling. Basically, the syntax is as follows:

User-agent: *
Disallow:

An empty Disallow value does not block anything – every file may be crawled.

or

User-agent: *
Disallow: /

A Disallow value of / blocks the entire website from crawling.
"User-agent" names the search engine crawler a rule applies to (an asterisk, *, matches all crawlers), and "Disallow" lists the files and directories to be excluded from crawling. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines – just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
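These directives can be combined into a file with several rule groups. The sketch below is a hypothetical robots.txt for illustration; the crawler name and paths are made-up examples, not recommendations:

```
# Rules for one specific crawler
User-agent: ExampleBot
Disallow: /

# Rules for everyone else
User-agent: *
Disallow: /temp/
Disallow: /private/
```

A crawler reads the group matching its own name first and falls back to the * group if no specific group applies.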

 

For the last 20 years, this simple text file has been the guide that tells a search engine whether or not to crawl a file or website.
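You can check how a crawler would interpret these rules with Python's standard-library `urllib.robotparser`. This is a minimal sketch; the rules and URLs below are made-up examples mirroring the /temp/ directive shown earlier:

```python
from urllib import robotparser

# Hypothetical robots.txt rules for illustration.
rules = [
    "User-agent: *",
    "Disallow: /temp/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse the rules directly, without fetching over the network

# Ask whether the "*" user agent may fetch each URL.
print(rp.can_fetch("*", "https://example.com/temp/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))      # True
```

In production you would point the parser at a live file with `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` instead of parsing a hard-coded list.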

Barry Davis is a Technology Evangelist who has been with Webskitters for more than 5 years. A specialist in website design, development and online business strategy, he is passionate about implementing new web technologies that make websites perform better.

