The “/robots.txt” file is a text file containing one or more records. It usually contains a single record that looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate “Disallow” line for every URL prefix you want to exclude — you cannot say “Disallow: /cgi-bin/ /tmp/” on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.
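The record rules above can be sketched as a small parser. This is only an illustration, not a reference implementation: the function name parse_records and the (agents, disallows) tuple layout are assumptions made for this example, and comment handling is left out for brevity.

```python
def parse_records(text):
    """Split robots.txt text into records of (user_agents, disallow_values).

    Hypothetical helper for illustration; a blank line ends a record.
    """
    records = []
    agents, disallows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if one is open.
            if agents or disallows:
                records.append((agents, disallows))
                agents, disallows = [], []
            continue
        # Split on the first colon only; the value is kept as one string.
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
    # Close the final record if the text does not end with a blank line.
    if agents or disallows:
        records.append((agents, disallows))
    return records
```

In this sketch, a line such as “Disallow: /cgi-bin/ /tmp/” yields the single value “/cgi-bin/ /tmp/”, which is treated as one (almost certainly useless) prefix rather than two, which is why each prefix needs its own line.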

Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The ‘*’ in the User-agent field is a special value meaning “any robot”. Specifically, you cannot have lines like “User-agent: *bot*”, “Disallow: /tmp/*” or “Disallow: *.gif”.
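Under the rules described here, a Disallow value is compared against the start of the URL path as a literal prefix. A minimal sketch of that comparison follows; the helper name is_disallowed is an assumption for this example, not part of any standard.

```python
def is_disallowed(path, disallow_prefixes):
    """Return True if path starts with any non-empty Disallow prefix.

    Hypothetical helper: prefixes are compared literally, so '*' is an
    ordinary character here, and an empty value excludes nothing.
    """
    return any(
        prefix != "" and path.startswith(prefix)
        for prefix in disallow_prefixes
    )


print(is_disallowed("/tmp/a.html", ["/tmp/"]))    # literal prefix matches
print(is_disallowed("/tmp/a.html", ["/tmp/*"]))   # '*' is not a wildcard
print(is_disallowed("/img/a.gif", ["*.gif"]))     # no suffix matching either
```

With literal prefix matching, “Disallow: /tmp/*” would only exclude paths that actually begin with the characters “/tmp/*”, which is rarely what was intended.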

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server

User-agent: *
Disallow: /

To allow all robots complete access

User-agent: *
Disallow:

(or just create an empty “/robots.txt” file, or don’t use one at all)

To exclude all robots from part of the server

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

To exclude a single robot

User-agent: BadBot
Disallow: /

To allow a single robot

User-agent: Google
Disallow:

User-agent: *
Disallow: /
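One way to sanity-check a set of records like the two above is Python's standard urllib.robotparser module. The sketch below feeds it the same text and asks whether two robots may fetch a page; the robot names and the path “/index.html” are only example inputs, and build_parser is a hypothetical helper name. Other robots may interpret the file somewhat differently from this module.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Return a RobotFileParser loaded from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# The first record applies to "Google" and its empty Disallow excludes nothing.
print(parser.can_fetch("Google", "/index.html"))   # expected: True
# Any other robot falls through to the "*" record, which excludes everything.
print(parser.can_fetch("BadBot", "/index.html"))   # expected: False
```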

To exclude all files except one

This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/stuff/

Alternatively, you can explicitly list every page to be disallowed:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html

Source: http://www.robotstxt.org
