Robots txt
The following explanation of the Robots Exclusion Protocol is extracted from the robotstxt.org site. Link provided by BruceM on the User2 list, 6/30/2009 10:34 PM.
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol. It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site. There are two important considerations when using /robots.txt:
- Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
- The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information. Malware and email-harvesting bots will ignore its directives regardless.
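As a rough illustration of the check described above, here is a minimal sketch of how a cooperating robot might test a URL against the two-line robots.txt from the example. It uses Python's standard urllib.robotparser module. The robot name "ExampleBot" and the helper is_allowed are hypothetical, chosen only for this sketch. Nothing forces a robot to run a check like this, which is why the file offers no protection against badly behaved bots.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; a real robot would first download
    the file from http://<host>/robots.txt before parsing it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is an assumed robot name, not one taken from the source.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"))
# prints False, because "Disallow: /" covers every path on the site
```

A well-behaved robot would skip the page when the check returns False. A robot that never performs the check simply fetches the page anyway.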
Related links
The following provide additional security measures:
Controlling Site Access
- Using tngrobots.php
- Robots txt
- How to setup a robots.txt file
- Bot Trap
- Htaccess
- Htaccess Deny
- Htaccess Rewrite
Protecting Resources
- Permissions Explained
- Database User
- Move your configuration files
- Move your backup files
- Move your gedcom files
- Overlaid Subroot: how to recover from a subroot.php overlay
- Prevent Directory Listing
- Protecting access log