Line 1: |
Line 1: |
− | The [http://www.robotstxt.org/robotstxt.html Robots txt.org site] provides explanation on using the The Robots Exclusion Protocol.
| + | The [http://www.robotstxt.org/robotstxt.html Robots txt.org site] provides the following explanation of the Robots Exclusion Protocol. |
| + | <sub>Link provided by BruceM on User2 list 6/30/2009 10:34 PM</sub> |
| + | |
| + | Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol. |
| + | It works like this: a robot wants to visit a Web site URL, say <nowiki>http://www.example.com/welcome.html</nowiki>. Before it does so, it first checks for <nowiki>http://www.example.com/robots.txt</nowiki>, and finds: |
| + | |
| + | |
| + | <syntaxhighlight lang="html4strict" enclose="div"> |
| + | User-agent: * |
| + | Disallow: / |
| + | </syntaxhighlight> |
| + | |
| + | |
| | | |
| + | The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site. |
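As a rough illustration of the check a well-behaved robot performs, the sketch below feeds the two-line file shown above to Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the user-agent string `ExampleBot` are hypothetical, chosen only for this example; a real robot would normally download the file from the site rather than embed it.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above, embedded for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for this sketch, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Disallow: /" covers every path, so this prints False.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"))
```

Note that this check is voluntary: it only has an effect in robots that choose to run it.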
| + | There are two important considerations when using /robots.txt: |
| + | * Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it. |
| + | * The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use. |
| + | So don't try to use /robots.txt to hide information. |
| Note that malware or email harvesting bots will ignore the directives of the robots.txt file. | | Note that malware or email harvesting bots will ignore the directives of the robots.txt file. |
| | | |
− | <sub>Link provided by BruceM on User2 list 6/30/2009 10:34 PM</sub>
| |
| | | |
| == Related links == | | == Related links == |
| | | |
− | {{: Security related links}} | + | [http://www.robotstxt.org/robotstxt.html Robots txt.org site] |
| + | {{: Security related links}}
| | | |
| [[Category:Security]] | | [[Category:Security]] |