Htaccess Rewrite
Note that many artificial intelligence (AI) harvesting bots ignore the directives in the robots.txt file. To limit their access to your site, you can add a RewriteEngine section to your .htaccess file.
Limiting Bot Access
The following example shows how I (Ken Roy), with help from ICDSoft support, implemented Rewrite rules to stop bots from accessing pages that use a lot of MySQL resources.
# Stop bots from accessing certain pages - Ticket #890739 and 1084003
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "bot|slurp|spider|crawler|facebookexternalhit" [NC]
RewriteCond %{REQUEST_URI} ^/tng/calendar.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/guestbook.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/familychart.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/cousin_marriages.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/inlaw_marriages.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/searchform.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/search.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/relationship.php [OR]
# following changed on 28 Apr 2024 to add additional pages
## RewriteCond %{REQUEST_URI} ^/tng/cpdisplay.php
RewriteCond %{REQUEST_URI} ^/tng/cpdisplay.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxIndividMaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxIndividPaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxFamilyMaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxFamilyPaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/parental_line.php [OR]
# following changed on 30 Apr 2024 to add additional pages
## RewriteCond %{REQUEST_URI} ^/tng/tng_associated_lines.php
RewriteCond %{REQUEST_URI} ^/tng/tng_associated_lines.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/familygroup.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/famsearch.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/chronology.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/cousins.php
RewriteRule ^.*$ - [F]
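In the rules above, the User-Agent condition has no [OR] flag, so it is combined by AND with the group of REQUEST_URI conditions that are chained with [OR]. The last condition in the chain must not carry [OR], which is why cpdisplay.php and tng_associated_lines.php gained the flag when more pages were added after them. The same logic can probably be written more compactly with a single alternation. The following is only a sketch, not the configuration that was deployed: the page list is shortened for brevity, and the dots are escaped so that they match a literal dot rather than any character.

```apache
# Hypothetical condensed form of the rules above (a sketch, not the deployed configuration)
RewriteEngine On
# Condition 1: the User-Agent contains one of these substrings (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} "bot|slurp|spider|crawler|facebookexternalhit" [NC]
# Condition 2: the requested path starts with one of these pages (list shortened here)
RewriteCond %{REQUEST_URI} ^/tng/(calendar|guestbook|familychart|searchform|search)\.php
# Both conditions must hold: answer 403 Forbidden and leave the URL unchanged
RewriteRule ^ - [F]
```

With this form, adding a page means adding one name inside the parentheses, so there is no [OR] flag to forget on the previous last line.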
Limiting User Agents
Hans Weebers posted another example on the TNG User2 list. It blocks requests by referrer, remote host name, and user agent:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AISearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Nutch [NC]
RewriteRule ^(.*)$ - [F]
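The same pattern can be applied to the AI harvesting bots mentioned at the top of this page. The following is a minimal sketch, assuming the crawlers identify themselves honestly in the User-Agent header. The tokens listed are examples that were not part of either configuration above, so check your own access logs and each operator's published documentation before relying on them.

```apache
# Sketch: block selected AI crawlers site-wide by User-Agent.
# The token list is an illustrative assumption; adjust it to what appears in your logs.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "GPTBot|CCBot|ClaudeBot|Bytespider" [NC]
# Any request from a matching user agent gets 403 Forbidden
RewriteRule ^ - [F]
```

A bot that sends a false User-Agent string will not be caught by a rule like this, so it limits well-behaved and self-identifying crawlers only.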