Htaccess Rewrite

From TNG_Wiki

Artificial intelligence (AI) harvesting bots often ignore the directives in the robots.txt file. To limit their access to your site, you can instead add a RewriteEngine section to your .htaccess file, which the web server enforces regardless of what the bot chooses to honor.
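As a minimal sketch of such a section, the rules below return 403 Forbidden to a handful of AI crawler user agents. The agent names used here (GPTBot, CCBot, ClaudeBot, Bytespider) are illustrative assumptions, not a complete or authoritative list; check your own access logs to see which agents actually visit your site.

```apache
# Sketch only: the user-agent names below are assumptions chosen for illustration.
RewriteEngine On
# Match any listed agent anywhere in the User-Agent header, ignoring case
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|ClaudeBot|Bytespider) [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden
RewriteRule ^ - [F]
```

Because this matches on the User-Agent header, it only stops bots that identify themselves honestly; a bot that sends a browser-like user agent will not be caught by it.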

Limiting Bot Access

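The following example shows how I (Ken Roy) implemented rewrite rules, with help from ICDSoft support, to stop bots from accessing pages that use a lot of MySQL resources. A request is refused only when its user agent matches the first condition and its URI matches any one of the page conditions that follow.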

# Stop bots from accessing certain pages - Tickets #890739 and #1084003
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "bot|slurp|spider|crawler|facebookexternalhit" [NC]
RewriteCond %{REQUEST_URI} ^/tng/calendar.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/guestbook.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/familychart.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/cousin_marriages.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/inlaw_marriages.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/searchform.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/search.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/relationship.php [OR]
# following changed on 28 Apr 2024 to add additional pages
## RewriteCond %{REQUEST_URI} ^/tng/cpdisplay.php 
RewriteCond %{REQUEST_URI} ^/tng/cpdisplay.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxIndividMaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxIndividPaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxFamilyMaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/xerxxFamilyPaternalLine.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/parental_line.php [OR]
# following changed on 30 Apr 2024 to add additional pages
## RewriteCond %{REQUEST_URI} ^/tng/tng_associated_lines.php
RewriteCond %{REQUEST_URI} ^/tng/tng_associated_lines.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/familygroup.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/famsearch.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/chronology.php [OR]
RewriteCond %{REQUEST_URI} ^/tng/cousins.php
RewriteRule ^.*$ - [F]
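The long chain of [OR] conditions above can also be written as a single alternation. The following is a minimal sketch rather than a drop-in replacement: it assumes the same /tng/ install path, covers only a few of the pages for brevity, and escapes the dot so that it matches a literal "." instead of any character.

```apache
# Sketch only: a shortened page list; extend the alternation with the other page names as needed.
RewriteEngine On
# Condition 1: the user agent looks like a crawler (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} "bot|slurp|spider|crawler|facebookexternalhit" [NC]
# Condition 2 (ANDed with condition 1): the request is for one of the resource-heavy pages
RewriteCond %{REQUEST_URI} ^/tng/(calendar|guestbook|familychart|searchform|search)\.php$
# Refuse the request with 403 Forbidden
RewriteRule ^ - [F]
```

Keeping the page names in one pattern makes it harder to forget an [OR] flag when adding a page, which would otherwise silently turn that condition into an AND.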

Limiting User Agents

Hans Weebers posted another example on the TNG User2 list, which blocks requests by referrer, remote host, or user agent:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AISearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Nutch [NC]
RewriteRule ^(.*)$ - [F]
