fix: block more AI bots (#10754)
Following Cloudflare's analysis in

https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/

---------

Co-authored-by: Open Food Facts Bot <contact@openfoodfacts.org>
raphael0202 and Open Food Facts Bot authored Sep 3, 2024
1 parent 9e6b11c commit 852ca5f
Showing 7 changed files with 38 additions and 2 deletions.
4 changes: 2 additions & 2 deletions lib/ProductOpener/Display.pm
@@ -1017,12 +1017,12 @@ sub set_user_agent_request_ref_attributes ($request_ref) {
my $is_crawl_bot = 0;
my $is_denied_crawl_bot = 0;
if ($user_agent_str
-=~ /\b(Googlebot|Googlebot-Image|Google-InspectionTool|bingbot|Applebot|Yandex|DuckDuck|DotBot|Seekport|Ahrefs|DataForSeo|Seznam|ZoomBot|Mojeek|QRbot|Qwant|facebookexternalhit|Bytespider|GPTBot|cohere-ai|anthropic-ai|PerplexityBot|ClaudeBot|Claude-Web|SEOkicks|Searchmetrics|MJ12|SurveyBot|SEOdiver|wotbox|Cliqz|Paracrawl|Scrapy|VelenPublicWebCrawler|Semrush|MegaIndex\.ru|Amazon|aiohttp|python-request)/i
+=~ /\b(Googlebot|Googlebot-Image|Google-InspectionTool|bingbot|Applebot|Yandex|DuckDuck|DotBot|Seekport|Ahrefs|DataForSeo|Seznam|ZoomBot|Mojeek|QRbot|Qwant|facebookexternalhit|Bytespider|GPTBot|ChatGPT-User|cohere-ai|anthropic-ai|PerplexityBot|ClaudeBot|Claude-Web|SEOkicks|Searchmetrics|MJ12|SurveyBot|SEOdiver|wotbox|Cliqz|Paracrawl|Scrapy|VelenPublicWebCrawler|Semrush|MegaIndex\.ru|Amazon|aiohttp|python-request|ImagesiftBot|Diffbot)/i
)
{
$is_crawl_bot = 1;
if ($user_agent_str
-=~ /\b(bingbot|Seekport|Ahrefs|DataForSeo|Seznam|ZoomBot|Mojeek|QRbot|Bytespider|SEOkicks|Searchmetrics|MJ12|SurveyBot|SEOdiver|wotbox|Cliqz|Paracrawl|Scrapy|VelenPublicWebCrawler|Semrush|MegaIndex\.ru|YandexMarket|Amazon|GPTBot|PerplexityBot|ClaudeBot|Claude-Web|cohere-ai|anthropic-ai)/i
+=~ /\b(bingbot|Seekport|Ahrefs|DataForSeo|Seznam|ZoomBot|Mojeek|QRbot|Bytespider|SEOkicks|Searchmetrics|MJ12|SurveyBot|SEOdiver|wotbox|Cliqz|Paracrawl|Scrapy|VelenPublicWebCrawler|Semrush|MegaIndex\.ru|YandexMarket|Amazon|GPTBot|ChatGPT-User|PerplexityBot|ClaudeBot|Claude-Web|cohere-ai|anthropic-ai|ImagesiftBot|Diffbot)/i
)
{
$is_denied_crawl_bot = 1;
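The Display.pm change extends two case-insensitive, word-boundary-anchored alternations: the first marks any recognised crawler, and the second, checked only for crawlers, marks the subset that is denied. The sketch below reproduces that two-tier check in isolation. It assumes a hypothetical standalone helper, `classify_user_agent`, that returns the two flags instead of setting attributes on `$request_ref`. The bot lists are abbreviated from the patterns above, and the user-agent strings are illustrative, not captured from real traffic.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($is_crawl_bot, $is_denied_crawl_bot).
# Bot lists are abbreviated from the full patterns in Display.pm.
sub classify_user_agent {
    my ($user_agent_str) = @_;
    my $is_crawl_bot        = 0;
    my $is_denied_crawl_bot = 0;

    # Tier 1: any recognised crawler, search engines and AI bots alike.
    if ($user_agent_str =~ /\b(Googlebot|bingbot|GPTBot|ChatGPT-User|ClaudeBot|ImagesiftBot|Diffbot)/i) {
        $is_crawl_bot = 1;

        # Tier 2: crawlers that are refused; Googlebot is deliberately absent.
        if ($user_agent_str =~ /\b(bingbot|GPTBot|ChatGPT-User|ClaudeBot|ImagesiftBot|Diffbot)/i) {
            $is_denied_crawl_bot = 1;
        }
    }
    return ($is_crawl_bot, $is_denied_crawl_bot);
}

# Illustrative user-agent strings.
for my $ua (
    'Mozilla/5.0 (compatible; Googlebot/2.1)',
    'Mozilla/5.0 (compatible; ChatGPT-User/1.0)',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/129.0',
) {
    my ($crawl, $denied) = classify_user_agent($ua);
    printf "crawl=%d denied=%d  %s\n", $crawl, $denied, $ua;
}
```

With these inputs, the Googlebot string is flagged as a crawler but not denied, the ChatGPT-User string is both flagged and denied, and the browser string matches neither tier. Nesting the second test means a user agent can only be denied if it is first recognised as a crawler. This is why each new name (`ChatGPT-User`, `ImagesiftBot`, `Diffbot`) has to be added to both patterns.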
6 changes: 6 additions & 0 deletions templates/web/pages/robots/robots.tt.txt
@@ -103,4 +103,10 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
[% END %]
@@ -221,3 +221,9 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
@@ -310,3 +310,9 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
@@ -310,3 +310,9 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
@@ -221,3 +221,9 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
@@ -221,3 +221,9 @@ User-agent: Claude-Web
Disallow: /
User-agent: PerplexityBot
Disallow: /
+User-agent: Diffbot
+Disallow: /
+User-agent: ImagesiftBot
+Disallow: /
+User-agent: ChatGPT-User
+Disallow: /
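Each robots.txt variant gains three groups, each pairing one `User-agent` line with `Disallow: /`. A compliant crawler selects the group that names it and obeys that group's rules, falling back to the `*` group otherwise. The sketch below is a deliberately simplified, hypothetical matcher (`robots_disallows_all` is an illustrative name, not part of this codebase). It is just enough to confirm that the added groups shut out the named bots. It only handles `User-agent` and `Disallow`, compares the whole agent token case-insensitively, and ignores `Allow`, wildcards, and path precedence, all of which a real robots.txt parser must handle.

```perl
use strict;
use warnings;

# Hypothetical, simplified matcher: returns 1 if the group applying to
# $token contains "Disallow: /". Ignores Allow, wildcards and path matching.
sub robots_disallows_all {
    my ($robots_txt, $token) = @_;
    my (%rules, @current_agents, $in_rules);

    for my $line (split /\n/, $robots_txt) {
        $line =~ s/#.*//;           # strip comments
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';
        my ($field, $value) = split /\s*:\s*/, $line, 2;
        next unless defined $value;
        $field = lc $field;
        if ($field eq 'user-agent') {
            # A User-agent line that follows rules starts a new group.
            @current_agents = () if $in_rules;
            $in_rules = 0;
            push @current_agents, lc $value;
        }
        elsif ($field eq 'disallow') {
            $in_rules = 1;
            push @{ $rules{$_} }, $value for @current_agents;
        }
    }

    # Prefer the group naming the token, else fall back to the "*" group.
    my $group = exists $rules{ lc $token } ? $rules{ lc $token } : $rules{'*'};
    return 0 unless $group;
    my $blocked = grep { $_ eq '/' } @$group;
    return $blocked ? 1 : 0;
}

# Illustrative input modelled on the groups added in this commit.
my $robots = join "\n",
    'User-agent: PerplexityBot', 'Disallow: /',
    'User-agent: ChatGPT-User',  'Disallow: /';
print robots_disallows_all($robots, 'ChatGPT-User'), "\n";   # prints 1
print robots_disallows_all($robots, 'Googlebot'), "\n";      # prints 0
```

Because robots.txt is purely advisory, the user-agent check in Display.pm remains the enforcement path for bots that ignore these groups.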
