Commit

Add more bots (fnando#331)
omrilotan authored and fnando committed Feb 27, 2018
1 parent a1dbede commit 9046119
Showing 3 changed files with 37 additions and 0 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,15 @@

- Add Google Site Verification to the bot list.
- Handle invalid quality values that look like numbers.
- Add AlwaysOnline bot: CloudFlare
- Add News aggregator crawler: AndersPink, BuzzBot
- Add Domain crawler: CipaCrawler
- Add Job bot: JobSeeker's
- Add Apparel crawler: TeeRaid
- Add Search engine crawler: SemanticBot, Mappy
- Add Copyright crawler: Copypants' BotPants
- Add SEO bots: SEOdiver, SeoAudit, WebCeo
- Add Woriobot from Zite

## v2.5.2

14 changes: 14 additions & 0 deletions bots.yml
@@ -8,6 +8,7 @@ adsbot-google: "Google Adwords"
advbot: "AdvBot"
ahrefsbot: "Ahrefs backlinks research tool"
alexa: "Alexa Crawler"
anderspink: "AndersPinkBot"
apache-httpclient: "Java http library"
apachebench: "ApacheBench (ab)"
apis-google: APIs-Google
@@ -28,15 +29,19 @@ bot@linkfluence.net: "Linkfluence bot"
bufferbot: "BufferBot"
buibui-checkbot: "buibui"
butterfly: "Topsy Labs"
buzzbot: "Buzzbot"
buzztalk: "buzztalk"
catchbot: "CatchBot (catchbot.com)"
check_http: "Nagios monitor"
cipacrawler: "CipaCrawler"
cliqzbot: "Cliqzbot"
cloudflare: "CloudFlare-AlwaysOnline"
cmradar/0.1: "CMRadar/0.1"
coldfusion: "ColdFusion http library"
commoncrawl: "CCBot"
comodo ssl checker: 'COMODO SSL Checker'
comodo-webinspector-crawler: "Comodo"
copypants: "BotPants"
crowsnest: "Crowsnest"
curabot: "cura.yt"
curl: "curl unix CLI http client"
@@ -100,6 +105,7 @@ jack: "jack"
jakarta commons: "Jakarta Commons HttpClient"
java: "Generic Java http library"
jetslide: "Jetslide"
jobseeker: "jobseeker.com.au/bot.html"
js-kit: "URL resolver"
kemvibot: "Kemvi"
kimengi: "Kimengi Bot"
@@ -123,6 +129,7 @@ lumibot: "Lumibot"
lwp-trivial: "Another Perl library"
magpie-crawler: "magpie-crawler"
mail.ru_bot: "Mail.ru Bot"
mappydata: "Mappy"
meanpathbot: "meanpath"
mediapartners-google: "Google Adsense bot"
megaindex.ru: "MegaIndex"
@@ -181,7 +188,10 @@ ruby: "Ruby"
scrapy: "Scrapy"
screaming frog seo spider: Screaming Frog SEO Spider
searchmetricsbot: "SearchmetricsBot"
semanticbot: "Semanticbot"
semrushbot: "SEO analysis bot"
seo-audit: "seo-audit-check-bot"
seodiver: "SEOdiver"
seokicks: "SEOKicks"
seznambot: "SeznamBot"
shopwiki: "ShopWiki"
@@ -204,6 +214,7 @@ squider: "Squider"
statuscake: "StatusCake"
stripe: "Stripe"
swiftbot: "Swiftype Bot"
teeraid: "TeeRaidBot"
test certificate info: "C http library?"
tineye: "TinEye Bot"
traackr: "Traackr Bot"
@@ -229,11 +240,13 @@ vrcrawler: "Venture Radar"
wasalive-bot: "Wasalive Bots"
watchsumo: "WatchSumo"
wbsearchbot: "Ware Bay Best Buys"
webceo: "online-webceo-bot"
webscout: "Webscout"
wesee: "WeSEE"
wget: "wget unix CLI http client"
whatsapp: "WhatsApp"
wordpress: "WordPress spider"
woriobot: "woriobot"
wormly: "WormlyBot"
wotbox: "Wotbox"
xenu link sleuth: "Xenu Link Sleuth"
@@ -242,6 +255,7 @@ xovibot: "XoviBot"
yacybot: "YaCy"
yahoo-ad-monitoring: "Yahoo Ad monitoring"
yandex: "Yandex"
yanga: "Yanga WorldSearch Bot"
yeti: "Naver Corp"
yourls: "YOURLS"
zelist.ro: "feed parser"
14 changes: 14 additions & 0 deletions test/ua_bots.yml
@@ -1,15 +1,20 @@
ADLXBOT: 'Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)'
ANDERSPINK: 'Mozilla/5.0 (compatible; AndersPinkBot/1.0; +http://anderspink.com/bot.html)'
APIS_GOOGLE: 'APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html)'
APPLE_BOT: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1)'
ASK: 'Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)'
AWS_ELB: ELB-HealthChecker/1.0
BAIDU: 'Baiduspider+(+http://www.baidu.com/search/spider.htm)'
BINGBOT: 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
BINGPREVIEW: 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b'
BUZZBOT: 'Buzzbot/1.0 (Buzzbot; http://www.buzzstream.com; buzzbot@buzzstream.com)'
COPYPANTS: 'Mozilla/5.0 (compatible; BotPants/1.0; Linux; +info@copypants.com) KHTML/3.5.5 (like Gecko)'
CLOUDFLARE: "Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online) AppleWebKit/534.34"
COMMONCRAWL: 'CCBot/2.0 (http://commoncrawl.org/faq/)'
COMODO_SSL_CHECKER: 'COMODO SSL Checker'
DAUMOA: Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server;) Daumoa 4.0
DOMAINAREANIMATOR: 'Domain Re-Animator Bot (http://domainreanimator.com) - support@domainreanimator.com'
CIPACRAWLER: 'CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/www.example.com)'
DOT_BOT: 'Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)'
DUCKDUCKGO: 'DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)'
FACEBOOK_BOT: 'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'
@@ -20,10 +25,12 @@ GOOGLE_STACKDRIVER_UPTIME_CHECKS: 'GoogleStackdriverMonitoring-UptimeChecks'
GOOGLE_STRUCTURED_DATA_TESTING_TOOL2: 'Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +http://developers.google.com/structured-data/testing-tool/)'
GOOGLE_STRUCTURED_DATA_TESTING_TOOL: 'Mozilla/5.0 (compatible; X11; Linux x86_64; Google-StructuredDataTestingTool; +http://www.google.com/webmasters/tools/richsnippets)'
GRAPESHOT: 'Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +http://www.grapeshot.co.uk/crawler.php)'
JOBSEEKER: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) JobBot/5.0 (compatible; +http://www.jobseeker.com.au/bot.html) Safari/538.1'
LINKDEXBOT: 'Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/bots/)'
LOAD_TIME_BOT: 'Mozilla/5.0 (compatible; LoadTimeBot/0.9; +http://www.loadtime.net/bot.html)'
LTX71: 'ltx71 - (http://ltx71.com/)'
MAIL_RU: 'Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)'
MAPPYDATA: 'Mozilla/5.0 (compatible; Mappy/1.0; +http://mappydata.net/bot/)'
MEGAINDEX_RU: 'Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)'
MRCHROME: 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.107 Amigo/45.0.2454.107 MRCHROME SOC Safari/537.36'
MSNBOT: 'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'
@@ -35,17 +42,24 @@ PRIVACYAWAREBOT: 'Mozilla/5.0 (compatible; PrivacyAwareBot/1.1; +http://www.priv
PROXIMIC: 'Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)'
QUERYSEEKER: 'QuerySeekerSpider ( http://queryseeker.com/bot.html )'
SCRAPY: 'Scrapy/0.18.4 (+http://scrapy.org)'
SEMANTICBOT: 'Mozilla/5.0 (compatible; Semanticbot/1.0; +http://sempi.tech/bot.html)'
SEO_AUDIT: 'Mozilla/5.0 (compatible; seo-audit-check-bot/1.0)'
SEODIVER: 'Mozilla/5.0 (compatible; SEOdiver/1.0; +http://www.seodiver.com/bot)'
SEOKICKS: 'Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)'
SISTRIX: 'Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)'
SOCIALRANKIO: SocialRankIOBot; http://socialrank.io/about
SQUIDER: 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36 Squider/0.01'
STRIPE: 'Stripe/1.0 (+https://stripe.com/docs/webhooks)'
SWIFTYPE: 'Swiftbot'
TEERAID: 'Mozilla/5.0 (compatible; TeeRaidBot; +https://teeraid.com/bot/)'
TINEYE: 'TinEye-bot/0.51 (see http://www.tineye.com/crawler.html)'
TRAACKR: 'Traackr.com'
WATCHSUMO: 'Mozilla/5.0 (compatible) WatchSumo/1.0.0 (http://www.watchsumo.com)'
WHATSAPP: 'WhatsApp/2.17.38 Mozilla/5.0 (Linux; U; Android 6.1; en-us; DV Build/Donut) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'
WEBCEO: 'Mozilla/5.0 (compatible; online-webceo-bot/1.0; +http://online.webceo.com)'
WORIOBOT: 'Mozilla/5.0 (compatible; woriobot +http://worio.com)'
YAHOO_AD_MONITORING: 'Mozilla/5.0 (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)'
YAHOO_SLURP: 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)'
YANDEX_DIRECT: 'Mozilla/5.0 (compatible; YandexDirect/3.0; +http://yandex.com/bots)'
YANDEX_METRIKA: 'Mozilla/5.0 (compatible; YandexMetrika/3.0; +http://yandex.com/bots)'
YANGA: 'Yanga WorldSearch Bot v1.1/beta (http://www.yanga.co.uk/)'
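
Several of the new `bots.yml` keys (`copypants`, `jobseeker`, `mappydata`) do not name the bot token itself but another fragment of the user agent string, such as a contact address or URL. The sketch below illustrates why that still works, under the assumption that a key matches when it is a substring of the lowercased user agent. The matching rule, the `BOTS` constant, and the `match_bot` and `bot_name` helpers are hypothetical and written for illustration; they are not taken from the library's code.

```ruby
# Hypothetical subset of bots.yml (key => description), copied from this diff.
BOTS = {
  "anderspink" => "AndersPinkBot",
  "copypants"  => "BotPants",
  "jobseeker"  => "jobseeker.com.au/bot.html",
  "mappydata"  => "Mappy",
  "seo-audit"  => "seo-audit-check-bot"
}.freeze

# Assumed rule: a key matches when it appears anywhere in the
# lowercased user agent string. Returns [key, description] or nil.
def match_bot(user_agent, bots = BOTS)
  ua = user_agent.to_s.downcase
  bots.find { |key, _description| ua.include?(key) }
end

# Convenience wrapper returning only the description, or nil for no match.
def bot_name(user_agent, bots = BOTS)
  pair = match_bot(user_agent, bots)
  pair && pair.last
end

# Sample strings taken from test/ua_bots.yml in this commit.
copypants_ua = "Mozilla/5.0 (compatible; BotPants/1.0; Linux; +info@copypants.com) KHTML/3.5.5 (like Gecko)"
mappy_ua     = "Mozilla/5.0 (compatible; Mappy/1.0; +http://mappydata.net/bot/)"

puts bot_name(copypants_ua).inspect
puts bot_name(mappy_ua).inspect
```

Under this assumption, a key only needs to be a lowercase fragment that is distinctive enough not to appear in ordinary browser user agents, which is presumably why `mappydata` was chosen over the shorter `mappy`.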
