Add Crawler class
Basantloay committed May 16, 2021
1 parent 3d0128c commit 4986e7a
Showing 19 changed files with 8,467 additions and 0 deletions.
3 binary files not shown.
572 changes: 572 additions & 0 deletions robots0.txt

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions robots1.txt
@@ -0,0 +1,43 @@
Sitemap: https://www.cnn.com/sitemaps/cnn/index.xml
Sitemap: https://www.cnn.com/sitemaps/cnn/news.xml
Sitemap: https://www.cnn.com/sitemaps/sitemap-section.xml
Sitemap: https://www.cnn.com/sitemaps/sitemap-interactive.xml
Sitemap: https://www.cnn.com/ampstories/sitemap.xml
Sitemap: https://edition.cnn.com/sitemaps/news.xml
User-agent: *
Allow: /partners/ipad/live-video.json
Disallow: /*.jsx$
Disallow: *.jsx$
Disallow: /*.jsx/
Disallow: *.jsx?
Disallow: /ads/
Disallow: /aol/
Disallow: /beta/
Disallow: /browsers/
Disallow: /cl/
Disallow: /cnews/
Disallow: /cnn_adspaces
Disallow: /cnnbeta/
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /editionssi
Disallow: /help/cnnx.html
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners/
Disallow: /pipeline/
Disallow: /pointroll/
Disallow: /POLLSERVER/
Disallow: /pr/
Disallow: /privacy
Disallow: /PV/
Disallow: /Quickcast/
Disallow: /quickcast/
Disallow: /QUICKNEWS/
Disallow: /search/
Disallow: /terms
Disallow: /test/
Disallow: /virtual/
Disallow: /WEB-INF/
Disallow: /web.projects/
Disallow: /webview/
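
The Disallow entries in robots1.txt above mix plain path prefixes (/ads/, /privacy) with the `*` and `$` wildcards (/*.jsx$). The Crawler source from this commit is not rendered on this page, so the following is only a minimal sketch of how such patterns could be evaluated, assuming the usual convention that `*` matches any run of characters, a trailing `$` anchors the end of the path, and everything else is a literal prefix. The class name RobotsPattern and the method matches are hypothetical and are not taken from the commit.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of this commit.
final class RobotsPattern {
    // Returns true when `path` is covered by the robots.txt rule value `rule`.
    static boolean matches(String rule, String path) {
        if (rule.isEmpty()) {
            return false; // an empty Disallow value blocks nothing
        }
        // A trailing '$' means the pattern must match up to the end of the path.
        boolean anchoredEnd = rule.endsWith("$");
        String body = anchoredEnd ? rule.substring(0, rule.length() - 1) : rule;

        // Translate the rule into a regex: '*' becomes ".*", all other
        // characters (including '?' and '.') are quoted as literals.
        StringBuilder regex = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Without '$' the rule is a prefix, so anything may follow it.
        if (!anchoredEnd) {
            regex.append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                .matcher(path)
                .matches();
    }
}
```

Under these assumptions, RobotsPattern.matches("/ads/", "/ads/banner.html") is true while RobotsPattern.matches("/ads/", "/adserver") is false, because the rule is compared as a prefix of the path. The sketch ignores Allow precedence and percent-encoding, which a production crawler would also need to handle.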
341 changes: 341 additions & 0 deletions robots14.txt

Large diffs are not rendered by default.

Empty file added robots36.txt
216 changes: 216 additions & 0 deletions robots52.txt

Large diffs are not rendered by default.

183 changes: 183 additions & 0 deletions robots53.txt

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions robots55.txt
@@ -0,0 +1,17 @@
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /300250/
Disallow: /300250
Disallow: /checkout
Disallow: /orders
Disallow: /paypal_checkout
Disallow: /paypal_authorization_callback
Disallow: /paypal_cancel_callback
Disallow: /fb-product-feed
Disallow: /fb-stitcher-feed
Disallow: /tmp
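
robots55.txt above is organised into per-agent groups: Googlebot and Googlebot-Image get an empty Disallow (nothing blocked), while every other agent falls under the `*` group. As a hedged sketch of how a crawler might pick the applicable group, the code below parses the text into a map from user-agent name to Disallow prefixes and falls back to `*` when no group names the agent. The names RobotsGroups, parse, and isDisallowed are hypothetical, agent names are compared by exact (case-insensitive) equality, and rules are treated as plain prefixes; none of this is taken from the commit's Crawler class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of this commit.
final class RobotsGroups {
    // Parses robots.txt text into: lower-cased user-agent -> Disallow prefixes.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        List<String> currentAgents = new ArrayList<>();
        boolean seenRule = false; // true once the current group has a rule line

        for (String raw : text.split("\\R")) {
            String line = raw.replaceFirst("#.*$", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                // A User-agent line after rule lines starts a new group.
                if (seenRule) {
                    currentAgents.clear();
                    seenRule = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                currentAgents.add(agent);
                groups.putIfAbsent(agent, new ArrayList<>());
            } else if (key.equals("disallow") || key.equals("allow")) {
                seenRule = true;
                // Only non-empty Disallow values are recorded in this sketch.
                if (key.equals("disallow") && !value.isEmpty()) {
                    for (String agent : currentAgents) {
                        groups.get(agent).add(value);
                    }
                }
            }
            // Other keys (for example Sitemap) are ignored here.
        }
        return groups;
    }

    // Checks `path` against the agent's own group, else the "*" group.
    static boolean isDisallowed(Map<String, List<String>> groups,
                                String agent, String path) {
        List<String> rules = groups.get(agent.toLowerCase(Locale.ROOT));
        if (rules == null) {
            rules = groups.getOrDefault("*", Collections.emptyList());
        }
        for (String prefix : rules) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the rules from robots55.txt, this sketch would report /checkout as disallowed for an unnamed crawler but not for Googlebot, since Googlebot's own group has no non-empty Disallow values.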
Empty file added robots57.txt
Empty file added robots63.txt
17 changes: 17 additions & 0 deletions robots64.txt

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions robots65.txt

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions robots66.txt

Large diffs are not rendered by default.

1,350 changes: 1,350 additions & 0 deletions robots67.txt

Large diffs are not rendered by default.

183 changes: 183 additions & 0 deletions robots7.txt

Large diffs are not rendered by default.

5,525 changes: 5,525 additions & 0 deletions robots8.txt

Large diffs are not rendered by default.

Binary file added src/com/company/jsoup-1.13.1.jar
Binary file not shown.
