vikaschouhan/piratescrape

thepiratebay API

Based on https://github.com/appi147/thepiratebay, adapted for command-line use from a Linux shell.

Dependencies

  • python3-libtorrent
  • python3-lxml
  • python3-pandas
  • python3-requests
  • python3-bs4

Scraper usage

python3 pirate_search.py --search [search_string] --base_url [piratebay_url] --out_file [output_csv_file] --max_pages [maximum_no_of_pages] --use_onion

  • search_string - The string to search for, enclosed in quotes.
  • piratebay_url - A specific piratebay URL to use, typically a proxy URL when the main piratebay server is down.
  • output_csv_file - The CSV file where the results are saved.
  • maximum_no_of_pages - An integer giving the maximum number of pages to scrape. By default, only the first page is scraped.
  • use_onion (flag) - Access piratebay through the Tor network. The Tor service must be running on port 9050 for this to work.
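
The flags above could be parsed along the following lines. This is a minimal sketch using only the standard library, not the script's actual code. The flag names and the one-page default come from the list above. Making --search required, the helper names build_parser and tor_proxies, and the 127.0.0.1 address are assumptions made for illustration. The returned mapping follows the proxies format accepted by the requests library, which needs SOCKS support installed to use it.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (illustrative, not the script's own code)."""
    parser = argparse.ArgumentParser(prog="pirate_search.py")
    # Assumption: the search string is mandatory.
    parser.add_argument("--search", required=True, help="string to search for")
    parser.add_argument("--base_url", default=None, help="alternative piratebay or proxy URL")
    parser.add_argument("--out_file", default=None, help="output CSV file")
    # Only the first page is scraped by default.
    parser.add_argument("--max_pages", type=int, default=1, help="maximum pages to scrape")
    parser.add_argument("--use_onion", action="store_true", help="route requests through Tor")
    return parser


def tor_proxies(use_onion, port=9050):
    """Return a requests-style proxies mapping, or None when Tor is not requested."""
    if not use_onion:
        return None
    # Assumption: Tor listens on localhost. The socks5h scheme makes the proxy
    # resolve hostnames, which .onion addresses require.
    address = f"socks5h://127.0.0.1:{port}"
    return {"http": address, "https": address}


args = build_parser().parse_args(["--search", "ubuntu iso", "--max_pages", "3", "--use_onion"])
print(args.search, args.max_pages, tor_proxies(args.use_onion))
```

Returning None when the flag is absent lets the caller pass the result straight through as a proxies argument without branching.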

Downloader usage

python3 download_torrents.py --in_file [input_csv_file] --out_dir [output_torrent_dir] --min_seeders [min_seeders_to_filter_torrents] --category [main_category_to_filter] --sub_category [sub_category_to_filter] --timeout [timeout_in_seconds]

  • input_csv_file - The CSV file generated by pirate_search.py.
  • output_torrent_dir - The output directory where the torrents are saved.
  • min_seeders_to_filter_torrents - Minimum number of seeders. Only torrents with more seeders than this threshold are considered for downloading.
  • main_category_to_filter - The main category name (the main categories are listed in the CSV file).
  • sub_category_to_filter - The sub category name (also listed in the input CSV).
  • timeout_in_seconds - Timeout in seconds. Useful when fetching torrent metadata takes too long. If not specified, no timeout is applied and the metadata fetcher keeps waiting until the metadata is available or the library call itself times out.
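
The filtering and timeout behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the downloader's implementation. The column names seeders, category, and sub_category are hypothetical, so check the header of your own CSV. The sample rows are made up, and wait_for is a generic polling helper that does not call libtorrent.

```python
import csv
import io
import time


def filter_rows(rows, min_seeders=0, category=None, sub_category=None):
    """Keep rows with more than min_seeders seeders that match the optional categories."""
    kept = []
    for row in rows:
        # Hypothetical column names; the real CSV header may differ.
        if int(row["seeders"]) <= min_seeders:
            continue
        if category is not None and row["category"] != category:
            continue
        if sub_category is not None and row["sub_category"] != sub_category:
            continue
        kept.append(row)
    return kept


def wait_for(is_ready, timeout=None, poll_interval=0.1):
    """Poll is_ready() until it returns True; give up after timeout seconds if one is set."""
    start = time.monotonic()
    while not is_ready():
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(poll_interval)
    return True


# Made-up sample data standing in for a CSV produced by the scraper.
sample = io.StringIO(
    "name,seeders,category,sub_category\n"
    "alpha,12,Video,Movies\n"
    "beta,3,Video,Movies\n"
    "gamma,40,Audio,Music\n"
)
rows = list(csv.DictReader(sample))
selected = filter_rows(rows, min_seeders=5, category="Video")
print([row["name"] for row in selected])  # prints ['alpha']
```

The seeder comparison is strict, matching the "more than this threshold" wording. With timeout left as None, wait_for keeps polling indefinitely, which mirrors the documented default of applying no timeout.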

About

Scrape piratebay results in batch mode.
