kanasubs/d-addicts.com-crawler

d-addicts.com-crawler

A Python web spider library and CLI utility that crawls d-addicts.com for links to Japanese subtitles.

Install dependencies


pip3 install html5lib -r requirements.txt

Usage


As a library

from daddicts_spider import DAddictsSpider

all_sub_links = set()
delay_between_requests = 6  # optional arg to DAddictsSpider
take_at_least_n_links = 10  # optional arg to DAddictsSpider
for sub_links in DAddictsSpider(delay_between_requests, take_at_least_n_links):
    print(sub_links)
    all_sub_links |= sub_links
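
The loop above merges each batch of links into a single set. As a minimal sketch of the same pattern, assuming only that the spider yields sets of link strings (as the `|=` above implies), the hypothetical helper `collect_links` below stops iterating once enough unique links have been gathered. The helper name, the `max_links` parameter, and the example URLs are illustrative and not part of the library.

```python
from typing import Iterable, Set


def collect_links(batches: Iterable[Set[str]], max_links: int) -> Set[str]:
    """Merge batches of links until at least max_links unique links are held."""
    collected: Set[str] = set()
    for batch in batches:
        collected |= batch
        if len(collected) >= max_links:
            # Stop consuming the iterable so no further batches are requested.
            break
    return collected


# Stand-in batches for illustration; in real use, pass a DAddictsSpider
# instance as the iterable instead of this list.
fake_batches = [
    {"https://example.invalid/a.srt"},
    {"https://example.invalid/a.srt", "https://example.invalid/b.srt"},
    {"https://example.invalid/c.srt"},
]
links = collect_links(fake_batches, max_links=2)
print(sorted(links))
```

Because the helper accepts any iterable of sets, it can be exercised without network access, which may be convenient when testing code built on top of the spider.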

As a CLI utility

> ./daddicts_spider.py --help
usage: daddicts_spider.py [-h] [-d DELAY] [-t TAKE | -c CRAWL]

optional arguments:
  -h, --help               show this help message and exit
  -d DELAY, --delay DELAY  delay in seconds between HTTP requests
  -t TAKE, --take TAKE     take at least and around 'n' links. Will resume
                           from last point when calling the program again.
  -c CRAWL, --crawl CRAWL  crawl 'n' times. Will resume from last point when
                           calling the program again.
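
The usage line shows `-t` and `-c` as alternatives (`[-t TAKE | -c CRAWL]`), which is how argparse renders a mutually exclusive group. The following is a sketch of how such an interface could be declared with the standard library; it is not taken from the project's source, and the `int` argument types and shortened help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the help text shown above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="daddicts_spider.py")
    parser.add_argument("-d", "--delay", type=int,  # int type is an assumption
                        help="delay in seconds between HTTP requests")
    # -t and -c cannot be combined; argparse exits with an error if both are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-t", "--take", type=int,
                       help="take at least and around 'n' links")
    group.add_argument("-c", "--crawl", type=int,
                       help="crawl 'n' times")
    return parser


# Parse an example command line: a 6 second delay, taking about 10 links.
args = build_parser().parse_args(["-d", "6", "-t", "10"])
print(args.delay, args.take, args.crawl)
```

With this declaration, an option that is not supplied is left as `None`, so calling code can tell whether `--take` or `--crawl` was requested.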

Testing


Install nose with pip3 install nose, then run nosetests in the project's root directory.

License


Copyright (C) 2017 Carlos C. Fontes.

Licensed under the ISC License.
