A Hacker News crawler built on Scrapy that crawls the entire site and stores the links in a database.
-
Clone the repository to your local system with Git.
-
Install Scrapy via pip.
$ pip install scrapy
-
Configure the database settings in settings.py.
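As a rough illustration of how a database setting in settings.py could feed the storage step, here is a minimal sketch. It assumes SQLite (via Python's built-in sqlite3 module) rather than whatever database this project actually uses, and the names HN_SQLITE_PATH, SQLiteLinkPipeline, and the item fields "url" and "title" are hypothetical, not taken from this repository.

```python
import sqlite3

# Hypothetical setting that would live in settings.py:
# the path of the SQLite file that receives the links.
HN_SQLITE_PATH = "hn_links.db"

# A real project would also register the pipeline in settings.py, e.g.
# ITEM_PIPELINES = {"myproject.pipelines.SQLiteLinkPipeline": 300}
# ("myproject" is a placeholder module path).


class SQLiteLinkPipeline:
    """Item pipeline sketch: store each scraped link in a SQLite table."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.conn = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook; crawler.settings exposes values from settings.py.
        return cls(crawler.settings.get("HN_SQLITE_PATH", HN_SQLITE_PATH))

    def open_spider(self, spider):
        # Open the connection once per crawl and make sure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # INSERT OR IGNORE skips links that were already stored.
        self.conn.execute(
            "INSERT OR IGNORE INTO links (url, title) VALUES (?, ?)",
            (item["url"], item.get("title")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

Using the URL as the primary key is one simple way to keep repeated crawls from storing the same link twice; the real schema may differ.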
-
Run the crawler with this command:
$ scrapy crawl hn
To export the data in JSON format, use this command:
$ scrapy crawl hn -o items.json -t json
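Once items.json exists, it can be read back with Python's standard json module. The sketch below assumes each exported item has "title" and "url" fields; those field names and the helper load_links are hypothetical and should be adjusted to the item definition in this project.

```python
import json


def load_links(path="items.json"):
    """Return (title, url) pairs from a Scrapy JSON export."""
    with open(path, encoding="utf-8") as fh:
        # Scrapy's JSON exporter writes a single top-level array of items.
        items = json.load(fh)
    # "title" and "url" are assumed field names; .get() yields None if absent.
    return [(item.get("title"), item.get("url")) for item in items]
```

For example, `for title, url in load_links(): print(title, url)` would list every exported link.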
Dependencies:
- BeautifulSoup
- lxml
- OpenSSL
Read the tutorial for detailed explanations.