A good old-fashioned web scraper that collects static data from the Kiranico Monster Hunter Database. "Static" means it cannot collect data that the site loads dynamically from its private servers via API requests when you click buttons. Be aware that some data will therefore be missing!
There are 2 spiders in this project:
- `LocalDownloadSpider`, which downloads the HTML of every page useful to this project and writes it to the `html` folder. The files are included in the repository for convenience, and this spider is how they got there. Keeping local copies also means you can write parsers without being connected to the internet (yay!).
- `KiranicoSpider`, which goes through the files in the `html` folder, scrapes them with specialized parsers, and writes the data to CSV files. If it has a connection to PostgreSQL, it also creates tables for the data and writes the data to those tables.
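The second stage (local HTML files in, CSV rows out) can be pictured with a minimal sketch that uses only the standard library. This is not the project's code: the real parsers are built on Scrapy, and `TitleParser`, `parse_saved_pages`, and the `file`/`title` columns are hypothetical names chosen purely for illustration.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside <title> (a stand-in for a specialized parser)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_saved_pages(html_dir, csv_path):
    """Parse every .html file under html_dir and write one CSV row per file."""
    rows = []
    for page in sorted(Path(html_dir).rglob("*.html")):
        parser = TitleParser()
        parser.feed(page.read_text(encoding="utf-8"))
        rows.append({"file": page.name, "title": parser.title.strip()})

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Because this stage only reads from disk, it can be rerun while developing a parser without touching the network, which is the benefit the download/parse split is meant to provide.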
This project relies on the `scrapy` library as the engine to scrape HTML data. Each spider has its own name, which is the class name of the spider without the `Spider` suffix, e.g. `KiranicoSpider` is `kiranico`. Run a spider by passing its name to `scrapy crawl`:
$ scrapy crawl kiranico
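The naming convention can be written down as a small helper. This is a sketch of the convention as described here, not something the project ships: `spider_name` and `crawl_command` are hypothetical names, and the lowercasing is inferred from the single `KiranicoSpider` → `kiranico` example. In Scrapy the authoritative value is the `name` attribute on the spider class, so check that attribute for `LocalDownloadSpider` rather than trusting this helper.

```python
def spider_name(class_name: str) -> str:
    """Derive a crawl name from a spider class name (assumed convention)."""
    suffix = "Spider"
    if class_name.endswith(suffix):
        class_name = class_name[: -len(suffix)]
    # Lowercasing is an assumption based on KiranicoSpider -> kiranico.
    return class_name.lower()


def crawl_command(class_name: str) -> list[str]:
    """Build the argument list for `scrapy crawl <name>`."""
    return ["scrapy", "crawl", spider_name(class_name)]
```

The returned list can be handed to `subprocess.run(crawl_command("KiranicoSpider"), check=True)` from the project root, which is equivalent to typing the command above.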
If you want to play around with the response data in any of the local HTML files, do the following:
$ scrapy shell ./kiranico_scraper/html/folder/file-to-parse.html
Read the Scrapy Docs for more information.