A script for web scraping and downloading the Bitcoin Core bin directory.
Ideal for creating your own mirror!
Run-time dependencies:

- Python 3 + pip (`python3 python3-dev python3-pip`)
- Additional libs for Scrapy (`libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev`)

More packages will be downloaded via pip, see next section.
I advise you to use a Python virtual environment. Create & activate such an environment via:

```sh
python3 -m venv env
source env/bin/activate
```

Next, install the required packages via:

```sh
pip install -r requirements.txt
```

Execute the scraper and start downloading:

```sh
scrapy crawl bitcoincore
```

Or by running: `./start_spider.py`
Note: files are stored in the bin sub-folder of this project's root folder.
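The scraping idea itself is straightforward: fetch the directory listing at https://bitcoincore.org/bin/ and collect the links it contains. A conceptual sketch using only the Python standard library (the project itself uses Scrapy; the sample HTML below is an assumption shaped like the real listing):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from anchor tags in a directory listing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

# Sample snippet shaped like the real listing (actual contents will differ)
sample = ('<pre><a href="../">../</a> '
          '<a href="bitcoin-core-0.21.0/">bitcoin-core-0.21.0/</a></pre>')
collector = LinkCollector()
collector.feed(sample)
print(collector.links)  # → ['../', 'bitcoin-core-0.21.0/']
```

The real spider would then follow each version sub-folder and queue the binaries and checksum files for download.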
Optionally, execute the scraper and output the meta-data to a "feed" file (e.g. a JSON file):

```sh
scrapy crawl bitcoincore -O bitcoincore.json
```

The Docker image is available on DockerHub.
Note: The Docker image starts the crawler via a cronjob, so the bitcoin spider runs automatically once a week.
I provided a docker-compose file for convenience.
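For reference, a compose file along these lines would do the job (the service name, image tag, and volume mapping below are assumptions; check the provided docker-compose file for the real values):

```yaml
version: "3"
services:
  bitcoinscraper:
    image: danger89/bitcoinscraper
    restart: unless-stopped
    volumes:
      # Persist the downloaded binaries outside the container
      - ./bin:/app/bin
```

Mounting the bin folder as a volume keeps the mirrored files on the host even when the container is recreated.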
Building Docker image
Create a Docker image locally using:
```sh
docker build -t danger89/bitcoinscraper .
```

Scrapy shell

You can use the Scrapy shell to help with debugging, or to learn how to extract data when using Scrapy:

```sh
scrapy shell 'https://bitcoincore.org/bin/'
```

Check the response object for data, just an example:

```python
response.css('pre a')[3].get()
```

More info:
- Scrapy homepage
- Scrapy Tutorial docs (ideal for beginners)
- APScheduler Cron docs