
This repository contains a Python-based scraper for extracting book ratings and reviews from Goodreads. The scraper leverages the Crawlbase Crawling API to bypass bot protections, handle JavaScript rendering, and navigate button-based pagination automatically.
➡ Read the full blog here to learn more.
The goodreads_scraper.py script extracts the following details for each book:
- Book Title
- Rating
- Reviews
The scraper handles Goodreads' button-based pagination through the Crawlbase Crawling API, so reviews are collected across multiple pages rather than only the first one.
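As a rough illustration of the request described above, the sketch below builds the query parameters for a Crawling API call and sends it with requests. It is not the code from goodreads_scraper.py: the function names, the default wait time, and the click selector in the usage note are assumptions, and the parameter names (ajax_wait, page_wait, css_click_selector) should be checked against the current Crawlbase documentation.

```python
CRAWLBASE_API_URL = "https://api.crawlbase.com/"


def build_params(token, page_url, click_selector=None, ajax_wait=True, page_wait=5000):
    """Build query parameters for one Crawling API request (hypothetical helper)."""
    params = {"token": token, "url": page_url}
    if ajax_wait:
        # Ask the API to wait for asynchronous content before returning the HTML.
        params["ajax_wait"] = "true"
    if page_wait:
        # Extra wait after page load, in milliseconds (assumed default).
        params["page_wait"] = str(page_wait)
    if click_selector:
        # CSS selector of the pagination button to click before capturing the page.
        params["css_click_selector"] = click_selector
    return params


def fetch_page(token, page_url, **options):
    """Fetch rendered HTML for page_url through the Crawling API."""
    # Imported here so build_params can be used without the dependency installed.
    import requests

    response = requests.get(
        CRAWLBASE_API_URL,
        params=build_params(token, page_url, **options),
        timeout=90,  # rendered pages can be slow; this value is an assumption
    )
    response.raise_for_status()
    return response.text
```

A call might look like `fetch_page(token, book_url, click_selector="button.load-more")`, where the selector is a placeholder and must be replaced with the one matching the review pagination button on the live page.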
Ensure Python is installed on your system. Check the version using:
python --version
Install the required dependencies:
pip install requests
- requests – Used for making API calls to Crawlbase.
- Sign up on Crawlbase to get an API token.
- This token is required to access the Crawling API for bypassing bot protection.
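One way to keep the token out of source control is to read it from an environment variable instead of writing it into the file. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; both that name and the load_token helper are hypothetical and are not part of the script, which expects the token inline.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN", placeholder="CRAWLBASE_JS_TOKEN"):
    """Return the API token from the environment (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    # Reject a missing value and the unreplaced placeholder text alike.
    if not token or token == placeholder:
        raise RuntimeError(f"Set the {env_var} environment variable to your Crawlbase token")
    return token
```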
Replace "CRAWLBASE_JS_TOKEN" in the script with your Crawlbase Crawling API token.
Run the scraper:
python goodreads_scraper.py
The extracted book ratings and reviews will be saved in a JSON file.
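For orientation, writing such a file could look like the sketch below. The output file name and the record layout (title, rating, reviews) are assumptions based on the fields listed earlier, not the script's confirmed format.

```python
import json


def save_results(books, path="goodreads_reviews.json"):
    """Write a list of book records to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English review text readable in the file.
        json.dump(books, f, ensure_ascii=False, indent=2)


# Assumed record shape, for illustration only.
example = [{"title": "Example Book", "rating": "4.2", "reviews": ["Great read."]}]
```

Calling `save_results(example)` would write the sample record to goodreads_reviews.json in the current directory.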
- Extract additional book details like author, genres, and publication year.
- Implement support for filtering reviews based on rating (e.g., only 5-star reviews).
- Add export options for CSV and database storage.
- Optimize request handling for large-scale scraping.
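Two of the enhancements above, rating filters and CSV export, could be sketched as follows. This is only an outline under stated assumptions: each review is taken to be a dict with reviewer, rating, and text keys, which the current scraper output may not provide, and both function names are hypothetical.

```python
import csv


def filter_reviews(reviews, min_rating=5):
    """Keep reviews whose numeric rating is at least min_rating."""
    return [
        r for r in reviews
        if r.get("rating") is not None and r["rating"] >= min_rating
    ]


def export_reviews_csv(reviews, path, fieldnames=("reviewer", "rating", "text")):
    """Write reviews to a CSV file, ignoring keys outside fieldnames."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(reviews)
```

For example, `export_reviews_csv(filter_reviews(reviews), "five_star.csv")` would write only the 5-star reviews.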