Scraper for Goodreads.com book ratings and reviews. It handles JS rendering, pagination, and CAPTCHAs using the Crawlbase Crawling API.

ScraperHub/goodreads-scraper

Goodreads Ratings & Reviews Scraper

Description

This repository contains a Python-based scraper for extracting book ratings and reviews from Goodreads. The scraper leverages the Crawlbase Crawling API to bypass bot protections, handle JavaScript rendering, and navigate button-based pagination automatically.

➡ Read the full blog here to learn more.

Scraper Overview

Goodreads Ratings & Reviews Scraper

The goodreads_scraper.py script extracts the following details for each book:

  • Book Title
  • Rating
  • Reviews

The scraper handles button-based pagination through the Crawlbase Crawling API, so reviews are collected across multiple pages rather than only the first one.
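As a rough illustration of how a button click can be requested through the Crawling API, the sketch below builds a request URL with the `ajax_wait`, `page_wait`, and `css_click_selector` parameters. It is not taken from goodreads_scraper.py: the helper names, the wait time, and the example selector are assumptions, and the parameter names should be checked against the current Crawlbase documentation.

```python
from urllib.parse import urlencode

# Crawling API endpoint (assumed; confirm against the Crawlbase docs).
CRAWLBASE_ENDPOINT = "https://api.crawlbase.com/"


def build_request_url(token, page_url, click_selector=None, page_wait_ms=5000):
    """Build a Crawling API URL for one page (hypothetical helper)."""
    params = {
        "token": token,
        "url": page_url,
        "ajax_wait": "true",       # wait for JavaScript-driven content
        "page_wait": page_wait_ms,  # extra wait in milliseconds
    }
    if click_selector:
        # CSS selector of the button to click before the HTML is captured,
        # for example a "show more reviews" button.
        params["css_click_selector"] = click_selector
    return CRAWLBASE_ENDPOINT + "?" + urlencode(params)


def fetch_html(token, page_url, click_selector=None):
    """Fetch rendered HTML through the API (hypothetical helper)."""
    import requests  # imported here so the URL builder works without it

    url = build_request_url(token, page_url, click_selector)
    response = requests.get(url, timeout=90)
    response.raise_for_status()
    return response.text
```

A call such as `fetch_html("CRAWLBASE_JS_TOKEN", book_url, click_selector="button.load-more")` would return the page HTML after the click; `button.load-more` is a placeholder, not the selector Goodreads actually uses.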

Environment Setup

Ensure Python is installed on your system. Check the version using:

python --version

Install the required dependencies:

pip install requests

  • requests – Used for making API calls to Crawlbase.

Running the Scraper

1. Get Your Crawlbase Access Token

  • Sign up on Crawlbase to get an API token.
  • This token is required to access the Crawling API for bypassing bot protection.

2. Update the Scraper with Your Token

Replace "CRAWLBASE_JS_TOKEN" in the script with your Crawlbase Crawling API Token.

3. Run the Scraper

python goodreads_scraper.py

The extracted book ratings and reviews will be saved in a JSON file.
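For orientation, writing the results to JSON could look like the minimal sketch below. The function name, the default file name, and the record shape (title, rating, reviews) are assumptions based on the fields listed in this README; the actual script may structure its output differently.

```python
import json


def save_to_json(books, path="goodreads_data.json"):
    """Write a list of book records to a UTF-8 JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English review text readable in the file.
        json.dump(books, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Made-up example record; real values come from the scraped pages.
    example = [
        {"title": "Example Book", "rating": "4.25", "reviews": ["Great read."]}
    ]
    save_to_json(example)
```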

To-Do List

  • Extract additional book details like author, genres, and publication year.
  • Implement support for filtering reviews based on rating (e.g., only 5-star reviews).
  • Add export options for CSV and database storage.
  • Optimize request handling for large-scale scraping.
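Two of these items, rating filters and CSV export, could be approached along the lines of the sketch below. Nothing here exists in the repository yet: the function names and the review fields (`reviewer`, `stars`, `text`) are hypothetical and would need to match whatever the scraper actually emits.

```python
import csv
import io


def filter_reviews(reviews, stars=5):
    """Keep only reviews whose star rating equals `stars`."""
    return [r for r in reviews if r.get("stars") == stars]


def reviews_to_csv(reviews, fieldnames=("reviewer", "stars", "text")):
    """Render review dicts as CSV text; keys outside fieldnames are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(reviews)
    return buffer.getvalue()
```

Returning the CSV as a string keeps the helper easy to test; the caller can write it to a file or pass it to a database loader.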
