NOTE: Our datasets have been moved! Please see our new webpage about how to download these datasets.
The datasets were collected in late 2017 from Goodreads. Details of the datasets are described on the dataset website.
We collected these datasets for academic use only! Please do not redistribute them or use them for commercial purposes.
If you are using our dataset, please cite the following papers:
- Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18. [bibtex]
- Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in ACL'19. [bibtex]
We've created several notebooks (in Python 3.7) to illustrate how to download and read these datasets, and to provide some basic explorations of the data.
- download.ipynb: For those who prefer to download the datasets without a GUI, this notebook shows how to download the files in bash/python.
- samples.ipynb: This notebook will show how to read '.json.gz' files line-by-line and display sample records of each file.
- statistics.ipynb: This notebook will calculate some basic statistics of the datasets (except the largest complete interaction file 'goodreads_interactions.csv'). Running this notebook may take a while.
- distributions.ipynb: This notebook will operate on the complete interaction file 'goodreads_interactions.csv' and provide some explorations of the distributions of these interactions. Note: Run this notebook only on a machine with a LARGE amount of memory (32 GB or more recommended)!
- reviews.ipynb: This notebook will calculate some statistics of the review datasets.
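
For readers who want a quick look before opening the notebooks, here is a minimal sketch of the two reading patterns mentioned above: going through a '.json.gz' file line by line, and streaming a large CSV file row by row instead of loading it into memory at once. It is not taken from the notebooks. The helper names (iter_json_gz, head, count_csv_column) are hypothetical, and it assumes each line of a '.json.gz' file holds one JSON object and that the CSV file has a header row.

```python
import csv
import gzip
import json
from collections import Counter
from itertools import islice


def iter_json_gz(path):
    """Yield one parsed record per line of a gzip-compressed file.

    Assumes one JSON object per line (not confirmed by this README).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def head(path, n=3):
    """Return the first n records without reading the rest of the file."""
    return list(islice(iter_json_gz(path), n))


def count_csv_column(path, column):
    """Count the values of one column while streaming the CSV row by row.

    Only the counts are kept in memory, not the rows themselves.
    Assumes the first row of the file is a header naming the columns.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts
```

For example, head(path, 3) on one of the downloaded '.json.gz' files returns its first three records as dictionaries, and count_csv_column('goodreads_interactions.csv', column) tallies a single column of the complete interaction file, where column must be a name that actually appears in that file's header row. Streaming this way keeps memory use low, but it is slower than the in-memory analysis a notebook such as distributions.ipynb would need for full distribution plots.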