Code for the Spark tutorial presented by Maria Mestre, Sahan Bulathwela and Erik Pazos
Instructions for setting up your local environment to run this Jupyter notebook can be found in the following blog post
Special thanks to J. McAuley, R. Pandey, J. Leskovec, et al. at Stanford University for allowing us to use a sample of the Amazon product dataset for this tutorial.
Spark_Tutorial.ipynb : The main file that contains the code for the Tutorial
Data
|_ classifiers
| |_ classifier.pkl : a pickled scikit-learn logistic regression classifier, used as a Python classifier in the tutorial
|
|_ Products
| |_ sample_metadata.json : This file contains details about products found on Amazon (e.g. price, asin, category)
|
|_ Reviews
  |_ electronics.json : This file contains reviews about electronic products
  |_ fashion.json : This file contains reviews about fashion products
  |_ sports.json : This file contains reviews about sports equipment
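
To check that the data files are in place before opening the notebook, a small standard-library script along these lines may help. It is only a sketch: it assumes the review files store one JSON object per line, which this README does not state, and the helper name `iter_json_lines` is made up for illustration and is not part of the tutorial code.

```python
import json
from pathlib import Path


def iter_json_lines(path):
    """Yield one dict per non-empty line of a file.

    Assumption: the file holds one JSON object per line. If the file is a
    single JSON array instead, use json.load() on the whole file.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Path taken from the directory layout above; adjust to your checkout.
    reviews_path = Path("Data") / "Reviews" / "electronics.json"
    if reviews_path.exists():
        first = next(iter_json_lines(reviews_path), None)
        # Print the field names of the first record rather than guessing them.
        print(sorted(first) if first else "no records found")
    else:
        print(f"{reviews_path} not found")
```

Run from the repository root, this prints the field names of the first review record, which is a quick way to see which keys are available before working with the data in Spark.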