mrm1001/spark_tutorial

spark_tutorial

Code for the Spark tutorial presented by Maria Mestre, Sahan Bulathwela and Erik Pazos

How to set up the local environment for the tutorial

Instructions for setting up your local environment to run this Jupyter notebook can be found in the following blog post

Data

Special thanks to J. McAuley, R. Pandey, J. Leskovec, et al. at Stanford University for allowing us to use a sample of the Amazon product dataset for this tutorial.

Files

Spark_Tutorial.ipynb : The main file, containing the code for the tutorial

Data
|_ classifiers
|  |_ classifier.pkl : a scikit-learn logistic regression classifier, used as a Python classifier in the tutorial
|
|_ Products
|  |_ sample_metadata.json : details about products found on Amazon (e.g. price, asin, category)
|
|_ Reviews
   |_ electronics.json : reviews of electronic products
   |_ fashion.json : reviews of fashion products
   |_ sports.json : reviews of sports equipment
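For a quick look at the data files outside the notebook, the following is a minimal sketch in plain Python. It assumes each file stores one JSON object per line, which this README does not state, so adjust the parsing if the layout differs. The helper names `load_json_lines` and `index_by_asin` are hypothetical and are not part of the tutorial code; the `asin` field is the product identifier mentioned in the file description above.

```python
import json
from pathlib import Path


def load_json_lines(path):
    """Read a file holding one JSON object per line (an assumed layout)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def index_by_asin(products):
    """Map each product's asin to its record; records without an asin are skipped."""
    return {p["asin"]: p for p in products if "asin" in p}
```

Under those assumptions, `index_by_asin(load_json_lines("Data/Products/sample_metadata.json"))` would give a dictionary for looking up a product by its asin.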

About

Code for the Spark tutorial at the PyData conference in London, June 2015
