Jupyter environment for the big data course.
When running this environment without access to SURFsara's HDFS, you can download the required data files from the following locations (the links are currently dead and will be replaced soon):
- 2008.csv.gz. Original source: http://stat-computing.org/dataexpo/2009/the-data.html
- shakespeare.txt
- tweets.json
- germancredit.csv
Without HDFS access, download the files above to your own computer and upload them to the notebook environment. Then change the paths in the notebooks to point to the uploaded files; the kinit cells can be skipped.
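As a rough sketch of what the path change could look like in a notebook, the helper below resolves an uploaded file in a local directory and opens it as text. The directory name `data` and the helper names `local_path` and `open_text` are assumptions made for illustration; they do not come from the course notebooks, which may read the files differently.

```python
import gzip
import os

# Hypothetical directory in the notebook environment where the files were uploaded.
DATA_DIR = "data"


def local_path(filename, data_dir=DATA_DIR):
    """Return the local path of an uploaded data file, failing early if it is missing."""
    path = os.path.join(data_dir, filename)
    if not os.path.exists(path):
        raise FileNotFoundError(
            "Expected %s; upload the file to the notebook environment first." % path
        )
    return path


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    if path.endswith(".gz"):
        # "rt" makes gzip decompress and decode to str on the fly.
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

In a notebook cell, an HDFS path would then be replaced by something like `local_path("shakespeare.txt")`, and `open_text(local_path("2008.csv.gz"))` would read the compressed file without unpacking it first.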