This repository contains the example code presented at a big data meetup: "Evolution of data modeling and optimizations in Spark (comparing RDD, DataFrame and DataSet APIs)"

symat/spark-api-comparison

spark-api-comparison

get the data

easy way

hard way

  • download the original data files from the IMDb FTP servers (see http://www.imdb.com/interfaces). You will need the following files:
    • actors.list.gz
    • actresses.list.gz
    • ratings.list.gz
  • run the parser scripts in the data folder on the downloaded files
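
Before running the parser scripts, it can help to confirm that all three downloads are in place and readable. The following is a minimal sketch, not part of this repository: the helper names `missing_files` and `peek` are hypothetical, the ratings file is assumed to be named ratings.list.gz, and the Latin-1 encoding is an assumption about the IMDb list files.

```python
import gzip
from itertools import islice
from pathlib import Path

# File names from the list above (ratings.list.gz is the assumed spelling).
REQUIRED_FILES = ("actors.list.gz", "actresses.list.gz", "ratings.list.gz")


def missing_files(download_dir):
    """Return the required file names that are not present in download_dir."""
    root = Path(download_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def peek(path, n=5, encoding="latin-1"):
    """Return the first n lines of a gzip-compressed text file.

    The default encoding is an assumption; change it if decoding looks wrong.
    """
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

Calling `missing_files(".")` in the download directory should return an empty list once everything is downloaded, and `peek("actors.list.gz")` gives a quick look at the header of a file before it is parsed.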
