Rain and ride weather data analysis.
Name | NetID |
---|---|
Bongjun Jang | bj2351 |
Luka Tragic | lt2205 |
Terrance Chen | tc3325 |
These are the people I worked with on this analysis; they all contributed significantly to the project.
- Ingestion: commands or code to download the data
- ETL: transforms or cleans the data and stores it in HDFS
- Profiling: provides insights into each individual dataset
- Analytics: statistical analysis across all datasets (correlation, etc.)
To run this code outside of NYU's environment, upload the dataset to Hadoop HDFS and change the username in the directory paths used in the code. Each directory must be given as a hard-coded absolute path.
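As a minimal sketch of the path change described above, the snippet below keeps the username in one place, derives an absolute HDFS path from it, and builds the `hdfs dfs` commands that would upload a local file. The username `your_netid`, the folder `rain_ride/input`, the file `weather.csv`, and the `/user/<username>` home layout are all assumptions for illustration, not values taken from this project.

```python
import posixpath
import shlex

HDFS_USER = "your_netid"  # hypothetical; replace with your own HDFS username


def hdfs_path(*parts, user=HDFS_USER):
    """Build an absolute HDFS path under /user/<user> (assumed home layout)."""
    return posixpath.join("/user", user, *parts)


def upload_commands(local_file, hdfs_dir):
    """Return the commands that create hdfs_dir and upload local_file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],
    ]


if __name__ == "__main__":
    # Hypothetical folder and file names, used only to show the shape of the commands.
    target = hdfs_path("rain_ride", "input")
    for cmd in upload_commands("weather.csv", target):
        print(shlex.join(cmd))
```

The script only prints the commands, so they can be reviewed before being run on the cluster.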
Data flows between the portions of the code through shared folders: the output folder of one portion serves as the input folder of the next.
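One way to picture this folder sharing is the sketch below, which maps each stage to an input and an output folder so that every stage reads what the previous one wrote. The strictly linear stage order, the `output` subfolder name, and the base directory are assumptions; the actual code may lay out its folders differently or let several stages read the same ETL output.

```python
import posixpath

# Stage order assumed from the four parts of the project; the real pipeline may branch.
STAGES = ["ingestion", "etl", "profiling", "analytics"]


def stage_folders(base_dir, stages=STAGES):
    """Map each stage to (input_dir, output_dir), chaining each output to the next input."""
    folders = {}
    previous_output = None  # the first stage downloads data, so it has no input folder
    for stage in stages:
        output_dir = posixpath.join(base_dir, stage, "output")  # hypothetical layout
        folders[stage] = (previous_output, output_dir)
        previous_output = output_dir
    return folders


if __name__ == "__main__":
    # "/user/your_netid/rain_ride" is a placeholder base directory.
    for stage, (src, dst) in stage_folders("/user/your_netid/rain_ride").items():
        print(f"{stage}: input={src} output={dst}")
```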