This course introduced techniques and tools for analyzing and visualizing data at scale, with an emphasis on combining computation and visualization to perform effective analysis. Topics included big data analytics building blocks, data collection and storage, data cleaning and integration, data visualization, dimensionality reduction, data mining concepts, and data analytics.
This course consisted of four homework assignments and a group project.
• Project 1: Collected and visualized data using SQLite, a D3 warm-up exercise, and OpenRefine
• Project 2: Created graphs and visualizations using D3 and Tableau
• Project 3: Analyzed large datasets and graphs using Hadoop, Spark, Pig, and Azure
• Project 4: Worked on scalable PageRank via virtual memory (mmap), random forests, and scikit-learn, all in Python
• Group Project: Created an interactive application based on a large dataset of our choice (the Amazon Electronics dataset) and applied data analytics concepts learned in this course. We built the application with R and Shiny. Below is a preview of the final application.