bianchimario/DistributedDataAnalysisAndMining

DDAM

Project carried out in collaboration with Martina Trigilia, Francesco Santucciu and Michele Andreucci

Distributed Data Analysis and Mining - Spark (Hadoop)

Analysis of the Australia "Rain Tomorrow" dataset.

Tasks:

  1. Data Understanding
  2. Data Preparation
  3. Classification and Clustering
  4. Regression

About the course: "this course aims at teaching the basic theoretical concepts behind the MapReduce distributed computing paradigm, and Hadoop in particular, and at building expertise in the practical usage of high-performance computing tools for data engineering, analysis and mining. In particular, the students will learn how classical data mining algorithms can be applied to Big Data using Hadoop (Spark). Real (and open source) datasets will be used to present examples and to let the students build their own projects".
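
As a rough illustration of the MapReduce paradigm mentioned in the course description, the sketch below counts rainy days per location in plain Python, with explicit map, shuffle and reduce steps. It is not taken from the project: the sample records, the field names `Location` and `RainTomorrow`, and all function names are assumptions made for this example. In Hadoop or Spark these phases would run in parallel across a cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records; the field names are assumptions for this
# sketch and are not confirmed by the project itself.
records = [
    {"Location": "Sydney", "RainTomorrow": "Yes"},
    {"Location": "Sydney", "RainTomorrow": "No"},
    {"Location": "Perth", "RainTomorrow": "Yes"},
    {"Location": "Sydney", "RainTomorrow": "Yes"},
    {"Location": "Perth", "RainTomorrow": "No"},
]


def map_phase(record):
    """Emit one (key, value) pair per record: (location, 1 if rain else 0)."""
    return (record["Location"], 1 if record["RainTomorrow"] == "Yes" else 0)


def shuffle_phase(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values):
    """Combine all values of one key into a single result (here, a sum)."""
    return reduce(lambda a, b: a + b, values, 0)


def rainy_days_per_location(data):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = map(map_phase, data)
    grouped = shuffle_phase(pairs)
    return {key: reduce_phase(values) for key, values in grouped.items()}


result = rainy_days_per_location(records)
print(result)
```

The map and reduce functions never share state, which is what allows a distributed engine to run them on separate partitions of the data.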
