
Projects focused on big data processing, including the latest big data technologies (Spark), and NoSQL database (MongoDB).


Agewerc/data-processing-big-data


Data Processing for Big Data


This repository holds projects developed in the unit Data Processing for Big Data, part of the Master of Data Science at Monash University. The unit covers three aspects of big data processing (volume, complexity, and velocity) using the latest big data technologies. For volume, it covers processing large data sets with parallel technologies. For complexity (high dimensionality), it covers data analytics methods for parallel processing. For velocity, it covers data stream processing.

Description

Two projects were developed:

  1. Big data processing, analysis and visualisation: In this project we perform a series of analysis, exploration and visualisation activities on text and CSV files. The primary data structure used is the RDD.

  2. Machine Learning for Weather Forecasting: In this project we use the MLlib library to apply different machine learning algorithms to the Rain Australia Kaggle dataset.
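
To illustrate the kind of RDD-based text analysis described in the first project, here is a minimal PySpark sketch of a word count. It is not taken from the project code: the helper names `tokenize`, `top_words` and `run`, and the path `data/sample.txt`, are hypothetical and chosen only for this example.

```python
import re
from operator import add


def tokenize(line):
    """Lowercase a line and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def top_words(lines_rdd, n=10):
    """Return the n most frequent (word, count) pairs from an RDD of lines."""
    return (
        lines_rdd.flatMap(tokenize)          # one record per word
        .map(lambda word: (word, 1))         # pair each word with a count of 1
        .reduceByKey(add)                    # sum the counts per word
        # order by descending count, then alphabetically to break ties
        .takeOrdered(n, key=lambda kv: (-kv[1], kv[0]))
    )


def run(path="data/sample.txt"):
    """Count words in a text file. The default path is a hypothetical example."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    try:
        return top_words(sc.textFile(path))
    finally:
        sc.stop()
```

With Spark installed, calling `run("path/to/file.txt")` would return a list of `(word, count)` pairs. The transformations (`flatMap`, `map`, `reduceByKey`) are lazy; Spark only executes them when the `takeOrdered` action is called.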

Some of the learning outcomes

  • Identify and explain big data concepts and technologies;
  • Write and interpret parallel database processing algorithms and methods;
  • Apply common data analytics and machine learning algorithms in a big data environment;
  • Use and evaluate streaming methods in big data processing;
  • Use big data streaming technologies.

Visuals

Some of the visualisations developed in the projects.

[Image hosted on Imgur]

[Image hosted on Imgur]
