
amtellezfernandez/kMeansAlgorithm


K-Means Clustering and Gradient Descent Variants in Spark

The code is provided in two versions: HTML and Scala. This repository contains the final project for the course Machine Learning for Big Data, part of the Master 2 Artificial Intelligence and Data Science (IASD). The project explores the implementation and optimization of K-Means clustering algorithms using Apache Spark in Scala, and investigates different variants of the Stochastic Gradient Descent (SGD) algorithm. The focus is on optimizing performance for large-scale data processing in distributed environments.

BigData Final Project: K-Means Clustering and Gradient Descent Variants in Spark

Welcome to the repository for the BigData Final Project!

Table of Contents

Part 1: K-Means Implementation and Optimization in Spark Scala
- Baseline Implementation
- Performance Analysis
- Optimization and Justification
- K-means++ Implementation
- DataFrame-Based Implementation

Part 2: K-Means Implementation Using DataFrames or DataSets
- DataFrame/DataSet Implementation
- Performance Comparison

Part 3: Gradient Descent Variants
- Momentum and Nesterov Variants of SGD

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
