Spotify Song Clustering Model

Project Overview

This project applies two clustering algorithms, K-means and DBSCAN, to group 114,000 Spotify songs by their audio features. The goal is to find patterns and groupings in the dataset using unsupervised machine learning. The working hypothesis is that the resulting clusters reflect underlying emotional characteristics of the songs; checking this is listed under Future Work.

Dataset

Source: 🎹 Spotify Tracks Dataset @ Kaggle - https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset
Size: 114,000 songs
Attributes: 9 audio features used for clustering

Methodology

Two clustering algorithms were implemented using scikit-learn:

K-means
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

These algorithms were chosen to explore both centroid-based and density-based clustering approaches.
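The contrast between the two approaches can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in kmeans.py or dbscan.py: the helper name `cluster_both`, the scaling step, the parameter values (`eps`, `min_samples`, `n_init`), and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler


def cluster_both(X, n_clusters=5, eps=0.5, min_samples=5, seed=0):
    """Return (kmeans_labels, dbscan_labels) for a feature matrix X.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Standardize so that no single feature dominates the distance metric.
    X_scaled = StandardScaler().fit_transform(X)

    # Centroid-based: the number of clusters is fixed in advance.
    kmeans_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(X_scaled)

    # Density-based: the number of clusters is discovered from the data,
    # and points in sparse regions are labelled -1 (noise).
    dbscan_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    return kmeans_labels, dbscan_labels


# Synthetic stand-in for song features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

km_labels, db_labels = cluster_both(X_demo, n_clusters=3)
```

K-means always assigns every point to one of the requested clusters, while DBSCAN may leave some points unassigned; on real song data the `eps` and `min_samples` values would need tuning.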

Features

The following 9 audio features were used for clustering:

Danceability
Energy
Loudness
Speechiness
Acousticness
Instrumentalness
Liveness
Valence
Tempo
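A feature matrix built from these nine attributes might look like the sketch below. The lowercase column names are assumed to match the headers in the Kaggle CSV, and the standardization step is an assumption rather than something this README states; it is included because loudness (in dB) and tempo (in BPM) are on very different scales from the 0 to 1 features. The function name `build_feature_matrix` and the tiny demo frame are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed column names for the nine audio features.
FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def build_feature_matrix(df):
    """Select the nine audio features, drop incomplete rows, and standardize.

    Hypothetical helper; returns a NumPy array of shape (n_songs, 9).
    """
    features = df[FEATURES].dropna()
    return StandardScaler().fit_transform(features)


# Placeholder frame with made-up values, standing in for dataset.csv.
demo = pd.DataFrame(
    {name: [0.1 * (i + 1), 0.2 * (i + 1), 0.3 * (i + 1)]
     for i, name in enumerate(FEATURES)}
)
demo["track_name"] = ["a", "b", "c"]  # non-feature column, ignored below

X = build_feature_matrix(demo)
```

With the real data, `demo` would be replaced by something like `pd.read_csv("dataset.csv")`, assuming the file sits next to the scripts.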

Usage

To run the clustering model, use the following command:

python main.py [-t] [n] [algorithm] [feature1] [feature2] [feature3]

Arguments in square brackets are optional. For example, the arguments can request K-means with 5 clusters, a view of the training process, and a plot of the clusters over the valence, tempo, and energy features.
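One way a command line of this shape could be parsed is sketched below with argparse. This is a hypothetical parser, not the one in main.py: the argument order follows the synopsis above, the meaning of -t (show the training process) and the default of 5 clusters are assumptions, and the default features are taken from the Visualization section.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the synopsis; not the project's own code."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumed meaning: -t turns on display of the training process.
    parser.add_argument("-t", action="store_true",
                        help="show the training process (assumed)")
    # Assumed default of 5 clusters.
    parser.add_argument("n", nargs="?", type=int, default=5,
                        help="number of clusters")
    parser.add_argument("algorithm", nargs="?", default="kmeans",
                        choices=["kmeans", "dbscan"])
    # Zero or more feature names; defaults are the documented plot axes.
    parser.add_argument("features", nargs="*",
                        default=["danceability", "energy", "loudness"])
    return parser


# Example argument list: K-means, 5 clusters, training view, three features.
args = build_parser().parse_args(
    ["-t", "5", "kmeans", "valence", "tempo", "energy"]
)

# With no arguments, every value falls back to its default.
defaults = build_parser().parse_args([])
```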

Visualization

When run with default settings (e.g., python main.py kmeans 5), the plot will display:

X-axis: Danceability
Y-axis: Energy
Z-axis: Loudness

Each point in the scatter plot represents a song, and the colors indicate different clusters. This visualization helps to understand how songs are grouped based on their audio features.
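A plot of this kind can be sketched with matplotlib's 3D axes as shown below. The function name `plot_clusters_3d`, the colormap, and the random placeholder points are assumptions for illustration; display.py may be organized differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit for an interactive window
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters_3d(X, labels, axis_names):
    """Scatter the first three columns of X, coloring each point by cluster label.

    Hypothetical helper; returns the figure and its 3D axes.
    """
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(projection="3d")
    ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="tab10", s=8)
    ax.set_xlabel(axis_names[0])
    ax.set_ylabel(axis_names[1])
    ax.set_zlabel(axis_names[2])
    return fig, ax


# Random placeholder points and labels, not real song data.
rng = np.random.default_rng(0)
points = rng.random((200, 3))
cluster_ids = rng.integers(0, 5, size=200)

fig, ax = plot_clusters_3d(
    points, cluster_ids, ["Danceability", "Energy", "Loudness"]
)
fig.savefig("clusters.png")
```

For an interactive view, the `matplotlib.use("Agg")` line can be removed and `plt.show()` called in place of `fig.savefig`.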

Dependencies

Python 3.x
scikit-learn
numpy
pandas
matplotlib

Installation

pip install scikit-learn numpy pandas matplotlib

Future Work

Analyze the emotional characteristics of each cluster
Experiment with other clustering algorithms (e.g., Hierarchical Clustering, Gaussian Mixture Models)
Incorporate additional features or metadata (e.g., genre, release year)
Develop a recommendation system based on the clustering results

Contact

Email: knguy@purdue.edu

