In this project, we aim to use clustering and principal component analysis (PCA) to identify similarities between songs on Spotify. We will be using a dataset of songs that includes various audio features provided by Spotify's API.
The dataset used in this project can be found on Kaggle. The dataset contains information on 12,000 songs, including their track ID, artist name, song title, and various audio features such as danceability, energy, loudness, and tempo.
Clustering is a technique used to group similar objects together based on their features. In this project, we will use clustering to group similar songs together based on their audio features. We will use the K-means algorithm for clustering, which is a popular clustering algorithm that aims to minimize the sum of squared distances between the data points and their assigned centroids.
PCA is a technique used to reduce the dimensionality of a dataset while preserving the important information. In this project, we will use PCA to identify the most important audio features that contribute to the similarity between songs. We will use the scikit-learn library in Python to perform PCA analysis.
This project requires the following dependencies:
- pandas
- numpy
- scikit-learn
- matplotlib
- scipy
To run the code, follow these steps:
- Clone this repository.
- Download the dataset from Kaggle and place it in the same directory as the code.
- Install the required dependencies.
- Open the Jupyter notebook clustering.ipynb
- Run the code cells in order.