A full-stack data project using audio-feature data from the official Spotify Web API. In the first notebook, Data Retrieval, we pull all the tracks saved in my Spotify music library, along with the audio features for each track, and save them to a .csv file. In the Data Exploration notebook, we do some exploratory analysis on the tracks and build an intuitive sense of what the mean value of each audio feature sounds like by listening to selected tracks. In the last notebook, Data Clustering, we use the K-Means clustering algorithm to find natural groupings of tracks based on their audio features and save each group as a playlist.
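The retrieval step can be sketched roughly as follows. This is an illustrative sketch, not the notebook's actual code: it assumes an authenticated Spotipy client `sp`, and the helper names (`fetch_saved_tracks`, `add_audio_features`, `save_csv`), the chosen `FEATURES` subset and the `tracks.csv` filename are hypothetical. It pages through the saved-tracks endpoint 50 items at a time, requests audio features in batches of up to 100 track IDs, and writes one CSV row per track.

```python
import csv

# Audio-feature columns to keep. This particular subset is an assumption for
# the sketch; the Web API returns several more (key, mode, time_signature, ...).
FEATURES = ["danceability", "energy", "valence", "tempo", "acousticness",
            "instrumentalness", "speechiness", "liveness", "loudness"]


def fetch_saved_tracks(sp, page_size=50):
    """Page through the current user's saved ("liked") tracks.

    `sp` is expected to be an authenticated spotipy.Spotify client with the
    user-library-read scope; 50 is the largest page this endpoint allows.
    """
    tracks, offset = [], 0
    while True:
        page = sp.current_user_saved_tracks(limit=page_size, offset=offset)
        for item in page["items"]:
            track = item["track"]
            tracks.append({
                "id": track["id"],
                "name": track["name"],
                "artist": ", ".join(a["name"] for a in track["artists"]),
            })
        if page["next"] is None:  # last page reached
            break
        offset += page_size
    return tracks


def add_audio_features(sp, tracks, batch_size=100):
    """Merge audio features into each track dict, 100 track IDs per request."""
    for start in range(0, len(tracks), batch_size):
        batch = tracks[start:start + batch_size]
        features = sp.audio_features([t["id"] for t in batch])
        for track, feats in zip(batch, features):
            if feats:  # entries can be None when a track has no features
                track.update({name: feats[name] for name in FEATURES})
    return tracks


def save_csv(tracks, path):
    """Write one row per track; missing feature values become empty cells."""
    fieldnames = ["id", "name", "artist"] + FEATURES
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(tracks)


# Example wiring in a notebook (requires network access and app credentials):
# import spotipy
# from spotipy.oauth2 import SpotifyOAuth
# sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-library-read"))
# save_csv(add_audio_features(sp, fetch_saved_tracks(sp)), "tracks.csv")
```

Taking the client as a parameter, rather than creating it inside the functions, keeps the paging and batching logic testable without network access.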
The main motivation for this project was to gain practical experience with every step of a data project in Python, including (automated) data retrieval, data exploration, and data modeling, as well as to get familiar with Jupyter Notebooks and learn how to work with API endpoints.
Data Retrieval: github | nbviewer
Data Exploration: github | nbviewer
Data Clustering: github | nbviewer
Dataset: my Spotify "liked" songs
- Spotipy - access to the Spotify Web API from Python
- IPython - embedding HTML for the Spotify player
- Pandas, Numpy - data analysis
- Matplotlib, Seaborn - data visualization
- scikit-learn - K-Means algorithm
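One way to combine Pandas and IPython for the "listen to the mean" exploration is sketched below. It is an assumption-laden illustration rather than the notebook's code: the helpers `track_closest_to_mean`, `player_html` and `show_player`, the iframe size, and the `tracks.csv` file with an `id` column are all hypothetical. The idea is to pick the track whose value for a feature is nearest the library-wide mean, then render Spotify's embedded player for that track inside a notebook cell.

```python
import pandas as pd

# Public embed URL pattern for a single Spotify track.
EMBED_URL = "https://open.spotify.com/embed/track/{track_id}"


def track_closest_to_mean(df, feature):
    """Return the row whose `feature` value is nearest the library-wide mean."""
    distance = (df[feature] - df[feature].mean()).abs()
    return df.loc[distance.idxmin()]


def player_html(track_id, width=300, height=80):
    """Build an <iframe> snippet for Spotify's embedded track player."""
    src = EMBED_URL.format(track_id=track_id)
    return (f'<iframe src="{src}" width="{width}" height="{height}" '
            'frameborder="0" allow="encrypted-media"></iframe>')


def show_player(track_id):
    """Render the player in a Jupyter cell (IPython is imported only here)."""
    from IPython.display import HTML, display
    display(HTML(player_html(track_id)))


# Example in a notebook, assuming a hypothetical "tracks.csv" with an "id" column:
# df = pd.read_csv("tracks.csv")
# show_player(track_closest_to_mean(df, "energy")["id"])
```

Keeping the HTML construction separate from the `display` call means the snippet can be inspected or reused outside Jupyter.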
- Create a dataset from my Spotify library
- Exploratory data analysis
- Build an intuition for audio features by listening to songs through an embedded Spotify player
- Use K-Means clustering to find clusters of similar tracks
- Create playlists based on found clusters and save them to my profile
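The clustering and playlist goals can be sketched as below. This is a hedged illustration, not the project's implementation: standardizing the features first, `n_init=10`, private playlists named `Cluster <label>`, and the helper names `cluster_tracks` and `save_cluster_playlists` are choices made for this sketch. Scaling matters because tempo (in BPM) and loudness (in dB) span much larger numeric ranges than features such as energy or valence, which lie in [0, 1], and would otherwise dominate the Euclidean distances K-Means relies on.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_tracks(X, n_clusters, random_state=0):
    """Standardize each feature column, then return one K-Means label per track."""
    # Scaling keeps tempo (BPM) and loudness (dB) from dominating the
    # Euclidean distances over features that live in [0, 1].
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return km.fit_predict(X_scaled)


def save_cluster_playlists(sp, track_ids, labels, prefix="Cluster"):
    """Create one private playlist per cluster and return {label: playlist_id}.

    `sp` is expected to be a spotipy.Spotify client with the
    playlist-modify-private scope.
    """
    user_id = sp.current_user()["id"]
    groups = defaultdict(list)
    for track_id, label in zip(track_ids, labels):
        groups[int(label)].append(track_id)

    created = {}
    for label, ids in sorted(groups.items()):
        playlist = sp.user_playlist_create(user_id, f"{prefix} {label}", public=False)
        for start in range(0, len(ids), 100):  # the API accepts at most 100 items per call
            sp.playlist_add_items(playlist["id"], ids[start:start + 100])
        created[label] = playlist["id"]
    return created
```

In practice the number of clusters still has to be chosen, for example by comparing inertia or silhouette scores across several values of `n_clusters`.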
- Find clusters within each cluster to identify subgenres
- Use another clustering method that works with mixed data types (continuous and categorical)
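Some Spotify audio features are categorical rather than continuous (for example `key`, a pitch class from 0 to 11, and `mode`, major or minor), which K-Means on Euclidean distance does not handle well. One possible direction, not necessarily the method the author has in mind, is Gower distance combined with hierarchical clustering, sketched below with the hypothetical helpers `gower_distance` and `cluster_mixed`. Gower distance averages a range-normalized absolute difference over numeric columns and a 0/1 mismatch over categorical columns, so every column contributes a value in [0, 1].

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_distance(X_num, X_cat):
    """Pairwise Gower distance for rows with numeric and categorical columns.

    Numeric columns contribute |x_i - x_j| / range; categorical columns
    contribute 0 if equal and 1 otherwise. All columns are weighted equally.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute 0 distance
    num_part = np.abs(X_num[:, None, :] - X_num[None, :, :]) / ranges
    cat_part = (X_cat[:, None, :] != X_cat[None, :, :]).astype(float)
    n_cols = X_num.shape[1] + X_cat.shape[1]
    return (num_part.sum(axis=2) + cat_part.sum(axis=2)) / n_cols


def cluster_mixed(X_num, X_cat, n_clusters):
    """Average-linkage hierarchical clustering on Gower distances (labels start at 1)."""
    D = gower_distance(X_num, X_cat)
    Z = linkage(squareform(D), method="average")  # linkage expects condensed form
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

The full pairwise matrix takes O(n²) memory, which is manageable for a personal library of a few thousand tracks. The same function could also be re-run on only the rows of one cluster as a simple way to look for subgenres within it.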