This project analyzes Spotify track data to derive insights into track popularity and characteristics. The dataset used is from Kaggle and includes information on the most streamed Spotify songs as of 2024.
The dataset is sourced from Kaggle:
- Dataset Title: Most Streamed Spotify Songs 2024
- Dataset URL: Kaggle Dataset
The dataset contains the following columns:
- Artist: Name of the artist
- Track Name: Name of the track
- Track Score: Score assigned to the track
- Spotify Streams: Number of Spotify streams
- All Time Rank: All-time ranking of the track
- Spotify Playlist Count: Number of playlists featuring the track
- Explicit Track: Whether the track is explicit
- Release Date: Release date of the track (if available)
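As a rough illustration of how these columns might be prepared for analysis, the sketch below coerces count-like columns to numbers and parses the release date. It assumes that counts may arrive as strings with thousands separators and that dates use a month/day/year format; neither is confirmed here, so check the actual file first. The sample rows and the `clean_tracks` helper are made up for the example and are not part of the project.

```python
import io

import pandas as pd

# Made-up placeholder rows (NOT real values from the Kaggle file).
SAMPLE_CSV = """Artist,Track Name,Track Score,Spotify Streams,All Time Rank,Spotify Playlist Count,Explicit Track,Release Date
Artist A,Song One,90.5,"1,000,000",1,"3,000",0,4/26/2024
Artist B,Song Two,80.0,"2,500,000",2,"1,500",1,5/4/2024
Artist A,Song Three,70.25,,3,"2,000",0,
"""

NUMERIC_COLUMNS = [
    "Track Score",
    "Spotify Streams",
    "All Time Rank",
    "Spotify Playlist Count",
]


def clean_tracks(df):
    """Hypothetical helper: return a copy with numeric, boolean and date dtypes."""
    df = df.copy()
    for col in NUMERIC_COLUMNS:
        # Drop thousands separators, then coerce; unparseable values become NaN.
        as_text = df[col].astype(str).str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(as_text, errors="coerce")
    # Assumes the flag is stored as 0/1.
    df["Explicit Track"] = df["Explicit Track"].astype(bool)
    # Assumes month/day/year dates; missing or malformed dates become NaT.
    df["Release Date"] = pd.to_datetime(
        df["Release Date"], format="%m/%d/%Y", errors="coerce"
    )
    return df


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
tracks = clean_tracks(raw)
print(tracks.dtypes)
```

Coercing with `errors="coerce"` keeps rows with missing values in the frame as `NaN` or `NaT` rather than raising, which matches the "if available" note on Release Date.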
The project includes the following analyses and visualizations:
- Univariate Analysis
  - Distribution of Track Score
  - Count of Explicit Tracks
- Bivariate Analysis
  - Relationship between Track Score and Spotify Streams
  - Relationship between Track Score and All Time Rank
- Correlation Analysis
  - Heatmap showing correlation between numerical features
- Additional Analyses and Plots
  - Count of Tracks by Artist
  - Distribution of Spotify Streams by Explicit Track
  - Trend Analysis Over Time
  - Artist Popularity by Streams
  - Track Score Distribution by Explicit Content
  - Spotify Streams Distribution by Track Score
  - Pairwise Relationships
  - Heatmap of Top 10 Artists' Streams by Month
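The tables behind the two heatmaps listed above could be computed along the following lines. This is a minimal sketch, not the notebook's actual code: the rows are made-up placeholders, the columns are assumed to be already numeric with parsed dates, and "by Month" is assumed to mean the month of each track's release date.

```python
import pandas as pd

# Made-up placeholder rows (NOT real data), already cleaned to numeric/datetime.
tracks = pd.DataFrame(
    {
        "Artist": ["Artist A", "Artist A", "Artist B", "Artist B"],
        "Track Score": [90.0, 70.0, 80.0, 60.0],
        "Spotify Streams": [4_000_000, 2_000_000, 3_000_000, 1_000_000],
        "All Time Rank": [1, 3, 2, 4],
        "Release Date": pd.to_datetime(
            ["2024-01-15", "2024-02-10", "2024-01-20", "2024-01-25"]
        ),
    }
)

# Correlation analysis: pairwise Pearson correlation of the numeric columns only.
corr = tracks.select_dtypes(include="number").corr()

# Top 10 artists by total streams (only two exist in this toy sample).
top_artists = (
    tracks.groupby("Artist")["Spotify Streams"].sum().nlargest(10).index
)

# Artist x month table of summed streams, using the release month (assumption).
monthly = (
    tracks[tracks["Artist"].isin(top_artists)]
    .assign(Month=lambda d: d["Release Date"].dt.month)
    .pivot_table(
        index="Artist",
        columns="Month",
        values="Spotify Streams",
        aggfunc="sum",
        fill_value=0,
    )
)

print(corr.round(2))
print(monthly)
```

Either table can then be drawn with `seaborn.heatmap`, for example `sns.heatmap(corr, annot=True)`. Keeping the aggregation separate from the plotting call makes the numbers easy to inspect before they are visualized.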
To run the analysis, you'll need Python and the following libraries:
- pandas
- matplotlib
- seaborn
- numpy
You can install these libraries using pip:
pip install pandas matplotlib seaborn numpy
- Clone the Repository:
git clone https://github.com/lnpotter/spotify-track-insights.git # or
git clone https://github.com/yourusername/spotify-track-insights.git # if you forked the repository
- Navigate to the project directory:
cd spotify-track-insights
- Update the path to the dataset in the Jupyter Notebook:
df = pd.read_csv('path/to/your/dataset.csv', encoding='ISO-8859-1')
- Execute the Jupyter Notebook cells to view the visualizations and analyses.
Feel free to open an issue or submit a pull request if you have suggestions or improvements.
This project is licensed under the MIT License. See the LICENSE file for details.
- Kaggle for providing the dataset.
- Libraries used in this project: pandas, Matplotlib, seaborn, and NumPy.