What truly makes a song popular? To answer this, we analyzed the impact a song's attributes have on its popularity. To do so, we used a Kaggle dataset, Spotify Weekly Top 200, which contains songs from Spotify's 'Weekly Top Songs' for each country between 2021 and 2022.
Relevant Files:
- App.py
  - Flask-based API that connects the front end to the database.
  - CLICK HERE to see how to run the app
- Data_Exploration_Database_Creation
  - Explore_Data.ipynb
    - Cleaning initial dataset
    - *NOTE* The file (final.csv) needed to run the Explore_Data notebook is too large to upload to GitHub. It can be downloaded HERE
    - final.csv must be placed in the same folder as Explore_Data.ipynb for the notebook to run correctly
  - Df_Creation_for_Tables.ipynb
    - calculating "Popularity Score"
    - creating SQLite database from cleaned csv files
  - Dashboard_img.ipynb
    - creating matplotlib diagrams for dashboard page
  - Explore_Data.ipynb
- Resources
  - csv files created from data
  - Songs_Complete_DB.sqlite
- Static
  - images, css, and javascript files for app.py
- Templates
  - html templates for app.py
- The initial data set required cleaning and filtering before it could be used in later steps. Certain columns were removed because of their size, which interfered with reading the CSV.
- The columns kept from the CSV were artist name, track name, rank, weeks on chart, streams, danceability, energy, loudness, speechiness, acousticness, valence, tempo, duration and country. The cleaned data set was then divided into separate data frames, each of which was used to build a table.
- Since more than one attribute in the data bears on a song's popularity, we had to create a way to quantify it. Average rank in the top 200, average number of streams, and maximum weeks on chart all needed to be considered when determining a song's popularity. We created a "Popularity Score" that combines these values and ranked the songs by this score. It was calculated as follows:
Popularity Score = ((average rank/highest rank of songs) + (average number of streams/highest average of song streams) + (max weeks on chart of song/max weeks on chart of all songs))/3
- The tables created, each of which was then exported to a CSV, were:
  - 'country_attributes', which looks at the songs and their attributes and groups them by country.
  - 'top_fifty_genres', which shows the top 50 genres based on the number of artists.
  - 'coolness', which looks at how popular a song is using its rank, streams, weeks on the chart, and popularity score.
  - 'all_countries_top_fifty', which looks at the top fifty songs for each country based on their popularity.
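The Popularity Score formula above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the field names (`avg_rank`, `avg_streams`, `max_weeks`) and the sample songs are hypothetical, and "highest rank of songs" is interpreted here as the largest average-rank value in the data, since the write-up does not say whether rank was inverted before scoring. The project itself computed the score in a notebook (Df_Creation_for_Tables.ipynb), which may differ from this sketch.

```python
def popularity_scores(songs):
    """Return {track: score} using the three-term average described above.

    `songs` maps a track name to a dict with the (hypothetical) keys
    avg_rank, avg_streams and max_weeks.
    """
    # Normalizing constants: the largest value of each measure in the data.
    # Assumption: "highest rank" means the numerically largest average rank.
    top_rank = max(s["avg_rank"] for s in songs.values())
    top_streams = max(s["avg_streams"] for s in songs.values())
    top_weeks = max(s["max_weeks"] for s in songs.values())

    return {
        name: (
            s["avg_rank"] / top_rank
            + s["avg_streams"] / top_streams
            + s["max_weeks"] / top_weeks
        ) / 3
        for name, s in songs.items()
    }


# Made-up sample data, for illustration only.
example_songs = {
    "Song A": {"avg_rank": 10.0, "avg_streams": 500_000.0, "max_weeks": 20},
    "Song B": {"avg_rank": 100.0, "avg_streams": 1_000_000.0, "max_weeks": 40},
}

scores = popularity_scores(example_songs)
# Songs ordered from highest to lowest score.
ranked_tracks = sorted(scores, key=scores.get, reverse=True)
```

Each term is divided by the largest value of its kind, so every term, and therefore the score, falls between 0 and 1.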
- After all the data frames were created, we imported sqlite3 to create our database. The dataframes were then converted to SQL and saved, and the resulting file was tested to confirm it worked.
- For the songs_complete database, we created four tables to represent the relevant data: coolness, country_attributes, fiftygenre and all_countries_top_50.
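As a rough sketch of the database step, the snippet below creates one table with the standard-library sqlite3 module and then runs a query to check that the data landed. It uses only the standard library to stay self-contained, whereas the project wrote pandas dataframes to the database. The column names and sample rows are hypothetical, and an in-memory database stands in for the real .sqlite file.

```python
import sqlite3

# Made-up rows for a 'coolness'-style table (columns are assumptions).
sample_rows = [
    ("Song A", 10.0, 500_000.0, 20, 0.37),
    ("Song B", 100.0, 1_000_000.0, 40, 1.0),
]


def build_database(path, rows):
    """Create the table at `path`, load `rows`, and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute("DROP TABLE IF EXISTS coolness")
    conn.execute(
        """
        CREATE TABLE coolness (
            track_name TEXT,
            avg_rank REAL,
            avg_streams REAL,
            max_weeks INTEGER,
            popularity_score REAL
        )
        """
    )
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO coolness VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# ":memory:" keeps the example from writing a file to disk.
conn = build_database(":memory:", sample_rows)

# Simple checks that the table exists and holds the expected data.
row_count = conn.execute("SELECT COUNT(*) FROM coolness").fetchone()[0]
top_track = conn.execute(
    "SELECT track_name FROM coolness ORDER BY popularity_score DESC LIMIT 1"
).fetchone()[0]
```

Passing a file path instead of `":memory:"` would write the database to disk.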
- To build the API, Flask and SQLAlchemy were first imported, and an engine was created with 'songs_complete_db.sqlite'. Python was linked to the database by creating a SQLAlchemy session.
- Routes were created for:
  - @app.route('/')
  - @app.route('/map')
  - @app.route('/attributes/')
  - @app.route('/countries/')
- A GeoJSON file of country polygons was used to create choropleths for each song attribute by country.
- An infobox was created to display more detailed information for each country. It ranks the song attributes by the size of the difference between the average of that country's top 50 songs and the average of the whole dataset for each attribute.
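The infobox ranking described above could look something like the sketch below, written in Python for illustration even though the map itself runs in the browser. The function name and sample averages are hypothetical. It sorts by raw absolute difference. The write-up does not say whether attributes on different scales (for example tempo versus danceability) were normalized first, so the sample uses only attributes on a 0 to 1 scale.

```python
def rank_attribute_differences(country_avg, global_avg):
    """Return (attribute, difference) pairs, largest absolute gap first.

    The difference is country average minus dataset average, so the sign
    shows whether the country sits above or below the overall average.
    """
    differences = {
        attribute: country_avg[attribute] - global_avg[attribute]
        for attribute in country_avg
    }
    return sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up averages for one country's top 50 songs and for the whole dataset.
country_top_fifty_avg = {"danceability": 0.75, "energy": 0.60, "valence": 0.40}
dataset_avg = {"danceability": 0.70, "energy": 0.70, "valence": 0.55}

ranked = rank_attribute_differences(country_top_fifty_avg, dataset_avg)
ranked_names = [attribute for attribute, _ in ranked]
```

With these sample values, valence shows the largest gap, followed by energy and then danceability.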
- Minor stylistic features were added to highlight user interactivity and allow the user to switch between the map and a static dashboard displaying other statistics:
  - Top 10 Artists Worldwide
  - Song Attribute Distribution
  - Top 10 Songs in Spotify
  - Song Attribute Correlation Heatmap
  - Top 50 Song Genres
- Clone the SongAttributesVSPopularity repo to your computer.
- Open the SongAttributesVSPopularity folder in VS Code.
- Navigate to the App.py file in the folder.
- Right-click the file and pick "Run Python File in Terminal".
- In the VS Code terminal, Ctrl + click the link shown after "Running on http://....".
- A new browser window will open on our "Spotify Top Song Attributes Around the World" landing page. Here you can view a summary of the project and dashboard images representing interesting aspects we observed while analyzing the song dataset.
- Click the "Interactive Map" button at the top of the landing page to open the interactive geographical representation of the database.
  - Click through the eight buttons at the top to see a colormap of each song attribute.
  - Click on any country to view its most popular song attributes compared to the average for the whole dataset.
  - Which countries favor the fastest songs? The longest ones? The happiest or saddest? And so on. You can pose any question you want and click around to your heart's content!
- proposal write up
- dashboard_img.ipynb: analyzed the data and produced histograms of the attributes and the top ten songs.
- Edited and exported the images from pandas for the dashboard.
- Converted the pandas dataframes to SQLite files and schema.
- structured blurbs and text
- Data Exploration and Cleaning
- Dataframe and CSV Creation via Python/Pandas
- Designed and Calculated Popularity Score
- ERD Creation/Edits
- Flask API (app.py)
- Home page (index.html, style2.css). style2.css is largely a copy of style.css by Becky (see below) with additions for the main page
- Created initial readme structure and description
- Created chart images through analysis (top 10 artists, top 50 genres, correlation chart for attributes) for the main dashboard page in charts.ipynb
- Merged charts.ipynb to dashboard_img.ipynb
Original dataset on Kaggle: Spotify Weekly Top 200 Songs Streaming Data
More information on the meanings of each song attribute can be found here and here.
Country borders provided by jalbertbowden under the Open Data Commons Public Domain Dedication and License
Choropleth library can be found here.
Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA.

