This analysis involves two vital datasets containing statistics about medals won by each country and their respective rankings. Athletes from each country secured various medals (gold, silver, and bronze) by showcasing their performances during the games, which determined their ranks based on the results.
The initial step was to implement data extraction process from the official website. Since the official website of Paris 2024 Olympics has been unable to import fast, I decided to import data of medals table from a Wikipedia page through the URL in MS Excel:
- Go to the Data tab in the ribbon.
- Click on Get Data, then select From Other Sources, and click on the From Web option.
After this step, a pop-up window will be appeared asking for a URL. I entered the URL of the medals page, then click OK.Then, it's time to connect it to the webpage.
- Power Query will try to get data from the webpage.
- Then, a Navigator window will open, showing tables that Excel found. Choose the table with the medals data.
- If the table doesn't appear right away, then check under "Document" or "Web View".
Once we find the right table, click Load button.
The paris2024 database is created and set to allow write operations. The database contains two primary tables, medals and olympics_results, which are later joined to form a comprehensive games_results table for analysis. I renamed columns in both tables for clarity (e.g., noc → country_name, gold → gold_medals). I expanded the athletes column to accommodate detailed athlete data. A new column, Gender, is added to olympics_results. The Gender column is updated based on patterns in the event_type column to categorize events as Male, Female, or Mixed. We update record column in games_results table to indicated records set during events: WR, O, None.
rank - displays the ranking of each country after the games have concluded;
noc - indicates the name of each country representing each of them;
gold - indicates the number of gold medals won by each country;
silver - indicates the number of silver medals won by each country;
bronze - indicates the number of bronze medals won by each country;
total - represents the total number of gold, silver and bronze medals for each country.
noc - indicates the name of each country representing each of them;
medal - indicates the type of medal (gold, silver, or bronze) won by each athlete or group of athletes;
name - displays the full name of each athlete who won a medal for their country;
sport - indicates the sport in which each athlete showcased their performance;
event - lists each event type within each sport in which athletes secured their medals;
date - indicates the full date on which each athlete or a group of athletes won their medal.
Before starting to analyze these datasets, make sure that we must have an installed version of My SQL. Check the official website of [MySQL]("https://www.mysql.com/") for details.
- Launch MySQL Workbrench on your desktop or use MySQL Command-Line Shell for your analysis.
- Under MySQL Connections, click on your connection to open it. If you haven't set up a connection yet, then click the "+" button to create a new one and enter your connection details (Connection Name, Connection Method, Hostname, and Port). Press "OK".
- It will create a connection set. Once we press in it, this will require to type a password.
The main objectives of my analysis entail in the following way:
- Analysis of Medal Counts by Country and Sport
- Comparison of Male and Female Athletes' Medals by Sport
- Athletes with the Highest Number of Medals by Gender and Sport
- Date of Maximum Medal Wins for Male and Female Athletes
- Countries Whose Athletes Broke Olympic or World Records (OR, WR)
- First Athlete to Break a Record (OR, WR) and Date of Achievement
- Top Athletes with the Most Medals in Each Country
First, I imported inline package for using Jupyter notebooks to display plots directly within the notebook interface. Then, I installed MySQL Connector with pip pip install mysqlconnector so that I could import mysql.connector library to create a communication between Python and MySQL databases. Also, I installed SQLAlchemy using pip install SQLAlchemy to support this connection. Then, I wrote the code which establishes a connection to a MySQL database using the mysql.connector.connect() function: it took the host, user, password and database parameters to interact with the MySQL database for querying or inserting data.