In this project, I analyzed the historical data from the modern Olympic Games to identify trends and patterns in athlete participation, country representation, and overall performance across different editions of the Olympics. The goal of this analysis was to gain a deeper understanding of how the participation of athletes and countries has evolved over the years.
The "120 Years of Olympic History" dataset was sourced from Kaggle. It contains comprehensive historical data on the modern Olympic Games, spanning from the inaugural Athens 1896 Olympics to Rio 2016.
Each row in the dataset represents an individual athlete competing in a specific event and includes details such as the athlete’s name, age, sex, country, medal, and more.
After loading the dataset into Excel, I utilized Power Query to clean and preprocess the data:
- Removed Unnecessary Columns: Deleted columns like "ID" and "NOC" that were not relevant to the analysis.
- Corrected Data Types: Converted data types for certain columns (e.g., changing integer columns incorrectly marked as strings).
- Feature Engineering: Created an additional "Edition" column to uniquely identify each Olympic session by city.
To handle missing values in the age, height, and weight columns, I filled them with the respective mean values, grouped by the event category, as some events have specific combinations for these attributes. Below is an example formula used to fill null values in the "Age" column:
Similar methods were applied to the weight and height columns.
Finally, I have below a sample of stock uncleaned data, followed by the cleaned data.
Using pivot tables in Excel, I explored the following aspects:
-
Trends in athlete and country participation over time.
-
Distribution of age, height, and weight among athletes.
-
Top athletes and countries based on the number of medals won.
-
The ratio of male to female participants over the years.
- Cities that hosted the most Olympic editions.
I created a comprehensive dashboard to provide an overview of the analysis:
-
Key Performance Indicators (KPIs): Measures for total participants, total medals, and total sports were added to highlight overall performance.
-
Filters for Multilevel Views:
- Season: To differentiate between the Summer and Winter Olympics.
- Gender: To compare participation and performance between male and female athletes.
- Year: To analyze trends over different periods.