This project analyzes COVID-19 data to uncover insights and trends related to the pandemic, focusing on cases, deaths, testing, and other metrics across multiple countries. It includes data cleaning, visualization, and statistical analysis to better understand the impact and spread of the virus.
- Data Source: Our World in Data - COVID-19 Dataset
- Dataset: The dataset includes global COVID-19 data such as daily cases, deaths, testing rates, and population metrics.
- Key Columns:
  - location: Country or region
  - date: Date of data entry
  - total_cases: Cumulative confirmed cases
  - new_cases: Daily new cases
  - total_deaths: Cumulative deaths
  - new_deaths: Daily new deaths
  - total_tests: Total tests conducted
  - population: Population of the country
- Analyze the spread of COVID-19 over time for selected countries.
- Compare daily and cumulative cases, deaths, and testing rates.
- Visualize trends using line charts, bar charts, and heatmaps.
- Identify key insights such as top countries by total cases, death rates, and testing rates.
- Filtered data for selected countries (e.g., Kenya, USA, India).
- Removed missing values and handled data inconsistencies.
- Converted the date column to datetime format.
- Filled missing numeric values using forward fill and interpolation techniques.
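
The cleaning steps above could look roughly like the following sketch. It is not the notebook's actual code: the helper name `clean_covid_data`, the inline sample rows (made-up values), and the choice to interpolate the `total_*` columns while forward filling the rest are all assumptions made for illustration.

```python
import io

import pandas as pd

# Made-up sample rows in the dataset's column layout; not real figures.
SAMPLE_CSV = """location,date,total_cases,new_cases,total_deaths
Kenya,2020-03-13,1,1,
Kenya,2020-03-14,,0,
Kenya,2020-03-15,3,2,
India,2020-03-13,82,8,2
India,2020-03-14,,,2
India,2020-03-15,110,12,
Brazil,2020-03-13,151,53,
"""


def clean_covid_data(df, countries):
    """Hypothetical helper: filter countries, parse dates, fill numeric gaps."""
    # Keep only the selected countries and work on a copy.
    df = df[df["location"].isin(countries)].copy()

    # Convert the date column to datetime and order rows per country.
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(["location", "date"])

    numeric_cols = list(df.select_dtypes("number").columns)
    cumulative_cols = [c for c in numeric_cols if c.startswith("total_")]

    # Assumption: cumulative totals are interpolated linearly between known points.
    df[cumulative_cols] = df.groupby("location")[cumulative_cols].transform(
        lambda s: s.interpolate()
    )
    # Remaining gaps are forward filled per country, so values never leak
    # from one country into another.
    df[numeric_cols] = df.groupby("location")[numeric_cols].ffill()
    return df.reset_index(drop=True)


cleaned = clean_covid_data(pd.read_csv(io.StringIO(SAMPLE_CSV)), ["Kenya", "India"])
print(cleaned)
```

Grouping by `location` before filling matters: without it, a gap at the start of one country's series would be filled with the last value of the previous country. Values that are missing at the very start of a series stay missing, since there is nothing earlier to fill from.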
- Line charts for cumulative cases and deaths over time.
- Bar charts for top 10 countries by total cases.
- Heatmaps to analyze correlations between key metrics.
- Line Charts: Display trends in total cases and deaths over time.
- Bar Charts: Highlight the top countries by total cases.
- Heatmaps: Show correlation between various numeric columns.
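
As a rough illustration of the line and bar charts described above, here is a minimal Matplotlib sketch. The numbers are made up, and the variable names (`ax_line`, `ax_bar`, `top`) are assumptions for this example rather than names from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cumulative case counts for three countries over three days.
dates = pd.date_range("2020-04-01", periods=3, freq="D")
df = pd.DataFrame(
    {
        "location": ["Kenya"] * 3 + ["India"] * 3 + ["USA"] * 3,
        "date": list(dates) * 3,
        "total_cases": [10, 20, 35, 100, 180, 300, 500, 900, 1600],
    }
)

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(11, 4))

# Line chart: one line per country showing cumulative cases over time.
for country, group in df.groupby("location"):
    ax_line.plot(group["date"], group["total_cases"], label=country)
ax_line.set_title("Cumulative cases over time")
ax_line.set_xlabel("Date")
ax_line.set_ylabel("Total cases")
ax_line.tick_params(axis="x", rotation=45)
ax_line.legend()

# Bar chart: latest cumulative value per country, largest first (up to 10).
latest = df.sort_values("date").groupby("location")["total_cases"].last()
top = latest.nlargest(10)
ax_bar.bar(top.index, top.values)
ax_bar.set_title("Top countries by total cases")
ax_bar.set_ylabel("Total cases")

fig.tight_layout()
# In a notebook, plt.show() displays the figure; fig.savefig(...) writes it to disk.
```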
- Python
- Jupyter Notebook
- Pandas
- Matplotlib
- Seaborn
- Plotly
- Clone the repository:
- Navigate to the project directory:
  cd CovidData
- Install required libraries:
  pip install -r requirements.txt
- Open the notebook:
  jupyter notebook
- Execute the notebook cells to run the analysis.
- The USA recorded the highest number of cases and tests conducted throughout the observed period.
- Kenya showed relatively lower testing rates but maintained lower case counts compared to India and the USA.
- Correlation analysis indicated that higher testing rates were associated with higher confirmed cases.
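
Rankings and correlations like the ones summarized above can be derived with a few lines of pandas. The sketch below uses made-up figures for four placeholder countries, and it assumes "death rate" means deaths as a percentage of confirmed cases and "testing rate" means tests per thousand people; the notebook may define them differently.

```python
import pandas as pd

# Made-up latest cumulative figures for four placeholder countries.
latest = pd.DataFrame(
    {
        "location": ["A", "B", "C", "D"],
        "total_cases": [1000, 4000, 250, 9000],
        "total_deaths": [20, 40, 10, 270],
        "total_tests": [50000, 100000, 5000, 300000],
        "population": [1_000_000, 2_000_000, 500_000, 3_000_000],
    }
).set_index("location")

# Assumed definitions: deaths per 100 confirmed cases, tests per 1,000 people.
latest["death_rate_pct"] = 100 * latest["total_deaths"] / latest["total_cases"]
latest["tests_per_thousand"] = 1000 * latest["total_tests"] / latest["population"]

# Top countries by cumulative cases.
top_by_cases = latest["total_cases"].nlargest(3)

# Pairwise Pearson correlation between the key cumulative metrics.
corr = latest[["total_cases", "total_deaths", "total_tests"]].corr()

print(top_by_cases)
print(corr.round(2))
```

A positive correlation between tests and confirmed cases is an association only: more testing tends to surface more cases, and larger outbreaks tend to prompt more testing. The `corr` table is also the kind of input a heatmap would take.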
In the wake of a global pandemic that left the world grappling with uncertainty, Jupyter Notebook emerged not just as a tool, but as a compass guiding humanity through the fog of COVID-19’s impact. Armed with data, scientists, researchers, and analysts turned to the power of Python and the flexibility of Jupyter to make sense of what was happening—one dataset at a time.
Within these digital notebooks, rows of infection rates, recovery statistics, vaccine rollouts, and economic indicators were not just numbers—they were stories. Stories of resilience, inequity, progress, and loss. Through real-time visualization, correlation analysis, and machine learning models, patterns began to surface—revealing hotspots, predicting trends, and informing policy decisions that saved lives.
Jupyter’s open ecosystem allowed collaboration across borders, bringing together minds from different disciplines in a shared effort to decode a crisis. It turned raw data into insight, silence into understanding. In a world that demanded rapid answers, Jupyter Notebook became a living journal of truth: transparent, reproducible, and transformative.
Data didn’t just help us track the virus; it helped us understand its ripple effects on mental health, education, global trade, and social dynamics. And at the heart of this transformation stood a simple interface: a notebook that bridged the gap between chaos and clarity.
- Developed by Milton
- Data sourced from Our World in Data
You are freely invited to suggest any improvements. Thank you.