- Project Overview
- Dataset Information
- Data Cleaning Process
- Exploratory Data Analysis (EDA)
- Key Insights
- Usage
- Example Queries
- Repository Structure
- Visualizations
- Future Improvements
- Contributing
- License
- Author & Contact
## Project Overview

This project focuses on data cleaning and exploratory data analysis (EDA) of a layoffs dataset covering global companies from March 2020 to March 2023. The raw Excel dataset was imported into MySQL, where all cleaning, transformation, and analysis were carried out with SQL queries.
## Dataset Information

- Raw file: `raw_layoffs.xlsx`
- Imported table: `layoffs_staging2`
- Period covered: 2020-03-11 → 2023-03-06
Columns:

| Column | Description |
|---|---|
| `company` | Name of the company |
| `country` | Country of operation |
| `date` | Layoff announcement date |
| `sector` | Industry or business sector |
| `total_laid_off` | Employees laid off |
| `percentage_laid_off` | Workforce percentage affected |
| `location` | Company location |
| `stage` | Company stage (Startup, Series C, Public, etc.) |
| `funds_raised_millions` | Total funds raised in USD millions |
## Data Cleaning Process

Key cleaning operations performed:

- Removed null, duplicate, and inconsistent records.
- Standardized the date format (`YYYY-MM-DD`).
- Trimmed whitespace and fixed casing in text columns.
- Replaced blank `percentage_laid_off` values with `NULL`.
- Converted `funds_raised_millions` to a numeric data type.
- Created a cleaned table for analysis.
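The operations above can be sketched in MySQL 8.0 roughly as follows. This is a minimal, self-contained illustration, not the project's actual script: the table names `layoffs_demo_raw` and `layoffs_demo_clean`, the sample rows, the `MM/DD/YYYY` input date format, the `DECIMAL(12, 2)` type, and the reading of "null records" as rows with neither `total_laid_off` nor `percentage_laid_off` are all assumptions. Casing fixes are left out because they depend on the actual values in the data.

```sql
-- Hypothetical demo table and rows; column names follow the Dataset Information table.
DROP TABLE IF EXISTS layoffs_demo_raw;
CREATE TABLE layoffs_demo_raw (
  company               VARCHAR(100),
  location              VARCHAR(100),
  sector                VARCHAR(100),
  total_laid_off        INT,
  percentage_laid_off   VARCHAR(20),
  `date`                VARCHAR(20),  -- assumed to arrive as MM/DD/YYYY text
  stage                 VARCHAR(50),
  country               VARCHAR(100),
  funds_raised_millions VARCHAR(20)   -- text until converted in step 4
);

INSERT INTO layoffs_demo_raw VALUES
  (' Acme ',  'Austin',  'Tech',    100,  '0.1', '11/15/2022', 'Series C', 'United States', '250'),
  ('Acme',    'Austin',  'Tech',    100,  '0.1', '11/15/2022', 'Series C', 'United States', '250'),
  ('Globex',  'Berlin',  'Retail',  40,   '',    '01/05/2023', 'Public',   'Germany',       ''),
  ('Initech', 'Toronto', 'Finance', NULL, '',    '03/01/2021', 'Series B', 'Canada',        '80');

-- MySQL Workbench enables safe-update mode by default, which rejects
-- UPDATE/DELETE statements without a key-based WHERE clause; disable it for this session.
SET SQL_SAFE_UPDATES = 0;

-- 1. Trim stray whitespace in text columns.
UPDATE layoffs_demo_raw
SET company = TRIM(company), location = TRIM(location), sector = TRIM(sector),
    stage = TRIM(stage), country = TRIM(country);

-- 2. Replace blank strings with real NULLs.
UPDATE layoffs_demo_raw SET percentage_laid_off = NULL WHERE percentage_laid_off = '';
UPDATE layoffs_demo_raw SET funds_raised_millions = NULL WHERE funds_raised_millions = '';

-- 3. Parse the text dates into YYYY-MM-DD, then change the column type to DATE.
UPDATE layoffs_demo_raw SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');
ALTER TABLE layoffs_demo_raw MODIFY COLUMN `date` DATE;

-- 4. Make funds_raised_millions numeric (blanks were already turned into NULL).
ALTER TABLE layoffs_demo_raw MODIFY COLUMN funds_raised_millions DECIMAL(12, 2);

-- 5. Drop rows that carry no layoff figures at all.
DELETE FROM layoffs_demo_raw
WHERE total_laid_off IS NULL AND percentage_laid_off IS NULL;

-- 6. Keep one copy of each fully duplicated row in a new cleaned table.
DROP TABLE IF EXISTS layoffs_demo_clean;
CREATE TABLE layoffs_demo_clean AS
SELECT company, location, sector, total_laid_off, percentage_laid_off,
       `date`, stage, country, funds_raised_millions
FROM (
  SELECT r.*,
         ROW_NUMBER() OVER (
           PARTITION BY company, location, sector, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
         ) AS row_num
  FROM layoffs_demo_raw AS r
) AS ranked
WHERE row_num = 1;
```

Trimming runs before deduplication on purpose: `' Acme '` and `'Acme'` only collapse into one row once the padding is gone. On these demo rows, `layoffs_demo_clean` ends up with two rows (one Acme, one Globex), and Initech is dropped in step 5.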
## Exploratory Data Analysis (EDA)

The analysis was run directly in MySQL, using `GROUP BY`, joins, window functions, and CTEs to explore:
- Monthly and yearly layoff trends
- Top companies with highest layoffs
- Layoffs by country, sector, and stage
- Correlation between funds raised and layoffs
- Average layoff percentage per sector
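Two of these explorations can be sketched as below. The table `layoffs_demo_eda` and its rows are hypothetical so the block runs on its own; point the queries at `layoffs_staging2` to run them on the real data. The first query combines a CTE with a window `SUM` to get a running monthly total; the second uses `DENSE_RANK` to pick the top five companies in each year.

```sql
-- Hypothetical demo data using three of the README's columns.
DROP TABLE IF EXISTS layoffs_demo_eda;
CREATE TABLE layoffs_demo_eda (
  company        VARCHAR(100),
  `date`         DATE,
  total_laid_off INT
);

INSERT INTO layoffs_demo_eda VALUES
  ('Acme',    '2022-11-15', 100),
  ('Globex',  '2022-11-20',  40),
  ('Acme',    '2023-01-10',  60),
  ('Initech', '2023-01-25',  90),
  ('Globex',  '2023-02-03',  30);

-- Monthly totals plus a running (cumulative) total.
WITH monthly AS (
  SELECT DATE_FORMAT(`date`, '%Y-%m') AS month,
         SUM(total_laid_off)          AS total
  FROM layoffs_demo_eda
  WHERE `date` IS NOT NULL
  GROUP BY month
)
SELECT month,
       total,
       SUM(total) OVER (ORDER BY month) AS rolling_total
FROM monthly
ORDER BY month;

-- Top 5 companies per year by layoffs; tied totals share a rank.
WITH company_year AS (
  SELECT company,
         YEAR(`date`)        AS yr,
         SUM(total_laid_off) AS total
  FROM layoffs_demo_eda
  GROUP BY company, yr
),
ranked AS (
  SELECT company, yr, total,
         DENSE_RANK() OVER (PARTITION BY yr ORDER BY total DESC) AS rnk
  FROM company_year
)
SELECT company, yr, total, rnk
FROM ranked
WHERE rnk <= 5
ORDER BY yr, rnk;
```

On the demo rows, the running total ends at 320 in 2023-02, and Initech tops 2023 with 90. Because `'YYYY-MM'` strings sort chronologically, ordering the window by the formatted month is enough here.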
## Key Insights

- Peak layoffs occurred in mid-2020 and early 2023.
- Tech and consumer-services sectors were hit hardest.
- Startups at later funding stages had higher layoff rates.
- Some highly funded firms still made massive cuts: funding ≠ stability.
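One way to put a number on the funding point is Pearson's correlation between `funds_raised_millions` and `total_laid_off`. MySQL has no built-in correlation aggregate, so this sketch uses the identity r = (E[xy] - E[x]E[y]) / (sd_x * sd_y) with population statistics. The table `layoffs_demo_funding` and its rows are hypothetical; a value of r near zero would be consistent with the insight, but it summarizes all companies at once and says nothing about any single firm.

```sql
-- Hypothetical demo rows; only the two columns being compared are needed.
DROP TABLE IF EXISTS layoffs_demo_funding;
CREATE TABLE layoffs_demo_funding (
  company               VARCHAR(100),
  funds_raised_millions DECIMAL(12, 2),
  total_laid_off        INT
);

INSERT INTO layoffs_demo_funding VALUES
  ('Acme',    100,  10),
  ('Globex',  200,  30),
  ('Initech', 300,  20),
  ('Hooli',   NULL, 500);  -- excluded below: no funding figure

-- Pearson r from population covariance and population standard deviations.
WITH pairs AS (
  SELECT funds_raised_millions AS x,
         total_laid_off        AS y
  FROM layoffs_demo_funding
  WHERE funds_raised_millions IS NOT NULL
    AND total_laid_off IS NOT NULL
)
SELECT COUNT(*) AS n,
       (AVG(x * y) - AVG(x) * AVG(y))
         / (STDDEV_POP(x) * STDDEV_POP(y)) AS pearson_r
FROM pairs;
```

For the three complete demo pairs this returns n = 3 and r of roughly 0.5. Both rows of a pair must be non-NULL, which is why the CTE filters before aggregating instead of letting each aggregate skip NULLs independently.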
## Usage

Prerequisites:

- MySQL 8.0+
- Any SQL client (MySQL Workbench or the `mysql` CLI)
- Raw dataset: `raw_layoffs.xlsx`
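1. Clone this repository:

   ```bash
   git clone https://github.com/Aadityavarier/Data-Cleaning-and-EDA.git
   ```

2. Convert the Excel file to CSV for MySQL import (if needed): open `raw_layoffs.xlsx` and save/export it as `raw_layoffs.csv`.

3. Import the raw dataset into MySQL. The target table must already exist, and the file must be readable by the MySQL server (inside the `secure_file_priv` directory, if that variable is set). If the CSV was saved on Windows, add `LINES TERMINATED BY '\r\n'` before `IGNORE 1 ROWS`.

   ```sql
   LOAD DATA INFILE 'path_to/raw_layoffs.csv'
   INTO TABLE layoffs_staging2
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   IGNORE 1 ROWS;
   ```

4. Run the SQL script:

   ```bash
   mysql -u root -p < "data_cleaning_and_eda_project.sql"
   ```

5. Explore outputs and insights using `SELECT` statements.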
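When exploring the outputs, a quick profiling pass helps confirm the load and cleaning behaved as expected. This sketch runs on a hypothetical table, `layoffs_demo_profile`, with made-up rows (one deliberate duplicate); swap in `layoffs_staging2` to profile the real data.

```sql
-- Hypothetical demo rows; swap in layoffs_staging2 to profile the real table.
DROP TABLE IF EXISTS layoffs_demo_profile;
CREATE TABLE layoffs_demo_profile (
  company             VARCHAR(100),
  `date`              DATE,
  total_laid_off      INT,
  percentage_laid_off DECIMAL(5, 4)
);

INSERT INTO layoffs_demo_profile VALUES
  ('Acme',   '2022-11-15', 100,  0.1),
  ('Acme',   '2022-11-15', 100,  0.1),
  ('Globex', '2023-01-05', NULL, NULL);

-- Row count, date range, and missing values in one pass.
-- In MySQL, `expr IS NULL` evaluates to 1 or 0, so SUM() counts the NULLs.
SELECT COUNT(*)                         AS row_count,
       MIN(`date`)                      AS first_date,
       MAX(`date`)                      AS last_date,
       SUM(total_laid_off IS NULL)      AS missing_total_laid_off,
       SUM(percentage_laid_off IS NULL) AS missing_percentage
FROM layoffs_demo_profile;

-- Candidate duplicates: rows that repeat on these key columns.
SELECT company, `date`, total_laid_off, COUNT(*) AS copies
FROM layoffs_demo_profile
GROUP BY company, `date`, total_laid_off
HAVING COUNT(*) > 1;
```

On the real table, the first and last dates should line up with the period listed under Dataset Information, and after cleaning the duplicate check should come back empty.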
## Example Queries

```sql
-- 1. Top 5 companies with highest layoffs
SELECT company, SUM(total_laid_off) AS total
FROM layoffs_staging2
GROUP BY company
ORDER BY total DESC
LIMIT 5;

-- 2. Monthly layoffs trend
SELECT DATE_FORMAT(date, '%Y-%m') AS month,
       SUM(total_laid_off) AS total
FROM layoffs_staging2
GROUP BY month
ORDER BY month;
```

## Repository Structure

```
📁 data-cleaning-and-eda-project/
│
├── 📄 raw_layoffs.xlsx                     # Original dataset (Excel)
├── 📄 data_cleaning_and_eda_project.sql    # MySQL cleaning + EDA queries
├── 📄 LICENSE                              # MIT License
└── 📄 README.md                            # Project documentation
```
## Visualizations

**Chart generated from query outputs.**

To see more charts, go to Visuals.
## Contributing

Contributions and suggestions are welcome!
- Fork the repo.
- Create a feature branch.
- Submit a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Author & Contact

Aaditya V, B.E. in AI & Data Science (2nd Year), Mumbai University
- Email: aadityav1703@gmail.com
- LinkedIn: aadityavarier
- GitHub: Aadityavarier



