A complete SQL-based project analyzing global company layoffs between 2020 and 2023. Includes data cleaning, duplication handling, date standardization, and exploratory analysis to uncover key trends by country, sector, and year — all done using MySQL.

Aadityavarier/Data-Cleaning-and-EDA

🧹 Data Cleaning & Exploratory Data Analysis (EDA) on Global Layoffs Dataset


📘 Project Overview

This project covers data cleaning and exploratory data analysis (EDA) of a layoffs dataset for companies worldwide, spanning March 2020 to March 2023. The raw Excel dataset was imported into MySQL, where all cleaning, transformation, and analysis were done with SQL queries.


🗂️ Dataset Information

  • Raw file: raw_layoffs.xlsx
  • Imported table: layoffs_staging2
  • Period covered: 2020-03-11 → 2023-03-06

Columns:

| Column | Description |
| --- | --- |
| company | Name of the company |
| country | Country of operation |
| date | Layoff announcement date |
| sector | Industry or business sector |
| total_laid_off | Employees laid off |
| percentage_laid_off | Workforce percentage affected |
| location | Company location |
| stage | Company stage (Startup, Series C, Public, etc.) |
| funds_raised_millions | Total funds raised in USD millions |
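
LOAD DATA INFILE needs the target table to exist before the import. The statement below is a minimal sketch of a raw-import table. The column names and their order come from the list above, but every data type is an assumption rather than the project's actual definition, and the order must match the CSV header. Columns that the cleaning steps later fix (date, percentage_laid_off, funds_raised_millions) are kept as text so that blank or oddly formatted values load without errors.

```sql
-- Hypothetical raw-import table; all data types are assumptions.
CREATE TABLE IF NOT EXISTS layoffs_staging2 (
    company               VARCHAR(255),
    country               VARCHAR(100),
    `date`                VARCHAR(20),   -- text until the date format is standardized
    sector                VARCHAR(255),
    total_laid_off        VARCHAR(20),   -- text so blank cells do not fail the import
    percentage_laid_off   VARCHAR(20),   -- blanks are replaced with NULL during cleaning
    location              VARCHAR(255),
    stage                 VARCHAR(100),
    funds_raised_millions VARCHAR(20)    -- converted to a numeric type during cleaning
);
```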

🧽 Data Cleaning Process

Key cleaning operations performed:

  1. Removed null, duplicate, and inconsistent records.
  2. Standardized date format (YYYY-MM-DD).
  3. Trimmed whitespace and fixed casing for text columns.
  4. Replaced blank percentage_laid_off with NULL.
  5. Converted funds_raised_millions to numeric data type.
  6. Created a cleaned table for analysis.
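
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the table name layoffs_clean is hypothetical, the raw dates are assumed to be text in month/day/year form, and percentage_laid_off and funds_raised_millions are assumed to be text columns in the staging table. Casing fixes are not shown.

```sql
-- Steps 1 and 6: copy the data into a cleaned table, keeping one row
-- per group of fully identical rows (layoffs_clean is a hypothetical name).
CREATE TABLE layoffs_clean AS
SELECT company, country, `date`, sector, total_laid_off,
       percentage_laid_off, location, stage, funds_raised_millions
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, country, `date`, sector, total_laid_off,
                            percentage_laid_off, location, stage,
                            funds_raised_millions
           ) AS row_num
    FROM layoffs_staging2
) AS numbered
WHERE row_num = 1;

-- Step 3: trim stray whitespace in the text columns.
UPDATE layoffs_clean
SET company  = TRIM(company),
    country  = TRIM(country),
    sector   = TRIM(sector),
    location = TRIM(location);

-- Step 4: turn blank strings into NULL.
UPDATE layoffs_clean
SET percentage_laid_off = NULLIF(TRIM(percentage_laid_off), '');

-- Step 2: parse the text dates (format '%m/%d/%Y' is an assumption),
-- then switch the column to a real DATE type.
UPDATE layoffs_clean
SET `date` = STR_TO_DATE(NULLIF(TRIM(`date`), ''), '%m/%d/%Y');

ALTER TABLE layoffs_clean MODIFY COLUMN `date` DATE;

-- Step 5: blank out empty values, then convert to a numeric type.
UPDATE layoffs_clean
SET funds_raised_millions = NULLIF(TRIM(funds_raised_millions), '');

ALTER TABLE layoffs_clean MODIFY COLUMN funds_raised_millions DECIMAL(12, 2);
```

MySQL Workbench's safe-update mode may reject UPDATE statements that have no key-based WHERE clause, so these statements are easiest to run from the CLI or with that mode turned off for the session.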

🔍 Exploratory Data Analysis (EDA)

The analysis runs directly in MySQL and uses GROUP BY, JOIN, window functions, and CTEs to explore:

  • Monthly and yearly layoff trends
  • Top companies with highest layoffs
  • Layoffs by country, sector, and stage
  • Correlation between funds raised and layoffs
  • Average layoff percentage per sector
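
As an illustration of how a CTE and a window function combine for the trend analysis, here is a sketch of a rolling monthly total, followed by an average layoff percentage per sector. Both queries assume the cleaning has already run, so that date is a DATE column and the layoff columns are numeric. The alias names are made up for this example.

```sql
-- Rolling total of layoffs by month: the CTE aggregates per month,
-- the window function accumulates the monthly totals in date order.
WITH monthly AS (
    SELECT DATE_FORMAT(`date`, '%Y-%m') AS `month`,
           SUM(total_laid_off) AS total
    FROM layoffs_staging2
    WHERE `date` IS NOT NULL
    GROUP BY `month`
)
SELECT `month`,
       total,
       SUM(total) OVER (ORDER BY `month`) AS rolling_total
FROM monthly
ORDER BY `month`;

-- Average share of the workforce laid off, per sector.
-- AVG ignores rows where percentage_laid_off is NULL.
SELECT sector,
       ROUND(AVG(percentage_laid_off), 3) AS avg_pct_laid_off,
       SUM(total_laid_off) AS total
FROM layoffs_staging2
WHERE sector IS NOT NULL
GROUP BY sector
ORDER BY avg_pct_laid_off DESC;
```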

📈 Key Insights

  • Peak layoffs occurred in mid-2020 and early 2023.
  • Tech and consumer-services sectors were hit hardest.
  • Startups in late-funding stages had higher layoff rates.
  • Some highly funded firms still made massive cuts — funding ≠ stability.
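
Queries along the following lines could be used to check the stage and funding observations. They are a sketch, not the queries that produced the insights above, and they assume percentage_laid_off and funds_raised_millions are numeric after cleaning. The alias names are illustrative.

```sql
-- Average layoff share by company stage, with the number of events per stage.
SELECT stage,
       COUNT(*) AS layoff_events,
       ROUND(AVG(percentage_laid_off), 3) AS avg_pct_laid_off
FROM layoffs_staging2
WHERE stage IS NOT NULL
  AND percentage_laid_off IS NOT NULL
GROUP BY stage
ORDER BY avg_pct_laid_off DESC;

-- Funding next to total layoffs for the ten companies with the largest cuts.
SELECT company,
       MAX(funds_raised_millions) AS max_funds_millions,
       SUM(total_laid_off) AS total
FROM layoffs_staging2
WHERE funds_raised_millions IS NOT NULL
  AND total_laid_off IS NOT NULL
GROUP BY company
ORDER BY total DESC
LIMIT 10;
```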

💻 Usage

🔧 Requirements

  • MySQL 8.0+
  • Any SQL client (MySQL Workbench / CLI)
  • Raw dataset: raw_layoffs.xlsx

▶️ Steps to Run

  1. Clone this repository

    git clone https://github.com/Aadityavarier/Data-Cleaning-and-EDA.git
  2. Convert Excel file to CSV for MySQL import (if needed)

    • Open raw_layoffs.xlsx and save/export as raw_layoffs.csv.
  3. Import the raw dataset into MySQL

    LOAD DATA INFILE 'path_to/raw_layoffs.csv'
    INTO TABLE layoffs_staging2
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'  -- keeps commas inside quoted values intact
    IGNORE 1 ROWS;
  4. Run the SQL script

    mysql -u root -p < "data_cleaning_and_eda_project.sql"
  5. Explore outputs and insights using SELECT statements.
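
A few sanity checks can help confirm that the import and the script behaved as expected. The statements below are a sketch: the first is useful before step 3, the rest after step 4, and the date check assumes the script has already converted date to a DATE column.

```sql
-- Before importing: shows the directory the server allows LOAD DATA INFILE
-- to read from (an empty value means no restriction, NULL means disabled).
SHOW VARIABLES LIKE 'secure_file_priv';

-- After importing and cleaning: how many rows are in the table?
SELECT COUNT(*) AS row_count
FROM layoffs_staging2;

-- Date range of the data.
SELECT MIN(`date`) AS first_date,
       MAX(`date`) AS last_date
FROM layoffs_staging2;

-- How many rows are missing each layoff figure?
SELECT SUM(total_laid_off IS NULL)      AS missing_total,
       SUM(percentage_laid_off IS NULL) AS missing_percentage
FROM layoffs_staging2;
```

If everything worked, the date range should match the period listed under Dataset Information (2020-03-11 to 2023-03-06).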


🧾 Example Queries

-- 1. Top 5 companies with highest layoffs
SELECT company, SUM(total_laid_off) AS total
FROM layoffs_staging2
GROUP BY company
ORDER BY total DESC
LIMIT 5;

-- 2. Monthly layoffs trend
SELECT DATE_FORMAT(date, '%Y-%m') AS month,
       SUM(total_laid_off) AS total
FROM layoffs_staging2
GROUP BY month
ORDER BY month;
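
A third example, sketched here to show CTEs and a ranking window function together: the top 3 companies by layoffs in each year. It assumes date is a DATE column, and the CTE and alias names (company_year, ranked, yr, rnk) are made up for this example.

```sql
-- 3. Top 3 companies by layoffs per year
WITH company_year AS (
    SELECT company,
           YEAR(`date`) AS yr,
           SUM(total_laid_off) AS total
    FROM layoffs_staging2
    WHERE `date` IS NOT NULL
      AND total_laid_off IS NOT NULL
    GROUP BY company, YEAR(`date`)
),
ranked AS (
    SELECT company, yr, total,
           DENSE_RANK() OVER (PARTITION BY yr ORDER BY total DESC) AS rnk
    FROM company_year
)
SELECT company, yr, total, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY yr, rnk;
```

DENSE_RANK gives tied companies the same rank, so a year can return more than three rows when totals are equal.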

🗂️ Repository Structure

📁 data-cleaning-and-eda-project/
│
├── 📄 raw_layoffs.xlsx               # Original dataset (Excel)
├── 📄 data_cleaning_and_eda_project.sql  # MySQL cleaning + EDA queries
├── 📄 LICENSE                        # MIT License
└── 📄 README.md                      # Project documentation

📊 Visualizations

The charts below were generated from the query outputs:

  • Countrywise_Layoffs
  • Companywise_Layoffs
  • Yearly_Layoffs
  • Monthly_Layoffs

For more charts, see Visuals.


🤝 Contributing

Contributions and suggestions are welcome!

  1. Fork the repo.
  2. Create a feature branch.
  3. Submit a pull request.

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.


👤 Author & Contact

Aaditya V
B.E. in AI & Data Science (2nd Year) — Mumbai University
