This project demonstrates how I used SQL to clean and prepare a dataset on world layoffs for further analysis.
The dataset includes company details, industry, location, funding, and layoff counts.
The goal was to transform raw, messy data into a clean and reliable format.
## Objectives

- Identify and remove duplicate records
- Standardize inconsistent values (company, industry, country)
- Convert dates into the correct format
- Handle missing or blank values
- Deliver a cleaned dataset ready for analysis or visualization
## Cleaning Steps

### 1. Remove Duplicates

- Created a staging table to protect the raw data.
- Applied `ROW_NUMBER()` with `PARTITION BY` across key fields to detect duplicates.
- Deleted the duplicate rows (see the sketch below).
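A minimal sketch of this step, assuming hypothetical table names (`layoffs` for the raw import, `layoffs_staging`/`layoffs_staging2` for staging); the actual script may differ:

```sql
-- Copy the raw data into a staging table so the original stays untouched.
CREATE TABLE layoffs_staging LIKE layoffs;
INSERT INTO layoffs_staging
SELECT * FROM layoffs;

-- A CTE makes the duplicates visible: any row with `row_number` > 1
-- repeats an earlier row across the partitioning columns.
WITH duplicate_cte AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY company, location, industry,
                            total_laid_off, percentage_laid_off, `date`
           ) AS `row_number`
    FROM layoffs_staging
)
SELECT * FROM duplicate_cte WHERE `row_number` > 1;

-- MySQL cannot DELETE from a CTE, so the row number is materialized
-- into a second staging table before deleting.
CREATE TABLE layoffs_staging2 AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry,
                        total_laid_off, percentage_laid_off, `date`
       ) AS `row_number`
FROM layoffs_staging;

DELETE FROM layoffs_staging2
WHERE `row_number` > 1;
```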
### 2. Standardize the Data

- Trimmed whitespace from company names.
- Grouped similar industry names (e.g., all variations of Crypto → `Crypto`).
- Fixed country names (e.g., removed the trailing period in `United States.`).
- Converted the `date` column from text (`MM/DD/YYYY`) into a proper `DATE` type (see the sketch below).
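A sketch of the standardization queries, reusing the assumed `layoffs_staging2` table:

```sql
-- Strip stray whitespace from company names.
UPDATE layoffs_staging2
SET company = TRIM(company);

-- Collapse every Crypto variant into a single label.
UPDATE layoffs_staging2
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Remove the trailing period from country names like 'United States.'.
UPDATE layoffs_staging2
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';

-- Parse the MM/DD/YYYY text, then change the column type to DATE.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging2
MODIFY COLUMN `date` DATE;
```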
### 3. Handle Missing and Blank Values

- Replaced blank industry values with `NULL`.
- Filled missing industry values using self-joins on company and location.
- Removed rows with no layoff information (`total_laid_off` and `percentage_laid_off` both missing), as sketched below.
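The missing-value handling might look like this (same assumed table):

```sql
-- Normalize empty strings to NULL so joins and checks behave consistently.
UPDATE layoffs_staging2
SET industry = NULL
WHERE industry = '';

-- Self-join: borrow the industry from another row of the same
-- company/location pair that does have it populated.
UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
    ON t1.company = t2.company
   AND t1.location = t2.location
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Conditional delete: rows with neither layoff figure carry no signal.
DELETE FROM layoffs_staging2
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```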
### 4. Final Cleanup

- Dropped helper columns such as `row_number` (see below).
- Ensured only clean, consistent fields remain.
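Dropping the helper column is a one-liner (table name assumed as above):

```sql
-- The row number was only needed for de-duplication; drop it.
ALTER TABLE layoffs_staging2
DROP COLUMN `row_number`;
```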
## SQL Techniques Used

- Window functions (`ROW_NUMBER()` with `OVER`)
- Common Table Expressions (CTEs)
- String functions (`TRIM()`, `TRAILING`)
- Date functions (`STR_TO_DATE()`)
- Joins and updates for data filling
- Conditional deletes
## Files

- `layoffs_cleaning.sql` → Main SQL script with all cleaning steps
- `README.md` → Project documentation
## Tools

- Database: MySQL
- Editor: MySQL Workbench
## Next Steps

With the dataset cleaned, the next phase would be:

- Performing exploratory data analysis (EDA) to uncover layoff trends (see the sketch below)
- Building visual dashboards to present insights
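As a taste of that next phase, a first EDA query might aggregate layoffs by year (a sketch against the assumed cleaned table; not part of the cleaning script):

```sql
-- Total layoffs per year, to spot the broad trend.
SELECT YEAR(`date`) AS layoff_year,
       SUM(total_laid_off) AS total_laid_off
FROM layoffs_staging2
GROUP BY YEAR(`date`)
ORDER BY layoff_year;
```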