GitHub - mrunmayee3108/Data-Cleaning-and-Exploratory-data-analysis-using-MySQL: Data cleaning and exploratory data analysis

Data Cleaning and Exploratory Analysis with MySQL

This project demonstrates how to clean and explore a raw dataset using SQL. It focuses on fixing common data quality issues and performing summary analyses to prepare the data for further insights.

Overview:

The workflow includes:

Data Cleaning:

Remove duplicates
Handle null and missing values
Standardize date formats
Group ages into categories
Drop irrelevant columns

Exploratory Data Analysis (EDA):

Salary statistics (min, max, average, standard deviation)
Department-level salary summaries
Identify top earners and salary distributions
Find duplicate names, shared salaries, and upcoming birthdays or anniversaries

Features:

Uses SQL window functions (ROW_NUMBER(), NTILE(), RANK()) for deduplication and ranking.
Standardizes inconsistent date formats.
Groups employees into age bands (20–30, 30–40, etc.).
Identifies key insights like highest paid employees and department salary spend.

Example Insights:

Top Earners: Who earns the most in the company.
Department Summary: Total salaries and average salaries by department.
Upcoming Events: Employees with birthdays or anniversaries this month.

Why This Matters: Clean data is the foundation for meaningful analysis. This project shows how SQL alone can resolve common data issues and uncover trends.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
datacleaning&EDA.sql		datacleaning&EDA.sql
datacleaning.csv		datacleaning.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

mrunmayee3108/Data-Cleaning-and-Exploratory-data-analysis-using-MySQL

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages