Skip to content

An end-to-end data analysis project showcasing ETL, cleaning, and visualization techniques using Excel and Power Query.

License

Notifications You must be signed in to change notification settings

PerceptronCipher/Excel-Sales-Analysis-and-ETL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Project Title: End-to-End Sales Analysis using Excel Power Query

  1. Project Overview This project demonstrates an end-to-end data analysis pipeline focused on a sales dataset. The entire process, including data extraction, transformation, loading (ETL), and data cleaning, was conducted using Microsoft Excel's Power Query. The goal was to transform raw, messy data into a clean, structured dataset ready for business intelligence and strategic decision-making.

  2. The Challenge: Unlocking Insights from Messy Data The initial datasets were in a raw format, containing:

Inconsistent data types

Missing values (NA and null)

Redundant and irrelevant columns

Disparate tables requiring integration

The objective was to systematically clean and prepare this data to enable accurate and meaningful sales analysis.

  1. Methodology: A Step-by-Step ETL Process The entire ETL process was performed within the Power Query Editor.

A. Data Import

Source: The data was imported from multiple Excel files (or a single workbook with multiple sheets) into Power Query.

B. Data Transformation & Cleaning

FactInternetSales:

New columns like Total Revenue and COGS were engineered to calculate key profitability metrics.

Product:

Columns with a high percentage of nulls were removed.

NA values in the Color column were replaced with 'Unspecified' to retain data integrity.

Date:

The table was filtered to analyze data only from 2005 to 2008.

New columns like Year, Month, Quarter, and a conditional WeekType were created to enable time-based analysis.

Customer:

First and last names were merged into a single CustomerName column for simplified analysis.

The table was cleaned by handling nulls and selecting only important columns.

Geography:

The table was streamlined by removing columns of low importance and retaining key geographical details.

  1. Tools Used Microsoft Excel: Used for all data manipulation, analysis, and visualization.

Power Query: The primary tool for the entire ETL process.

  1. Future Work Predictive Analysis: Explore the use of machine learning algorithms to identify key features contributing to sales performance. This would involve a transition to tools like Python and scikit-learn.

About

An end-to-end data analysis project showcasing ETL, cleaning, and visualization techniques using Excel and Power Query.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published