This project focuses on performing Exploratory Data Analysis (EDA) using SQL Server. The goal is to analyze structured data from the previously built data warehouse project and extract meaningful business insights.
The analysis explores different angles such as date trends, customer performance, product sales, cumulative patterns, segmentation, and more.
Two detailed reports Customers Report and Products Report highlight key findings related to customer behaviour and product performance.
Start by launching SQL Server Management Studio (SSMS) and follow these steps to setup the environment before running any analysis queries.
Directory - scripts/01-setup/
-
01-init-database.sql - Creates the
DataWarehouseAnalyticsdatabase andGoldschema. Run this first. -
02-ddl-gold.sql - Creates tables (
dim_customers,dim_products,fact_sales) inside theGoldschema. -
03-load-gold.sql - Loads data into the tables using
Bulk Insert.⚠️ Make sure to update the file paths of source files in the script to match your local system.
Once setup is complete, you can start running analysis queries from the 02-eda-queries folder.
Various types of analysis were performed throughout the project. Each analytical area focuses on a specific aspect of the data whether it's identifying top performers, uncovering time-based trends, or segmenting customers and products for deeper insights.
Analyses include:
-
Database Exploration - Explored the structure of the database, including schema, tables, and columns using
Information Schema. -
Dimension Exploration - Explored different dimensions within the dataset such as different countries customer come from, available product categories.
-
Date Range Exploration - Explored date boundaries in the dataset such as finding first and last order date to see how many year of sales are available.
Query:
select min(order_date) as first_order, max(order_date) as last_order, datediff(year, min(order_date), max(order_date)) as order_range_year from gold.fact_sales;
Result:
First Order Last Order Order Range (Years) 2010-12-29 2014-01-28 4 -
Measure Exploration - Calculated key metrics such as total sales, total orders, average price, customers who placed an order to support deeper analysis.
-
Magnitude Analysis - Analyzed key metrics grouped by different dimensions such as total products by category, total revenue generated by each customer.
-
Ranking Analysis - Identified top performing products by revenue, helping highlight the ones that bring in most of the sales.
-
Change Over Time - Calculated annual sales to observe how sales changed year over year to track business growth.
-
Cumulative Analysis - Calculated running totals for sales to visualize growth over time and to understand the accumulation of revenue across the timeline.
Result:
Year Total Sales Running Total Sales 2010 43,419 43,419 2011 7,075,088 7,118,507 2012 5,842,231 12,960,738 2013 16,344,878 29,305,616 2014 45,642 29,351,258 -
Performance Analysis - Compared yearly total sales for each subcategories against average sales and previous sales to evaluate individual performance.
-
Part-to-Whole Analysis - Analyzed how different product categories contribute to total sales to identify the most impactful segment.
-
Data Segmentation – Grouped customers based on order history & spending amount and products into cost ranges, enabling more targeted insights.
--Segment products into cost ranges and count how many products fall in each segment. with cost_segments_cte as( select product_name, cost, case when cost < 100 then 'below 100' when cost between 100 and 500 then '100-500' when cost between 501 and 1000 then '501-1000' when cost between 1001 and 2000 then '1001-2000' else 'above 2001' end as cost_segments from gold.dim_products ) select cost_segments, count(*) as prdcts_in_sgmnt from cost_segments_cte group by cost_segments order by prdcts_in_sgmnt desc;
The project is summarized into two main reports: one focusing on customer behavior, and the other on product performance. These reports bring together key metrics and KPIs, and can be further be used for analytics or querying purpose.
1️⃣ Customers Report
Includes:
- Basic customer details (name, age)
- Customer segmentation (VIP, Regular, New)
- Transactional details (total orders, total sales)
- Key KPIs (time since last purchase, average order value)
2️⃣ Products Report
Includes:
- Basic product details (product name, category)
- Product segmentation (High, Mid, Low-Performer)
- Metrics (total sales, total quantity sold)
- Key KPIs (months since last sale, average order revenue)
The analyses helped uncovering several insights, including:
-
Older Customers Drive the Most Orders
Customers aged 50 and above accounted for a majority of total orders (66%), while younger customers (30–39) contributed only 1%.View Query:
with cte as( select age_group, sum(total_orders) as total_orders, sum(sum(total_orders)) over() as overall_orders from report_customers group by age_group ) select *, cast((cast(total_orders as decimal)/overall_orders) * 100 as decimal(10,1)) as percent_of_overall_orders from cte order by percent_of_overall_orders desc;Age Group Total Orders Overall Orders Percent of Overall Orders Above 50 18,245 27,659 66.0% 40–49 9,132 27,659 33.0% 30–39 282 27,659 1.0% -
Low-Value Orders Dominate
Orders below $500 accounted for nearly 75% of all transactions, revealing a strong concentration of low-ticket purchases.View Query:
with cte1 as( select case when sales_amount > 2000 then 'above 2,000' when sales_amount between 1000 and 1999 then 'between 1000 and 1999' when sales_amount between 500 and 999 then 'between 500 and 999' when sales_amount < 500 then 'less than 500' end as sales_remark from gold.fact_sales ), cte2 as( select sales_remark, count(*) as total_count, sum(count(*)) over() as overall_count from cte1 group by sales_remark ) select sales_remark, total_count, overall_count, cast(cast(total_count as decimal)/overall_count * 100 as decimal(10,1)) as percet_of_overall from cte2 order by total_count desc; -
Average Sale Value Consistent Across Customer Group
Average sales ranged from $452 to $529 across all gender and marital status combinations, showing similar spending patterns. -
Top Sales Contributors: US and Australia
Over 62% of total sales came from the United States (31.5%) and Australia (31.1%), making them the leading markets. -
2013 Was the Peak Sales Year
The year 2013 recorded the highest sales, making it the business’s best-performing year in the dataset. -
Bikes Category Dominated Sales Across Years
From 2010 to 2013, Bikes remained the top-selling category each year, significantly outperforming Accessories and Clothing in total revenue.
🔍 Click here to view all key-findings SQL queries.
- SQL Server - For querying and analysis
- Data Warehouse (previous project) - Structured dataset foundation
- GitHub - For version control and project tracking
This project is licensed under MIT License.
If you have ideas for improving the analysis or adding new insights, feel free to share them.
⭐️ If you find this project useful, please consider giving it a star!