Skip to content

This project focuses on data exploration and analytics, including segmentation, ranking, trend analysis, and performance evaluation to uncover business insights.

License

Notifications You must be signed in to change notification settings

k178412/sql-exploratory-data-analysis-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL EXPLORATORY DATA ANALYSIS PROJECT


📊 Project Overview

This project focuses on performing Exploratory Data Analysis (EDA) using SQL Server. The goal is to analyze structured data from the previously built data warehouse project and extract meaningful business insights.

The analysis explores different angles such as date trends, customer performance, product sales, cumulative patterns, segmentation, and more.

Two detailed reports Customers Report and Products Report highlight key findings related to customer behaviour and product performance.


🚀 Setup & Usage

Start by launching SQL Server Management Studio (SSMS) and follow these steps to setup the environment before running any analysis queries.

Directory - scripts/01-setup/

  • 01-init-database.sql - Creates the DataWarehouseAnalytics database and Gold schema. Run this first.

  • 02-ddl-gold.sql - Creates tables (dim_customers, dim_products, fact_sales) inside the Gold schema.

  • 03-load-gold.sql - Loads data into the tables using Bulk Insert.

    ⚠️ Make sure to update the file paths of source files in the script to match your local system.

Once setup is complete, you can start running analysis queries from the 02-eda-queries folder.


🔍 Key Analytical Areas

Various types of analysis were performed throughout the project. Each analytical area focuses on a specific aspect of the data whether it's identifying top performers, uncovering time-based trends, or segmenting customers and products for deeper insights.

Analyses include:

  • Database Exploration - Explored the structure of the database, including schema, tables, and columns using Information Schema.

  • Dimension Exploration - Explored different dimensions within the dataset such as different countries customer come from, available product categories.

  • Date Range Exploration - Explored date boundaries in the dataset such as finding first and last order date to see how many year of sales are available.

    Query:

    select 
        min(order_date) as first_order, 
        max(order_date) as last_order,
        datediff(year, min(order_date), max(order_date)) as order_range_year
    from gold.fact_sales;

    Result:

    First Order Last Order Order Range (Years)
    2010-12-29 2014-01-28 4
  • Measure Exploration - Calculated key metrics such as total sales, total orders, average price, customers who placed an order to support deeper analysis.

  • Magnitude Analysis - Analyzed key metrics grouped by different dimensions such as total products by category, total revenue generated by each customer.

  • Ranking Analysis - Identified top performing products by revenue, helping highlight the ones that bring in most of the sales.

  • Change Over Time - Calculated annual sales to observe how sales changed year over year to track business growth.

  • Cumulative Analysis - Calculated running totals for sales to visualize growth over time and to understand the accumulation of revenue across the timeline.

    Result:

    Year Total Sales Running Total Sales
    2010 43,419 43,419
    2011 7,075,088 7,118,507
    2012 5,842,231 12,960,738
    2013 16,344,878 29,305,616
    2014 45,642 29,351,258
  • Performance Analysis - Compared yearly total sales for each subcategories against average sales and previous sales to evaluate individual performance.

  • Part-to-Whole Analysis - Analyzed how different product categories contribute to total sales to identify the most impactful segment.

  • Data Segmentation – Grouped customers based on order history & spending amount and products into cost ranges, enabling more targeted insights.

    --Segment products into cost ranges and count how many products fall in each segment.
    with cost_segments_cte as(
    select
    	product_name,
    	cost,
    	case when cost < 100 then 'below 100'
           when cost between 100 and 500 then '100-500'
    	     when cost between 501 and 1000 then '501-1000'
    	     when cost between 1001 and 2000 then '1001-2000'
    	     else 'above 2001'
    	end as cost_segments
    from gold.dim_products
    )
    select
    	cost_segments,
    	count(*) as prdcts_in_sgmnt
    from cost_segments_cte
    group by cost_segments
    order by prdcts_in_sgmnt desc;

📜 Reports

The project is summarized into two main reports: one focusing on customer behavior, and the other on product performance. These reports bring together key metrics and KPIs, and can be further be used for analytics or querying purpose.

Includes:

  • Basic customer details (name, age)
  • Customer segmentation (VIP, Regular, New)
  • Transactional details (total orders, total sales)
  • Key KPIs (time since last purchase, average order value)

Includes:

  • Basic product details (product name, category)
  • Product segmentation (High, Mid, Low-Performer)
  • Metrics (total sales, total quantity sold)
  • Key KPIs (months since last sale, average order revenue)

🧠 Key Findings

The analyses helped uncovering several insights, including:

  • Older Customers Drive the Most Orders
    Customers aged 50 and above accounted for a majority of total orders (66%), while younger customers (30–39) contributed only 1%.

    View Query:
     
    with cte as(
    select
      age_group,
      sum(total_orders) as total_orders,
      sum(sum(total_orders)) over() as overall_orders
    from report_customers
    group by age_group
    )
    select
      *,
      cast((cast(total_orders as decimal)/overall_orders) * 100 as decimal(10,1)) as percent_of_overall_orders
    from cte
    order by percent_of_overall_orders desc;
    
    Age Group Total Orders Overall Orders Percent of Overall Orders
    Above 50 18,245 27,659 66.0%
    40–49 9,132 27,659 33.0%
    30–39 282 27,659 1.0%
  • Low-Value Orders Dominate
    Orders below $500 accounted for nearly 75% of all transactions, revealing a strong concentration of low-ticket purchases.

    View Query:
    
      with cte1 as(
      select
      	 case when sales_amount > 2000 then 'above 2,000'
      	      when sales_amount between 1000 and 1999 then 'between 1000 and 1999'
      	      when sales_amount between 500 and 999 then 'between 500 and 999'
      	      when sales_amount < 500 then 'less than 500'
      	 end as sales_remark
      from gold.fact_sales
      ),
      cte2 as(
      select
      	sales_remark,
      	count(*) as total_count,
      	sum(count(*)) over() as overall_count
      from cte1
      group by sales_remark
      )
      select
      	sales_remark,
      	total_count,
      	overall_count,
      	cast(cast(total_count as decimal)/overall_count * 100 as decimal(10,1)) as percet_of_overall
      from cte2
      order by total_count desc;
    
  • Average Sale Value Consistent Across Customer Group
    Average sales ranged from $452 to $529 across all gender and marital status combinations, showing similar spending patterns.

  • Top Sales Contributors: US and Australia
    Over 62% of total sales came from the United States (31.5%) and Australia (31.1%), making them the leading markets.

  • 2013 Was the Peak Sales Year
    The year 2013 recorded the highest sales, making it the business’s best-performing year in the dataset.

  • Bikes Category Dominated Sales Across Years
    From 2010 to 2013, Bikes remained the top-selling category each year, significantly outperforming Accessories and Clothing in total revenue.

🔍 Click here to view all key-findings SQL queries.


🛠️ Tools Used

  • SQL Server - For querying and analysis
  • Data Warehouse (previous project) - Structured dataset foundation
  • GitHub - For version control and project tracking

🔒 License

This project is licensed under MIT License.


🤝 Contributing

If you have ideas for improving the analysis or adding new insights, feel free to share them.


⭐️ If you find this project useful, please consider giving it a star!

About

This project focuses on data exploration and analytics, including segmentation, ranking, trend analysis, and performance evaluation to uncover business insights.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages