TheMrityunjayPathak/Supermarket-Sales-Analysis

Supermarket Sales Analysis

Grocery stores are a vital part of everyday life, providing the food and essentials we need.

People often use grocery delivery applications to order their products, making it easy to shop from home.

Each transaction made through these applications is recorded in detail, creating a valuable dataset.

This project analyzes such transaction data from a supermarket to understand how well it is performing.

Dataset

The dataset is sourced from Kaggle and simulates grocery sales activity in the state of Tamil Nadu, India.

The dataset includes various columns that provide detailed information about each transaction at the Supermarket.

Link to the Dataset : Supermarket Sales Dataset

Problem Statement

  • To analyze supermarket sales data, identifying key factors for improving profitability and operational efficiency.

  • This Exploratory Data Analysis (EDA) aims to address the following key questions :

    • Customer Behavior Analysis : What are the purchasing patterns of customers based on different categories and sub-categories? How does customer spending vary across cities and states?

    • Sales Trends : Are there observable trends in sales over time? How do sales figures fluctuate across different months or seasons?

    • Discount Impact : What is the relationship between discounts and sales? How do discounts influence the profit margins across different categories and regions?

    • Profit Analysis : What are the profit margins associated with various product categories and sub-categories? How do these margins vary by city and state?

    • Regional Performance : How do sales and profit performance differ across regions and states? Are there specific regions that contribute more significantly to overall sales and profits?

    • Category Insights : What are the most and least popular product categories and sub-categories? How does the popularity of these categories vary by location and over time?

  • This analysis will provide a deeper understanding of supermarket sales dynamics, revealing trends and patterns that can inform inventory management, promotional strategies and regional marketing efforts.

Setting up the Environment

This project requires Jupyter Notebook, which you can install and launch from the terminal.

  • Install the Notebook
pip install notebook
  • Run the Notebook
jupyter notebook

Libraries required for the Project

NumPy

  • Go to the terminal and run this code
pip install numpy

Pandas

  • Go to the terminal and run this code
pip install pandas

Matplotlib

  • Go to the terminal and run this code
pip install matplotlib

Seaborn

  • Go to the terminal and run this code
pip install seaborn

Getting Started

  • Clone this repository to your local machine by using the following command :
git clone https://github.com/TheMrityunjayPathak/Supermarket-Sales-Analysis.git

Steps involved in the Project

Importing Libraries

  • Importing necessary libraries like numpy, pandas, matplotlib and seaborn.

Reading CSV File

  • Reading the CSV file using the pd.read_csv() method.
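
The import and loading steps above could look roughly like the minimal sketch below. It is not the notebook's actual code: the helper names `normalise_columns` and `load_sales` are hypothetical, and converting raw headers to snake_case (so that a column ends up named `order_date`) is an assumption based on the column names used elsewhere in this README. Only pandas is imported here; the project also imports numpy, matplotlib and seaborn.

```python
import pandas as pd


def normalise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with column names stripped, lower-cased and snake_cased.

    Assumption: raw headers contain spaces and capitals (e.g. "Order Date"),
    while the analysis refers to them as order_date, sub_category, etc.
    """
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return out


def load_sales(path) -> pd.DataFrame:
    """Read the sales CSV with pd.read_csv() and normalise its column names."""
    return normalise_columns(pd.read_csv(path))
```

Usage, with a hypothetical file name: `df = load_sales("supermarket_sales.csv")`.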

Overview of the Dataset

  • Information about the shape and size of the dataset.

  • Types of columns present in the dataset (numerical, categorical, text).

  • Detailed info about the dataset using df.info() method.
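
A minimal sketch of the overview step, assuming `df` is the DataFrame read from the CSV. The helper `overview` is hypothetical, and it simplifies the numerical/categorical/text split by treating every non-numeric column as "non-numerical".

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Summarise the shape, size and column types of a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "size": df.size,  # total number of cells: rows * columns
        # Numeric columns (int/float), e.g. sales, discount, profit.
        "numerical": df.select_dtypes(include="number").columns.tolist(),
        # Everything else: categorical/text columns, and dates still stored as strings.
        "non_numerical": df.select_dtypes(exclude="number").columns.tolist(),
    }
```

Calling `df.info()` on the same DataFrame prints the dtype and non-null count of every column.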

Handling Null values in the Dataset

  • This dataset does not contain any null values.
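
A check along these lines could confirm the absence of nulls. This is a sketch; `null_report` and `has_nulls` are hypothetical helper names.

```python
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Number of missing values in each column."""
    return df.isnull().sum()


def has_nulls(df: pd.DataFrame) -> bool:
    """True if any cell in the DataFrame is missing."""
    return bool(df.isnull().to_numpy().any())
```

If the dataset has no nulls, `null_report(df)` shows a zero for every column and `has_nulls(df)` returns `False`.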

Unique values in each Categorical Column

  • Unique customer names in the data.

  • Unique product categories in the data.

  • Unique product sub-categories in the data.

  • Unique cities in the data.

  • Unique regions in the data.
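
The unique values could be listed and counted as follows. The snake_case names in `CATEGORICAL_COLUMNS` are assumptions about the notebook's column names, and both helpers are hypothetical.

```python
import pandas as pd

# Assumed snake_case names for the customer, category, sub-category, city and region columns.
CATEGORICAL_COLUMNS = ["customer_name", "category", "sub_category", "city", "region"]


def unique_values(df: pd.DataFrame, columns=CATEGORICAL_COLUMNS) -> dict:
    """Map each categorical column to its sorted list of distinct values."""
    return {col: sorted(df[col].unique()) for col in columns}


def unique_counts(df: pd.DataFrame, columns=CATEGORICAL_COLUMNS) -> pd.Series:
    """Number of distinct values in each categorical column."""
    return df[columns].nunique()
```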

Changing DataType of Columns

  • Modifying the datatype of the order_date column to the pandas datetime format.
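
The conversion is essentially one `pd.to_datetime` call. In this sketch the helper name is hypothetical. The optional `fmt` argument exists because this README does not state the raw date format, and passing an explicit format stops pandas from guessing between day-first and month-first dates.

```python
import pandas as pd


def parse_order_date(df: pd.DataFrame, fmt=None) -> pd.DataFrame:
    """Return a copy of df with order_date converted to datetime64.

    fmt is an optional strptime-style format such as "%d-%m-%Y";
    with fmt=None pandas infers the format itself.
    """
    out = df.copy()
    out["order_date"] = pd.to_datetime(out["order_date"], format=fmt)
    return out
```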

Utilizing existing information to create new Columns

  • Extracting the year, month and day from the order_date column.

  • Calculating discount_amount from the discount percentage column.
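
Here is a sketch of the derived columns under two assumptions. First, the new column names (`order_year`, `order_month`, `order_day`) are hypothetical. Second, the README does not give the discount formula, so `discount_amount` is computed here as `sales * discount`, treating `discount` as a fraction (0.10 for 10%). The notebook's actual formula may differ.

```python
import pandas as pd


def add_date_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Add year, month name and day-of-month columns derived from order_date.

    Assumes order_date has already been converted to datetime64.
    """
    out = df.copy()
    out["order_year"] = out["order_date"].dt.year
    out["order_month"] = out["order_date"].dt.month_name()
    out["order_day"] = out["order_date"].dt.day
    return out


def add_discount_amount(df: pd.DataFrame) -> pd.DataFrame:
    """Add discount_amount, ASSUMING discount is a fraction of sales (0.10 = 10%).

    sales * discount is a hypothetical stand-in for the unstated formula.
    """
    out = df.copy()
    out["discount_amount"] = out["sales"] * out["discount"]
    return out
```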

Statistical Analysis

  • No. of products sold in each category.

  • No. of products sold in each sub category.

  • No. of products sold in each city.

  • No. of products sold in each region.

  • No. of products sold each year, month and date.
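
Each count in this section is a frequency table. This minimal sketch assumes each row represents one product sold, since the README does not describe a quantity column. The helper `products_sold_by` is hypothetical.

```python
import pandas as pd


def products_sold_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Count rows per value of `column`, most frequent first.

    Assumption: one row = one product sold, so a row count stands in
    for "number of products sold".
    """
    return df[column].value_counts()
```

For example, `products_sold_by(df, "sub_category")` or `products_sold_by(df, "city")`.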

Data Visualization

  • No. of products sold in each category.

  • No. of products sold in each sub category.

  • No. of products sold in each city.

  • No. of products sold in each region.

  • No. of products sold each year.

  • No. of products sold each month.

  • No. of products sold each date.

  • Total sales in each category.

  • Total sales in each sub category.

  • Total sales in each region.

  • Total sales in each city.

  • Total sales in each month.

  • Total sales in each year.

  • Total profit in each category.

  • Total profit in each sub category.

  • Total profit in each region.

  • Total profit in each city.

  • Total profit in each month.

  • Total profit in each year.

  • Customers with the highest total sales.

  • Customers generating the highest profit on their purchases.

  • Total discount availed by customers.
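
Most of the charts above pair a group-by aggregation with a bar chart. The sketch below shows that pattern with plain matplotlib rather than seaborn to keep it short. The helpers `total_by`, `top_n` and `bar_chart` and the snake_case column names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def total_by(df: pd.DataFrame, by: str, value: str) -> pd.Series:
    """Sum `value` (e.g. "sales", "profit", "discount_amount") per group, largest first."""
    return df.groupby(by)[value].sum().sort_values(ascending=False)


def top_n(df: pd.DataFrame, by: str, value: str, n: int = 5) -> pd.Series:
    """The n groups (e.g. customers) with the largest total `value`."""
    return df.groupby(by)[value].sum().nlargest(n)


def bar_chart(series: pd.Series, title: str):
    """Draw a bar chart of an aggregated Series and return the Figure."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(list(series.index.astype(str)), series.values)
    ax.set_title(title)
    ax.set_ylabel(str(series.name))  # name of the aggregated column, e.g. "sales"
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig
```

For example, `bar_chart(total_by(df, "category", "profit"), "Total profit in each category")`, or `top_n(df, "customer_name", "sales", n=10)` for the customers with the highest total sales. Because `total_by` sorts by value, a monthly chart built with it is ordered by size rather than by calendar month.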

Conclusion

Here are some key findings from the analysis :

  • Analyzed the purchasing patterns of 9000+ customers of the supermarket.

  • More than 15% of the products sold were Snacks.

  • More than 32% of sales occurred in the West region of the supermarket.

  • Health Drinks and Soft Drinks are the most profitable sub-categories within Beverages.

  • November was the most profitable month, contributing about 15% of total annual profits.
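
Percentage findings such as a region's share of sales or a month's share of annual profit are group totals divided by the grand total. This minimal sketch uses a hypothetical helper, `percent_share`, and assumed snake_case column names. The name of any month column depends on how it was derived in the notebook.

```python
import pandas as pd


def percent_share(df: pd.DataFrame, by: str, value: str) -> pd.Series:
    """Each group's percentage of the overall total of `value`."""
    totals = df.groupby(by)[value].sum()
    return totals / totals.sum() * 100
```

For example, `percent_share(df, "region", "sales")` gives each region's share of total sales. For a share of products sold by category, counting rows instead, `df["category"].value_counts(normalize=True) * 100` gives the equivalent percentages.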