Clustering algorithms to segment clients of a distribution company

imane-ayouni/Customer-Segmentation

Customer-Segmentation

Topic

Olist is a Brazilian department store platform that operates in the e-commerce segment under a Software as a Service (SaaS) model. The service manages the sales process between shopkeepers and clients and includes a customer satisfaction report. For shopkeepers, the advantages are a better market presence and transparent reputation metrics. The data provided by Olist consists of 9 datasets containing the following information:

1- Orders : contains the order ID, the order status, and timestamps for each stage of the delivery process

2- Order items : contains order IDs, SKU (Stock Keeping Unit), seller, price, and shipping cost

3- Products : contains technical information about the products (dimensions and weight)

4- Order payments : contains information about payment type, installments, and purchase value

5- Order reviews : contains information like review id and score

6- Sellers : contains information about the sellers' location: zip code, city, and state

7- Customers : contains information about the customers' location: zip code, city, and state

8- Geolocation : contains detailed location information for the places where transactions occurred (both customers and sellers)

9- Product category name translation : contains the English translation of the category names of some of the products sold on the platform

The links between these datasets can be represented as follows:

[Image: diagram of the links between the datasets]
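The links between the datasets translate into joins on shared keys. The sketch below is illustrative only: it uses tiny made-up tables, and the column names (`order_id`, `customer_id`, `product_id`, `price`, `freight_value`, `customer_state`) are assumptions about the data rather than names taken from this repository. The helper `merge_tables` is hypothetical.

```python
import pandas as pd

# Tiny stand-in tables. Column names are assumptions, not taken from the repository.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c1"],
    "order_status": ["delivered", "delivered", "shipped"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "product_id": ["p1", "p2", "p1", "p3"],
    "price": [10.0, 25.0, 10.0, 40.0],
    "freight_value": [2.0, 3.0, 2.5, 5.0],
})
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "customer_state": ["SP", "RJ"],
})


def merge_tables(orders, order_items, customers):
    """Attach order attributes, then customer attributes, to each order item."""
    # validate= makes pandas raise if a key is unexpectedly duplicated on the right side.
    merged = order_items.merge(orders, on="order_id", how="left", validate="many_to_one")
    merged = merged.merge(customers, on="customer_id", how="left", validate="many_to_one")
    return merged


merged = merge_tables(orders, order_items, customers)

# One row per order item; aggregate to one value per customer for later clustering.
spend_per_customer = merged.groupby("customer_id")["price"].sum()
print(spend_per_customer)
```

Left joins keep every order item even when a matching order or customer row is missing, which makes gaps visible as missing values instead of silently dropping rows.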

Market Segmentation

For business development in general, and for supply chain management in particular, understanding customer behavior and geographic conditions supports better decisions. Clusters (or segments) can be defined by extracting the demographic and geodemographic characteristics that customers share. This makes it possible to apply tailor-made strategies to target customers and to optimize the supply chain more effectively.

Objectives

  • Gather relevant information from the datasets
  • Visually explore the dataset to understand more about the business and its trends
  • Build clustering models to segment the company's customers as effectively as possible

Steps followed

  • Part I : Preliminary processing and merging datasets
  • Part II : Feature engineering and exploratory data analysis
  • Part III : Customer clustering
  • Power BI dashboards

Models Used

  • K-means (centroid-based clustering)
  • DBSCAN (density-based clustering)
  • Gaussian Mixture (distribution-based clustering)
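As a rough illustration of how the three model families can be fitted side by side, here is a minimal scikit-learn sketch. It runs on synthetic blobs rather than the Olist features, and the helper `fit_all` and every hyperparameter value (`n_clusters=3`, `eps=0.3`, `min_samples=5`) are assumptions chosen for the toy data, not settings from this project.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a customer feature matrix (not the Olist data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# All three models rely on distances or covariances, so features are scaled first.
X_scaled = StandardScaler().fit_transform(X)


def fit_all(X, n_clusters=3, eps=0.3, min_samples=5, seed=42):
    """Fit the three clustering models and return their label arrays."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    dbscan = DBSCAN(eps=eps, min_samples=min_samples)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    return {
        "kmeans": kmeans.fit_predict(X),
        "dbscan": dbscan.fit_predict(X),  # label -1 marks points treated as noise
        "gmm": gmm.fit_predict(X),
    }


labels = fit_all(X_scaled)
for name, lab in labels.items():
    n_found = len(set(lab) - {-1})
    print(f"{name}: {n_found} clusters, {int(np.sum(lab == -1))} noise points")
```

K-means and the Gaussian mixture need the number of clusters up front, while DBSCAN infers it from `eps` and `min_samples` and can leave points unassigned.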

Evaluation metrics used

  • Silhouette plot and score
  • Elbow plot
  • Calinski-Harabasz (CH) index
  • Davies-Bouldin (DB) index
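The metrics listed above can be computed together while scanning candidate cluster counts. The following is a sketch on synthetic data, with K-means as the example model. The helper `score_k_range` and the range of `k` values are hypothetical choices for illustration, not the procedure used in this project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for a scaled customer feature matrix (not the Olist data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def score_k_range(X, k_values, seed=0):
    """Fit K-means for each k and collect the four evaluation quantities."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        rows.append({
            "k": k,
            "inertia": model.inertia_,  # plotted against k for the elbow plot
            "silhouette": silhouette_score(X, model.labels_),  # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, model.labels_),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, model.labels_),  # lower is better
        })
    return rows


scores = score_k_range(X, range(2, 7))
for row in scores:
    print(row)
```

For a per-sample silhouette plot, `sklearn.metrics.silhouette_samples` returns one coefficient per point, which can then be grouped by cluster label and drawn as horizontal bars.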

Data source

https://www.kaggle.com/olistbr/brazilian-ecommerce