Customer Segmentation in Online Retail

In this project, I analyzed customer segments in the Online Retail dataset using Python. I used cohort analysis, RFM analysis, and k-means clustering.

Problem Statement

Identify the customer segments in the dataset and prescribe a course of business action for each segment.

An example of a segment is the set of customers who bring the most profit and visit frequently.

Data Overview

Source: The UCI Machine Learning Repository

This dataset contains all transactions that occurred between 01/12/2010 and 09/12/2011 (DD/MM/YYYY) for a non-store online retailer.

Data Snapshot

Data Exploration

Removed null values
Removed duplicate rows
Most transactions are from the UK
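
The cleaning steps above can be sketched on a toy table as follows. This is a minimal illustration, not the notebook's code: the sample rows are made up, and dropping nulls on `CustomerID` specifically is an assumption (the README does not say which columns had nulls).

```python
import pandas as pd

# Toy stand-in for the Online Retail data; values are made up for illustration.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367", "536368"],
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 12583.0],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "United Kingdom", "France"],
    "Quantity": [6, 6, 2, 3, 24],
    "UnitPrice": [2.55, 2.55, 1.85, 4.25, 0.85],
})

# Assumption: rows without a CustomerID are the nulls being removed.
# After that, drop rows that are exact duplicates of an earlier row.
clean = raw.dropna(subset=["CustomerID"]).drop_duplicates()

# Count transactions per country, most frequent first.
country_counts = clean["Country"].value_counts()
print(country_counts)
```

Rows without a customer identifier cannot be assigned to a cohort or an RFM score, which is why this sketch drops them first.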

Cohort Analysis

A cohort is a group of users who share a characteristic over a period of time. Cohort analysis splits users into mutually exclusive groups and measures each group's behaviour over time.

There are three types of cohort analysis:

Time cohorts: group customers by their purchase behaviour over time.
Behaviour cohorts: group customers by the product or service they signed up for.
Size cohorts: group customers by how much they buy from the company, for example by the amount spent in a given period.

For this project, I have chosen time cohorts. The steps are as follows:

Identified the cohort month for each customer (the month of the customer's first transaction).

# First transaction month (cohort month) for each customer.
# The string "min" selects the built-in groupby aggregation.
df3['Cohort Month'] = df3.groupby('CustomerID')['InvoiceFormat'].transform('min')
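
The README does not show how `InvoiceFormat` is built. A plausible sketch, assuming it holds the invoice date truncated to the first day of its month, is shown below on toy data; the `InvoiceDate` column name and the sample values are assumptions for illustration.

```python
import pandas as pd

# Toy transactions; dates and IDs are illustrative only.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-05", "2011-03-17", "2011-01-20", "2011-02-02"]
    ),
})

# Assumption: InvoiceFormat is the invoice date truncated to the first day of its month.
tx["InvoiceFormat"] = tx["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

# Earliest invoice month per customer, broadcast back onto every row.
tx["Cohort Month"] = tx.groupby("CustomerID")["InvoiceFormat"].transform("min")
print(tx)
```

Using `transform` rather than `agg` keeps one value per transaction row, so the cohort month can be compared with each invoice month directly.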

Identified cohort index (difference between transaction month and cohort month) for each transaction.

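# Cohort index: number of months between the invoice month (x1) and the
# cohort month (y1), counted from 1. Both columns must be datetime columns.
# The calculation is vectorized, so it does not rely on the DataFrame
# having a 0..n-1 index (rows were dropped during cleaning).
def diff(d, x1, y1):
    years = d[x1].dt.year - d[y1].dt.year
    months = d[x1].dt.month - d[y1].dt.month
    return (years * 12 + months + 1).tolist()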
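
As a worked check of the cohort index formula, the same arithmetic can be applied to a single pair of dates. The helper name `month_offset` is hypothetical and exists only for this illustration.

```python
from datetime import date

def month_offset(invoice_month: date, cohort_month: date) -> int:
    """Months between cohort month and invoice month, counted from 1."""
    year_diff = invoice_month.year - cohort_month.year
    month_diff = invoice_month.month - cohort_month.month
    return year_diff * 12 + month_diff + 1

# A purchase in the cohort month itself gets index 1.
same_month = month_offset(date(2010, 12, 1), date(2010, 12, 1))

# December 2010 cohort, purchase in March 2011: 1*12 + (3 - 12) + 1 = 4.
three_later = month_offset(date(2011, 3, 1), date(2010, 12, 1))

print(same_month, three_later)
```

The `+ 1` makes the first month of a cohort index 1 rather than 0, which is the convention the function above uses.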

Grouped data by cohort month and cohort index.
Developed a pivot table.
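
The grouping and pivot steps can be sketched as below. This is an illustration on toy rows, not the notebook's code; the `Cohort Index` column name and the retention calculation are assumptions about how the table feeds the heatmap.

```python
import pandas as pd

# Toy rows with cohort month and cohort index already attached (illustrative values).
rows = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, 3],
    "Cohort Month": pd.to_datetime(["2010-12-01"] * 4 + ["2011-01-01"] * 2),
    "Cohort Index": [1, 2, 1, 1, 1, 2],
})

# Count distinct active customers per (cohort month, cohort index).
counts = (
    rows.groupby(["Cohort Month", "Cohort Index"])["CustomerID"]
    .nunique()
    .reset_index()
)

# Rows are cohorts, columns are months since the first purchase.
cohort_table = counts.pivot(
    index="Cohort Month", columns="Cohort Index", values="CustomerID"
)

# Retention: divide every column by the cohort's size in its first month.
retention = cohort_table.divide(cohort_table[1], axis=0)
print(retention)
```

Counting distinct customers rather than rows prevents a customer with several invoices in one month from being counted more than once.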

Developed a time cohort heatmap

Summary

Roughly 10% of new customers are still active a year after their first purchase, so retention is quite poor.
Roughly 250 new customers are added every month, so acquisition marketing is OK.

RFM Analysis

RFM stands for Recency, Frequency, Monetary. It scores each customer on three factors: how recently they last transacted, how often they transact, and how much monetary value they bring to the business. These scores can then be used to group customers.

Recency

The last transaction in the dataset was on 2011-12-09, so the recency score was calculated using 2012-01-01 (New Year) as the snapshot date.
The recency score is the difference between the snapshot date and each customer's last transaction date, reported in days.
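
The README shows code for frequency and monetary value but not for recency. A minimal sketch of the calculation described above follows; the `InvoiceDate` column name and the sample rows are assumptions.

```python
import pandas as pd

# Toy transactions; values are illustrative only.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-09", "2011-06-15"]),
})

# Snapshot date stated in the text.
snapshot = pd.Timestamp("2012-01-01")

# Most recent purchase per customer, then whole days up to the snapshot.
last_purchase = tx.groupby("CustomerID")["InvoiceDate"].max()
recency = (snapshot - last_purchase).dt.days
print(recency)
```

A smaller recency value means a more recent purchase, so lower is better for this feature.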

Frequency

# Frequency: number of invoice rows per customer
freq = df6.groupby(["CustomerID"])[["InvoiceNo"]].count()

Monetary

# Line total for each transaction row
df6["total"] = df6["Quantity"] * df6["UnitPrice"]
# Monetary: total spend per customer
money = df6.groupby(["CustomerID"])[["total"]].sum()
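
For reference, the three measures can also be assembled into a single table in one pass. This is a sketch under assumptions: the `InvoiceDate` column name, the `rfm` variable, and the sample rows are not from the README.

```python
import pandas as pd

# Toy transactions; values are illustrative only.
df6 = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-09", "2011-06-15"]),
    "Quantity": [2, 1, 10],
    "UnitPrice": [5.0, 3.0, 1.5],
})
df6["total"] = df6["Quantity"] * df6["UnitPrice"]
snapshot = pd.Timestamp("2012-01-01")

# One row per customer with the three RFM measures.
rfm = df6.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda s: (snapshot - s.max()).days),
    Frequency=("InvoiceNo", "count"),
    Monetary=("total", "sum"),
)
print(rfm)
```

Note that `count` counts transaction rows, so an invoice with several line items contributes more than once; `nunique` would count distinct invoices instead, if that is the intended meaning of frequency.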

Clustering

Before k-means clustering, I reduced the skewness of the data.

RFM Distribution

The data is clearly right skewed (a long tail toward large values). I used a log transformation to reduce the skewness.

After log transformation

The skewness has thus been largely removed. Now we can proceed to implement k-means.
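
The log transformation, and the scaling that produces the `scaled` array used for k-means, can be sketched as follows. The use of `np.log1p` and `StandardScaler` is an assumption (the README does not name the exact functions), and the sample values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy RFM table with a long right tail; values are illustrative only.
rfm = pd.DataFrame({
    "Recency": [23, 200, 40, 365, 31],
    "Frequency": [2, 1, 50, 3, 400],
    "Monetary": [13.0, 15.0, 900.0, 45.0, 12000.0],
})

# log1p(x) = log(1 + x); assumes all values are non-negative.
rfm_log = np.log1p(rfm)

# Compare sample skewness before and after the transformation.
print(rfm.skew())
print(rfm_log.skew())

# Standardize each feature to zero mean and unit variance, since k-means
# is distance-based and sensitive to feature scale.
scaled = StandardScaler().fit_transform(rfm_log)
```

If monetary totals can be zero or negative (for example because of returns), those rows would need handling before taking logs.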

Implementing K means

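import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Fit k-means for k = 1..10 on the scaled RFM data and record the inertia
# (within-cluster sum of squared distances) for each k.
ks = range(1, 11)
inertia = []
for k in ks:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(scaled)
    inertia.append(kmeans.inertia_)

# Plot inertia against k so the x-axis shows the actual number of clusters.
plt.figure(figsize=(12, 8))
plt.plot(ks, inertia, marker="o")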

Elbow Curve

From the graph, I chose 3 clusters. The cluster statistics are:
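
A sketch of how per-cluster statistics like these could be computed is shown below. It runs on synthetic data, so the numbers it prints are not the project's results; `n_init` and `random_state` are added here only to make the sketch reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic RFM table, for illustration only.
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, size=60),
    "Frequency": rng.integers(1, 200, size=60),
    "Monetary": rng.uniform(10, 5000, size=60),
})
scaled = StandardScaler().fit_transform(np.log1p(rfm))

# Fit k-means with the chosen number of clusters and label each customer.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm["Cluster"] = kmeans.fit_predict(scaled)

# Average RFM values and customer count per cluster, on the original scale.
cluster_stats = rfm.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Monetary", "count"),
)
print(cluster_stats)
```

Summarizing on the untransformed values keeps the statistics interpretable in days, purchases, and currency.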

I then labeled the customer segments as:

Major (best customers)
At risk
Average customers

The relative importance of the cluster characteristics is:
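
One common way to express relative importance is the ratio of each cluster's average to the overall average, minus one. Whether the notebook uses exactly this definition is an assumption; the sketch below uses made-up values.

```python
import pandas as pd

# Toy clustered RFM table; values and cluster labels are illustrative only.
rfm = pd.DataFrame({
    "Recency": [10, 20, 300, 280, 60, 70],
    "Frequency": [50, 40, 2, 1, 10, 12],
    "Monetary": [5000.0, 4000.0, 50.0, 30.0, 600.0, 700.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

features = ["Recency", "Frequency", "Monetary"]
cluster_avg = rfm.groupby("Cluster")[features].mean()
population_avg = rfm[features].mean()

# 0 means the cluster matches the overall average; positive means above it,
# negative means below it.
relative_importance = cluster_avg / population_avg - 1
print(relative_importance.round(2))
```

Values far from zero mark the features that most distinguish a cluster, which is what makes this table convenient to show as a heatmap.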

Suggestions and Cluster Interpretation:

At risk customers: These customers last transacted a long time ago and contribute the least in monetary terms.

Suggestion: These customers may have already left the customer base. Try to understand why they left. Sales and discount offers might help bring a portion of them back.

Average standing customers: These customers have transacted recently and regularly, and contribute appreciably in monetary terms.

Suggestion: Handle them with care and try to convert them into best customers. Discounts and sales are highly desirable. Provide top customer support and services.

Best customers: These customers transacted recently, buy very frequently, and bring the most money to the company.

Suggestion: These customers can be targeted with newly launched products. Repeated advertising can further increase revenue. Heavy discounts are not required.
