In this project, I analyzed customer segments in the Online Retail dataset using Python. For this task, I employed cohort analysis, RFM analysis, and k-means clustering.
The goal is to identify the customer segments in the dataset and prescribe a course of business action for each segment.
An example of a segment might be the customers who bring the most profit and purchase frequently.
Source: The UCI Machine Learning Repository
This dataset contains all the transactions that occurred between 01/12/2010 and 09/12/2011 for a non-store online retailer.
- Removed Null Values
- Removed duplicated Values
- Most transactions are from the UK
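The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up frame, not the code used in the project: the column names follow the UCI dataset, while `raw`, `clean`, and the sample rows are hypothetical. It also assumes that "removing null values" means dropping rows without a `CustomerID`.

```python
import pandas as pd

# Hypothetical sample standing in for the Online Retail data
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367", "536368"],
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 12583.0],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "United Kingdom", "France"],
    "Quantity": [6, 6, 2, 32, 24],
})

# Drop rows with a missing CustomerID (assumed meaning of "removed null values")
clean = raw.dropna(subset=["CustomerID"])

# Drop rows that are exact duplicates of an earlier row
clean = clean.drop_duplicates()

# Count transactions per country, largest first
country_counts = clean["Country"].value_counts()
print(country_counts)
```

Note that dropping rows leaves gaps in the index, which matters for any later code that looks rows up by label.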
A cohort is a set of users who share similar characteristics over time. Cohort analysis divides users into mutually exclusive groups and measures the behaviour of each group over time.
There are three types of cohort analysis:
- Time cohorts: Group customers by their purchase behaviour over time.
- Behaviour cohorts: Group customers by the product or service they signed up for.
- Size cohorts: Group customers by their size as buyers of the company's products or services, for example by the amount they spend in a given period.
For this project, I have chosen time cohorts. The steps are as follows:
- Identified the cohort month for each customer (the month of the customer's first transaction)
# First transaction month (cohort month) for each customer
df3['Cohort Month'] = df3.groupby('CustomerID')['InvoiceFormat'].transform('min')
- Identified cohort index (difference between transaction month and cohort month) for each transaction.
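# Cohort index: number of months between the invoice month (column x1)
# and the cohort month (column y1), plus 1 so the first month has index 1
def diff(d, x1, y1):
    l = []
    # Iterate over values rather than index labels, so the function still
    # works when the index has gaps (e.g. after dropping nulls or duplicates)
    for x, y in zip(d[x1], d[y1]):
        months = (x.year - y.year) * 12 + (x.month - y.month) + 1
        l.append(months)
    return l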
- Grouped data by cohort month and cohort index.
- Developed a pivot table.
- Developed a time cohort heatmap
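The grouping and pivot steps could look roughly like the sketch below. It is an illustration on made-up data, not the project's code: `tx`, `counts`, `cohort_table`, `retention`, and the column name `Cohort Index` are hypothetical, and only `Cohort Month` and `CustomerID` come from the snippets above.

```python
import pandas as pd

# Hypothetical input: one row per transaction, already tagged with the
# customer's cohort month and the transaction's cohort index
tx = pd.DataFrame({
    "CustomerID":   [1, 1, 1, 2, 2, 3, 4, 4],
    "Cohort Month": ["2010-12", "2010-12", "2010-12", "2010-12", "2010-12",
                     "2011-01", "2011-01", "2011-01"],
    "Cohort Index": [1, 2, 3, 1, 2, 1, 1, 2],
})

# Count distinct active customers per (cohort month, cohort index)
counts = (tx.groupby(["Cohort Month", "Cohort Index"])["CustomerID"]
            .nunique()
            .reset_index())

# Pivot: one row per cohort, one column per month since the first purchase
cohort_table = counts.pivot(index="Cohort Month", columns="Cohort Index",
                            values="CustomerID")

# Retention: share of each cohort's initial size still active in each month
cohort_size = cohort_table[1]
retention = cohort_table.divide(cohort_size, axis=0)
print(retention.round(2))

# A heatmap could then be drawn with, for example:
# import seaborn as sns; sns.heatmap(retention, annot=True, fmt=".0%")
```

Dividing by the first column expresses every cell as a fraction of the cohort's starting size, which is what makes cohorts of different sizes comparable in one heatmap.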
- Roughly 10% of new customers are still active a year after their first purchase, so retention is quite poor.
- We add roughly 250 new customers every month, so customer acquisition is adequate.
RFM stands for Recency, Frequency, and Monetary value. It scores customers on three factors: how recently they last transacted, how frequently they transact, and how much monetary value they bring to the business. These scores can then be used to group customers.
- Recency
  - The last transaction in the dataset was on 2011-12-09, so the recency score was calculated using 2012-01-01 (New Year) as the snapshot date.
  - The recency score is the difference between the snapshot date and each customer's last transaction date, reported in days.
- Frequency
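# Frequency: number of transaction rows per customer
# (use .nunique() instead of .count() to count distinct invoices)
freq = df6.groupby("CustomerID")[["InvoiceNo"]].count()
- Monetary
# Monetary: total spend per customer (quantity x unit price, summed over all rows)
df6["total"] = df6["Quantity"] * df6["UnitPrice"]
money = df6.groupby("CustomerID")[["total"]].sum()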
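The recency step is described above without code. A minimal sketch of it, assuming the transaction timestamp is in the dataset's `InvoiceDate` column, is shown below; `tx`, `last_purchase`, `recency`, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; InvoiceDate is assumed to hold the transaction timestamp
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-15", "2011-12-09", "2011-06-30", "2010-12-01"]),
})

# Snapshot date stated in the text
snapshot = pd.Timestamp("2012-01-01")

# Last transaction date per customer, then days elapsed up to the snapshot
last_purchase = tx.groupby("CustomerID")["InvoiceDate"].max()
recency = (snapshot - last_purchase).dt.days.rename("Recency")
print(recency)
```

A lower recency value means a more recent purchase, so the most recent customer in the dataset gets a score of 23 days with this snapshot date.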
Before applying k-means clustering, I reduced the skewness of the data.
- RFM Distribution
Clearly the data is right-skewed: most values sit at the low end, with a long tail to the right. I used a log transformation to reduce the skewness.
- After log transformation
The skewness has been largely removed, so we can now proceed to implement k-means.
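The preprocessing can be sketched as below. The log transformation is the step described above; the standardization step is an assumption, since the original does not show how the `scaled` input to k-means was produced. The frame `rfm` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM table with right-skewed, strictly positive values
rfm = pd.DataFrame({
    "Recency":   [23, 185, 396, 30, 60],
    "Frequency": [1, 2, 150, 4, 8],
    "Monetary":  [20.0, 55.0, 9000.0, 120.0, 300.0],
})

# The log transform compresses the long right tail (values must be > 0)
rfm_log = np.log(rfm)

# Standardize each column to mean 0 and unit variance so that no single
# feature dominates the Euclidean distances used by k-means (assumed step)
scaled = (rfm_log - rfm_log.mean()) / rfm_log.std(ddof=0)

# Compare skewness before and after the transform
print(rfm.skew().round(2))
print(rfm_log.skew().round(2))
```

Because the logarithm is undefined for zero and negative values, rows with non-positive totals (for example, returns) would need to be handled before this step.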
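import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Elbow method: fit k-means for k = 1..10 and record the inertia for each k
ks = np.arange(1, 11)
inertia = []
for k in ks:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(scaled)
    inertia.append(kmeans.inertia_)
plt.figure(figsize=(12, 8))
# Plot against k so the x-axis shows the number of clusters
plt.plot(ks, inertia, marker="o")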
From the elbow plot, I chose 3 clusters. The cluster statistics are:
I then labeled the customer segments as:
- Major (best customers)
- At risk
- Average customers
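A sketch of how the final clustering and labeling might be done is shown below. It is illustrative only: the data is made up, the standardization step and `random_state` are assumptions, and naming clusters by their mean monetary value is a hypothetical rule. In practice the labels would be assigned after inspecting the cluster statistics.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical RFM values: three well-separated groups of customers
rfm = pd.DataFrame({
    "Recency":   [5, 8, 6, 40, 45, 50, 300, 320, 310],
    "Frequency": [200, 220, 210, 30, 25, 35, 2, 1, 3],
    "Monetary":  [9000.0, 9500.0, 9200.0, 800.0, 700.0, 900.0, 30.0, 20.0, 40.0],
})

# Log transform, then standardize (assumed preprocessing)
rfm_log = np.log(rfm)
scaled = (rfm_log - rfm_log.mean()) / rfm_log.std(ddof=0)

# Fit k-means with the chosen number of clusters; random_state fixes the seed
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm["Cluster"] = kmeans.fit_predict(scaled)

# Average RFM values and customer count per cluster
summary = rfm.groupby("Cluster").agg(
    Recency=("Recency", "mean"),
    Frequency=("Frequency", "mean"),
    Monetary=("Monetary", "mean"),
    Customers=("Recency", "count"),
)
print(summary.round(1))

# Cluster ids are arbitrary, so segments are named here by ranking the
# clusters on mean monetary value (an illustrative rule, not the author's)
order = summary["Monetary"].sort_values().index
labels = dict(zip(order, ["At risk", "Average Customers", "Major"]))
rfm["Segment"] = rfm["Cluster"].map(labels)
```

The summary is computed on the original RFM values rather than the scaled ones, so the cluster averages stay readable in days, transaction counts, and currency.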
The relative importance of the cluster characteristics is as follows:
- At-risk customers: These customers last transacted a long time ago and contribute the least in monetary terms.
  - Suggestion: These customers may have already left the customer base. Try to understand why they left. Sales and discount offers might help to bring a portion of them back.
- Average-standing customers: These customers have transacted recently and regularly, and contribute appreciably in monetary terms.
  - Suggestion: Handle them with care and aim to convert them into best customers. Discounts and sales are highly desirable. Provide top customer support and services.
- Best customers: These customers transacted recently, purchase very frequently, and bring in the most revenue.
  - Suggestion: These customers are a good target for newly launched products. Repeated advertising can further increase revenue. Heavy discounts are not required.