This project demonstrates a complete end-to-end workflow for customer segmentation using RFM (Recency, Frequency, Monetary) analysis and unsupervised machine learning techniques.
- File: `orders.parquet`
- Columns:
  - `id`: Unique order ID
  - `created_at`: Timestamp of the order
  - `sales_amount`: Order value in MYR
  - `customer_id`: Unique customer identifier
- Perform data cleaning and quality checks
- Conduct exploratory data analysis (EDA)
- Engineer RFM features at customer level
- Cluster customers with unsupervised learning (MiniBatch KMeans)
- Generate business insights and strategies per segment
- `pandas`, `numpy`
- `matplotlib`, `seaborn`, `plotly`
- `scikit-learn` (`MiniBatchKMeans`, `silhouette_score`)
- `datetime`, `warnings`, `resample`
- Loaded data from `.parquet` format
- Checked for missing values and data types
- Parsed `created_at` to datetime and sorted records
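
The loading and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `clean_orders` and `quality_report` are hypothetical, and dropping rows with missing key fields (rather than imputing them) is an assumption.

```python
import pandas as pd


def quality_report(orders: pd.DataFrame) -> pd.DataFrame:
    """Summarize the dtype and missing-value count of each column."""
    return pd.DataFrame(
        {"dtype": orders.dtypes.astype(str), "missing": orders.isna().sum()}
    )


def clean_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Parse timestamps, drop unusable rows, and sort chronologically."""
    cleaned = orders.copy()
    # errors="coerce" turns unparseable timestamps into NaT so they can be dropped.
    cleaned["created_at"] = pd.to_datetime(cleaned["created_at"], errors="coerce")
    # Assumption: rows missing a key field are dropped, not imputed.
    cleaned = cleaned.dropna(subset=["created_at", "customer_id", "sales_amount"])
    return cleaned.sort_values("created_at").reset_index(drop=True)


# Usage (reading Parquet requires an engine such as pyarrow to be installed):
# raw = pd.read_parquet("orders.parquet")
# print(quality_report(raw))
# orders = clean_orders(raw)
```

Running `quality_report` before `clean_orders` shows how many rows the cleaning step is about to discard.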
- Visualized order volume by:
  - Quarter (Q1–Q4)
  - Month (line plot)
  - Week
  - Day of Week (Mon–Sun)
  - Hour of Day (peak hours)
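
One way to compute the order counts behind these views is sketched below. It assumes `created_at` has already been parsed to datetime; the function name `order_volume` is hypothetical, and the plotting itself is left out.

```python
import pandas as pd

DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def order_volume(orders: pd.DataFrame) -> dict[str, pd.Series]:
    """Count orders per quarter, month, week, day of week, and hour of day."""
    ts = orders["created_at"]
    return {
        "quarter": ts.dt.quarter.value_counts().sort_index(),
        "month": ts.dt.to_period("M").value_counts().sort_index(),
        "week": ts.dt.to_period("W").value_counts().sort_index(),
        # Reindex so the days appear Mon..Sun and missing days show as 0.
        "day_of_week": ts.dt.day_name().str[:3].value_counts()
        .reindex(DAY_ORDER, fill_value=0),
        "hour": ts.dt.hour.value_counts().sort_index(),
    }
```

Each returned Series can be passed to a plotting call, for example `order_volume(orders)["hour"].plot(kind="bar")` to inspect peak hours.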
- Grouped transactions by `customer_id`:
  - Recency: Days since last purchase
  - Frequency: Number of orders
  - Monetary: Total spending
- Normalized features for clustering
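
A minimal sketch of the RFM aggregation and normalization follows. The function names `build_rfm` and `scale_rfm` are hypothetical, the snapshot date (one day after the latest order) is an assumption, and `StandardScaler` is used only as one common choice, since the normalization method is not specified here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_rfm(orders: pd.DataFrame, snapshot=None) -> pd.DataFrame:
    """Aggregate orders into one Recency/Frequency/Monetary row per customer."""
    if snapshot is None:
        # Assumption: measure recency from the day after the latest order.
        snapshot = orders["created_at"].max() + pd.Timedelta(days=1)
    return orders.groupby("customer_id").agg(
        recency=("created_at", lambda s: (snapshot - s.max()).days),
        frequency=("id", "nunique"),
        monetary=("sales_amount", "sum"),
    )


def scale_rfm(rfm: pd.DataFrame) -> pd.DataFrame:
    """Standardize each feature to zero mean and unit variance."""
    scaled = StandardScaler().fit_transform(rfm)
    return pd.DataFrame(scaled, index=rfm.index, columns=rfm.columns)
```

Scaling matters because KMeans-style algorithms use Euclidean distance, so an unscaled monetary column would otherwise dominate recency and frequency.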
- Used MiniBatchKMeans for efficient clustering
- Determined best number of clusters using:
  - Elbow Method (SSE)
  - Silhouette Score
- Labeled clusters as:
  - Champions
  - Loyal Customers
  - Normal Customers
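
The cluster-count search can be sketched as below, computing SSE (the model's `inertia_`) and the silhouette score for each candidate `k`. The function name `evaluate_k`, the candidate range, and the `random_state` and `n_init` values are illustrative assumptions rather than the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values=range(2, 9), sample_size=None, random_state=42):
    """Fit MiniBatchKMeans for each k and record SSE and silhouette score."""
    X = np.asarray(X, dtype=float)
    rows = []
    for k in k_values:
        model = MiniBatchKMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        rows.append({
            "k": k,
            "sse": model.inertia_,  # plotted against k for the elbow method
            # sample_size subsamples the data; useful when X is large.
            "silhouette": silhouette_score(
                X, labels, sample_size=sample_size, random_state=random_state
            ),
        })
    return pd.DataFrame(rows)
```

The elbow is read off a plot of `sse` against `k`, while the silhouette column gives a direct ranking; when the two disagree, the choice of `k` is a judgment call informed by how interpretable the resulting segments are.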
- Analyzed RFM distribution per cluster
- Created segment-level visualizations and summaries
- Proposed business strategies per segment
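
A per-cluster RFM summary with segment names attached might look like the sketch below. How the clusters were actually mapped to names is not stated here, so ranking clusters by mean monetary value is purely an assumption, and `summarize_segments` is a hypothetical helper.

```python
import pandas as pd

SEGMENT_NAMES = ("Champions", "Loyal Customers", "Normal Customers")


def summarize_segments(rfm: pd.DataFrame, labels, names=SEGMENT_NAMES) -> pd.DataFrame:
    """Average R, F, M per cluster and attach a segment name to each cluster."""
    profiled = rfm.assign(cluster=labels)
    grouped = profiled.groupby("cluster")
    summary = grouped[["recency", "frequency", "monetary"]].mean()
    summary["customers"] = grouped.size()
    # Assumption: the cluster with the highest mean spend gets the first name.
    rank = summary["monetary"].rank(ascending=False, method="first").astype(int)
    # Clusters beyond len(names) are left unnamed (NaN).
    summary["segment"] = rank.map(dict(enumerate(names, start=1)))
    return summary
```

Checking the recency and frequency averages in the summary before accepting the names is worthwhile, since a high-spend cluster with stale recency may not fit the "Champions" label.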
# Install necessary packages
pip install pandas matplotlib seaborn scikit-learn plotly
# Launch Jupyter
jupyter notebook test.ipynb