This project performs customer segmentation on wholesale spending data using two clustering algorithms:
- KMeans
- DBSCAN
The objective is to analyze customer purchasing behavior across multiple product categories and compare clustering approaches.
The dataset contains annual spending for wholesale customers across six product categories:
- Fresh
- Milk
- Grocery
- Frozen
- Detergents_Paper
- Delicassen
Total records: 440 customers
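As a rough illustration of the dataset layout described above, the sketch below checks that a DataFrame carries the six spending columns before any analysis runs. The helper name `check_spending_frame` and the specific checks are assumptions made for this example, not part of the project code.

```python
import pandas as pd

# Column names as listed in the dataset description.
SPEND_COLUMNS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]


def check_spending_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the six spending columns after basic sanity checks.

    Hypothetical helper: the checks below are illustrative assumptions.
    """
    missing = [c for c in SPEND_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    spend = df[SPEND_COLUMNS]
    if spend.isna().any().any():
        raise ValueError("spending columns contain missing values")
    if (spend < 0).any().any():
        raise ValueError("spending values must be non-negative")
    return spend
```

A typical call would pass the result of `pd.read_csv(...)` on the project's data file, and `len(spend)` should then report 440 rows.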
The analysis proceeds through the following steps:
- Exploratory Data Analysis (EDA)
- Skewness detection and log transformation
- Feature standardization
- KMeans clustering
- DBSCAN clustering
- PCA visualization
- Cluster profiling
- Algorithm comparison
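The steps above can be sketched end to end as follows. This is a minimal sketch, not the project's actual code: it runs on synthetic right-skewed data standing in for the real spending table, and the skewness threshold, `n_clusters=3`, `eps`, and `min_samples` values are illustrative assumptions rather than the tuned settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

SPEND_COLUMNS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]

# Synthetic, right-skewed stand-in for the real data (assumption for this sketch).
rng = np.random.default_rng(0)
spend = pd.DataFrame(
    rng.lognormal(mean=8.0, sigma=1.0, size=(440, 6)), columns=SPEND_COLUMNS
)

# Skewness detection: log-transform columns whose skew exceeds a threshold.
skew = spend.skew()
skewed_cols = skew[skew > 1.0].index  # threshold of 1.0 is an illustrative choice
transformed = spend.copy()
transformed[skewed_cols] = np.log1p(transformed[skewed_cols])

# Feature standardization: zero mean and unit variance per column.
X = StandardScaler().fit_transform(transformed)

# KMeans with an assumed three clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# DBSCAN labels noise points as -1; eps and min_samples are assumed values.
dbscan_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

# PCA projection to two components for plotting.
coords = PCA(n_components=2).fit_transform(X)
```

The `coords` array can then be passed to a scatter plot colored by either label vector. Standardizing before clustering matters here because both KMeans and DBSCAN rely on Euclidean distances, which would otherwise be dominated by the highest-spend categories.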
Key findings:
- KMeans produced stable clusters with moderate separation.
- DBSCAN detected density-based groups and identified noise points.
- Silhouette scores indicate overlapping but meaningful customer segments.
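One way to compare the two algorithms on silhouette score is sketched below. It uses synthetic blobs rather than the project data, and the helper `silhouette_ignoring_noise` is a hypothetical name introduced here. Dropping DBSCAN noise points (label `-1`) before scoring is an assumed convention; the project may handle noise differently.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption): 440 points, 6 features, 3 blobs.
X, _ = make_blobs(
    n_samples=440, n_features=6, centers=3, cluster_std=2.0, random_state=0
)


def silhouette_ignoring_noise(X, labels):
    """Silhouette score over non-noise points, or None if fewer than 2 clusters remain."""
    labels = np.asarray(labels)
    mask = labels != -1  # DBSCAN marks noise with -1
    if len(np.unique(labels[mask])) < 2:
        return None
    return silhouette_score(X[mask], labels[mask])


kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X)  # assumed parameters

kmeans_score = silhouette_ignoring_noise(X, kmeans_labels)
dbscan_score = silhouette_ignoring_noise(X, dbscan_labels)
noise_fraction = float(np.mean(dbscan_labels == -1))
```

Silhouette values range from -1 to 1: values near 1 indicate well-separated clusters, and values near 0 indicate overlap. Reporting `noise_fraction` alongside the DBSCAN score helps keep the comparison fair, since a high score computed on only a small subset of points can be misleading.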
The resulting customer segments:
- Cluster 0: Grocery & Detergent-heavy customers
- Cluster 1: Fresh-product-focused customers
- Cluster 2: High-volume diversified buyers
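Profiles like the ones above are typically derived from per-cluster averages of the original spending columns. The sketch below shows one way to do that with pandas. The function name `profile_clusters` and the toy numbers are made up for illustration and are not the project's results.

```python
import pandas as pd


def profile_clusters(spend: pd.DataFrame, labels) -> pd.DataFrame:
    """Mean spend per category for each cluster, plus the cluster size."""
    labeled = spend.assign(cluster=list(labels))
    profile = labeled.groupby("cluster").mean()
    profile["n_customers"] = labeled.groupby("cluster").size()
    return profile


# Tiny made-up example with two categories and two clusters.
toy = pd.DataFrame({"Fresh": [100, 120, 900, 950], "Grocery": [800, 760, 90, 110]})
toy_profile = profile_clusters(toy, [0, 0, 1, 1])
print(toy_profile)
```

Profiling on the untransformed spending values, rather than the log-scaled or standardized ones, keeps the averages in the original monetary units and makes labels such as "Fresh-product-focused" easier to justify.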
Tools and libraries used:
- Python
- Pandas
- NumPy
- Seaborn
- Matplotlib
- Scikit-learn
To install the dependencies, run:
pip install -r requirements.txt