This project focuses on segmenting e-commerce customers using unsupervised machine learning techniques, with a strong emphasis on clustering algorithms. By analyzing behavioral and transactional data, the goal is to uncover meaningful customer groups and actionable business insights.
In addition to modeling, SQL-based exploratory data analysis was performed to better understand the dataset and guide model selection. This project strengthened my skills in designing, evaluating, and operationalizing unsupervised learning models for real-world customer analytics applications.
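
As a rough illustration of the SQL-based EDA described above, the sketch below aggregates per-customer order counts, total spend, and most recent order date. The `orders` table, its columns, and the rows are hypothetical and invented for this example; an in-memory SQLite database stands in for whatever database the project actually used.

```python
import sqlite3

# Hypothetical schema and toy rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 20.0),
        (1, "2024-02-10", 35.0),
        (2, "2024-01-20", 120.0),
        (3, "2024-03-01", 15.0),
        (3, "2024-03-15", 10.0),
        (3, "2024-03-28", 5.0),
    ],
)

# Per-customer order count, total spend, and most recent order date.
query = """
    SELECT customer_id,
           COUNT(*)        AS n_orders,
           SUM(amount)     AS total_spend,
           MAX(order_date) AS last_order
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Aggregates like these (frequency, monetary value, recency) are one common starting point for behavioral features; which ones matter depends on the dataset.
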
- Unsupervised Learning: Understanding and applying clustering algorithms (e.g., K-Means, DBSCAN, Hierarchical Clustering) to segment customers based on their behavior.
- SQL for Data Analysis: Using SQL to perform exploratory data analysis (EDA) and extract meaningful insights from large datasets, a skill that carries over directly to big data platforms like Google BigQuery or Amazon Athena.
- Model Evaluation: Evaluating clustering quality with internal validation metrics such as the silhouette score, and checking robustness with resampling-based cluster stability tests.
- Feature Engineering: Preparing and transforming data to improve clustering performance, including scaling, normalizing, and selecting key features that best represent customer behavior.
- Customer Profiling: Analyzing clusters to create actionable customer profiles and personas based on purchasing habits, demographics, and engagement.
- Model Maintenance: Understanding how to deploy and monitor unsupervised models over time to ensure their relevance and effectiveness as new data comes in.
- Python: Data manipulation, clustering model implementation, and evaluation.
- SQL: Writing queries for exploratory data analysis and data extraction from large databases.
- Scikit-learn: Implementing and evaluating clustering algorithms.
- Pandas & NumPy: Data wrangling and manipulation for feature engineering.
- Matplotlib / Seaborn: Visualizing customer segments and model performance.
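
The feature scaling, clustering, and evaluation steps listed above can be tied together in a minimal sketch like the one below. It is not the project's actual pipeline: the data is synthetic, the two feature columns (order count and total spend) are assumptions, and K-Means with a silhouette-based choice of `k` is just one reasonable option among the algorithms mentioned. The stability check refits on a random 80% subsample and compares labels with the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for customer features (columns: order count, total spend).
# Three well-separated groups are generated on purpose; real data is messier.
centers = np.array([[2.0, 100.0], [12.0, 800.0], [25.0, 1500.0]])
spread = np.array([0.5, 10.0])
X = np.vstack([c + rng.normal(size=(100, 2)) * spread for c in centers])

# Standardize so the spend column does not dominate Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for several values of k and record the silhouette score of each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print("silhouette by k:", {k: round(float(s), 3) for k, s in scores.items()})
print("chosen k:", best_k)

# Simple stability check: refit on an 80% subsample with a different seed and
# compare its labels to the full-data labels on the same points.
full = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
idx = rng.choice(len(X_scaled), size=int(0.8 * len(X_scaled)), replace=False)
sub = KMeans(n_clusters=best_k, n_init=10, random_state=1).fit(X_scaled[idx])
stability = adjusted_rand_score(full.labels_[idx], sub.labels_)
print("adjusted Rand index (full vs. subsample):", round(float(stability), 3))
```

An adjusted Rand index near 1 suggests the segments are not an artifact of one particular sample or initialization; in practice the subsampling would be repeated several times and the scores averaged.
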