
Customer Analytics & Segmentation 📊

Python · Pandas · Scikit-learn · License: MIT

A data science project covering customer analytics, segmentation, and lookalike modeling, built as an assignment for the Zeotap Data Science internship position (January 2025).


Project Statistics 📈

  • 200 customers analyzed
  • 1,000 transactions processed
  • 5 customer segments identified (K-Means clustering)
  • Davies-Bouldin Index: 1.05 (lower is better; the metric used to choose the number of clusters)
  • 3 analysis modules (EDA, Clustering, Lookalike)
  • Lookalike recommendations generated for 20 customers
  • 4 key features (Total Spend, Avg Spend, Transaction Count, Avg Quantity)
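The four features are per-customer aggregates of the transaction log. The sketch below shows one way to build them with pandas named aggregation. It uses a tiny toy table, and the column names `TransactionID`, `CustomerID`, `Quantity` and `TotalValue` are assumptions about `Transactions.csv`, not something this README confirms.

```python
import pandas as pd

# Toy transaction log (illustrative values only).
# Column names are assumed, not taken from the project's scripts.
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4", "T5"],
    "CustomerID": ["C1", "C1", "C2", "C3", "C3"],
    "Quantity": [2, 1, 4, 1, 3],
    "TotalValue": [200.0, 50.0, 400.0, 30.0, 90.0],
})

# One row per customer holding the four features listed above.
features = transactions.groupby("CustomerID").agg(
    TotalSpend=("TotalValue", "sum"),
    AvgSpend=("TotalValue", "mean"),
    TransactionCount=("TransactionID", "count"),
    AvgQuantity=("Quantity", "mean"),
)
print(features)
```

On the real data, the same call would produce a 200-row table (one row per customer) that feeds the scaling and clustering steps.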

Project Overview 🎯

This project demonstrates end-to-end customer analytics, including exploratory data analysis, customer segmentation with machine learning, and lookalike modeling for targeted marketing. It was developed as part of the Zeotap Data Scientist internship assessment.

Assignment Context

  • Company: Zeotap (Customer Data Platform)
  • Position: Data Science Internship
  • Date: January 2025
  • Objective: Demonstrate data science skills in customer analytics and segmentation

Key Objectives

  1. Exploratory Data Analysis (EDA): Analyze customer, product, and transaction data
  2. Customer Segmentation: Group customers using K-Means clustering
  3. Lookalike Modeling: Identify similar customers for targeted campaigns

Data Science Skills Demonstrated 🔬

Machine Learning

  • K-Means clustering for customer segmentation
  • Davies-Bouldin Index for cluster quality evaluation
  • PCA (Principal Component Analysis) for dimensionality reduction and 2D visualization of segments
  • Cosine similarity for lookalike modeling
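For reference, here are the textbook definitions of the two metrics named above:

- k is the number of clusters.
- c_i is the centroid of cluster i.
- s_i is the average distance from the points in cluster i to c_i.
- d(c_i, c_j) is the distance between two centroids.

scikit-learn's `davies_bouldin_score` uses Euclidean distance. This README does not say whether the project's scripts use exactly these forms.

```latex
\mathrm{DB} = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{s_i + s_j}{d(c_i, c_j)},
\qquad
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
```

A lower DB value means clusters that are compact relative to the distance between them. Cosine similarity ranges from -1 to 1, and higher values indicate more similar feature profiles.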

Data Analysis

  • Exploratory Data Analysis (EDA)
  • Feature engineering and aggregation
  • Data merging and transformation
  • Statistical analysis and profiling
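Merging attaches customer and product attributes to each transaction before any aggregation. The sketch below is a minimal example assuming the three CSVs share `CustomerID` and `ProductID` keys and carry `Region`, `Category` and `TotalValue` columns. These names are assumptions, and the data is a toy stand-in.

```python
import pandas as pd

# Toy stand-ins for Customers.csv, Products.csv and Transactions.csv.
customers = pd.DataFrame({"CustomerID": ["C1", "C2"], "Region": ["Asia", "Europe"]})
products = pd.DataFrame({"ProductID": ["P1", "P2"], "Category": ["Books", "Electronics"]})
transactions = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "ProductID": ["P1", "P2", "P2"],
    "TotalValue": [20.0, 300.0, 150.0],
})

# Left joins keep every transaction.
# validate="many_to_one" raises an error if an ID is duplicated in a lookup table.
merged = (
    transactions
    .merge(customers, on="CustomerID", how="left", validate="many_to_one")
    .merge(products, on="ProductID", how="left", validate="many_to_one")
)

# Example transformation: revenue by region and product category.
revenue = merged.pivot_table(index="Region", columns="Category",
                             values="TotalValue", aggfunc="sum", fill_value=0)
print(revenue)
```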

Data Preprocessing

  • StandardScaler for feature normalization
  • One-hot encoding for categorical variables
  • DateTime parsing and manipulation
  • Missing value handling
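The four preprocessing steps can be combined as sketched below. This is one possible arrangement, not necessarily the project's. The column names (`Region`, `SignupDate`, `TotalSpend`), the reference date used for tenure, and the fill values are all illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy customer-level table (hypothetical columns and values).
df = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3", "C4"],
    "Region": ["Asia", "Europe", "Asia", None],
    "SignupDate": ["2022-07-10", "2023-02-13", "2024-03-07", "2022-10-09"],
    "TotalSpend": [250.0, 400.0, None, 90.0],
})

# DateTime parsing: convert the signup date into tenure in days.
# The reference date is an arbitrary choice.
df["SignupDate"] = pd.to_datetime(df["SignupDate"])
df["TenureDays"] = (pd.Timestamp("2025-01-01") - df["SignupDate"]).dt.days

# Missing value handling: one simple choice among many.
df["TotalSpend"] = df["TotalSpend"].fillna(0.0)
df["Region"] = df["Region"].fillna("Unknown")

# Standardize numeric columns and one-hot encode the categorical column.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["TotalSpend", "TenureDays"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Region"]),
    ],
    sparse_threshold=0,
)
X = preprocess.fit_transform(df)  # columns not listed above are dropped
print(X.shape)
```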

Project Structure 📁

DS_Assignment_Zeotap/
├── data/
│   ├── Customers.csv          # Customer profiles (200 records)
│   ├── Products.csv           # Product catalog
│   └── Transactions.csv       # Transaction history (1,000 records)
├── src/
│   ├── eda.py                 # Exploratory Data Analysis
│   ├── clustering.py          # Customer Segmentation (K-Means)
│   └── lookalike.py           # Lookalike Modeling
├── reports/
│   ├── eda_report.pdf         # EDA insights and visualizations
│   └── clustering_report.pdf  # Segmentation analysis and recommendations
├── output/
│   └── lookalike_results.csv  # Lookalike recommendations
├── .gitignore
├── LICENSE
└── README.md

Installation & Setup 🛠️

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone the repository:

    git clone https://github.com/patelritiq/DS_Assignment_Zeotap.git
    cd DS_Assignment_Zeotap
  2. Install required libraries:

    pip install pandas numpy matplotlib seaborn scikit-learn

Usage 📖

1. Exploratory Data Analysis (EDA)

Analyze datasets and visualize insights:

cd src
python eda.py

Output:

  • Data summaries and statistics
  • Most purchased products
  • Customer behavior patterns

Report: See reports/eda_report.pdf for detailed insights and business recommendations.
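The outputs listed above (summaries, most purchased products, behavior patterns) map onto a few pandas calls. The sketch below reproduces them on a toy log. It assumes `ProductID`, `Quantity` and `TransactionDate` columns, and it does not reproduce `eda.py` itself.

```python
import pandas as pd

# Toy transaction log (illustrative values only).
transactions = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C1", "C3", "C2", "C1"],
    "ProductID": ["P1", "P2", "P1", "P3", "P1", "P2"],
    "Quantity": [1, 2, 3, 1, 1, 1],
    "TransactionDate": ["2024-01-05", "2024-01-20", "2024-02-02",
                        "2024-02-14", "2024-03-01", "2024-03-09"],
})
transactions["TransactionDate"] = pd.to_datetime(transactions["TransactionDate"])

# Data summary for a numeric column.
print(transactions["Quantity"].describe())

# Most purchased products, ranked by total units sold.
top_products = (transactions.groupby("ProductID")["Quantity"]
                .sum().sort_values(ascending=False))
print(top_products.head(10))

# One behavior pattern: number of transactions per calendar month.
monthly = transactions.groupby(transactions["TransactionDate"].dt.to_period("M")).size()
print(monthly)
```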


2. Customer Segmentation (Clustering)

Perform K-Means clustering to segment customers:

cd src
python clustering.py

Output:

  • Optimal number of clusters: 5
  • Davies-Bouldin Index: 1.05
  • PCA visualization of customer segments

Report: See reports/clustering_report.pdf for detailed segmentation analysis.
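Choosing k with the Davies-Bouldin index amounts to fitting K-Means for several cluster counts and keeping the k with the lowest score. The sketch below assumes that approach and a search range of 2 to 10, which this README does not state. It uses synthetic `make_blobs` data in place of the real 200 x 4 feature matrix, so its chosen k says nothing about the project's result.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 200-customer, 4-feature matrix.
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate k and score the resulting labels.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = davies_bouldin_score(X_scaled, labels)

best_k = min(scores, key=scores.get)  # lower Davies-Bouldin index is better
print(best_k, round(scores[best_k], 2))
```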


3. Lookalike Modeling

Generate similar customer recommendations:

cd src
python lookalike.py

Output:

  • Lookalike recommendations for 20 customers
  • Results saved to output/lookalike_results.csv
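A cosine-similarity lookalike step can be sketched as follows: standardize the features, compute pairwise similarity, drop each customer's self-match, and keep the top matches. The choice of 3 lookalikes per customer, the feature set, and the output columns `LookalikeID` and `Score` are assumptions made for illustration. The toy table stands in for the real features.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Toy per-customer features (illustrative values only).
features = pd.DataFrame(
    {"TotalSpend": [250.0, 400.0, 120.0, 260.0, 390.0],
     "TransactionCount": [2, 1, 2, 2, 1]},
    index=["C1", "C2", "C3", "C4", "C5"],
)
X = StandardScaler().fit_transform(features)
sim = pd.DataFrame(cosine_similarity(X), index=features.index, columns=features.index)

def top_lookalikes(customer_id, n=3):
    # Exclude the customer itself, then keep the n most similar others.
    return sim.loc[customer_id].drop(customer_id).nlargest(n)

rows = []
for cid in ["C1", "C2"]:  # the project targets 20 customers; two are shown here
    for other, score in top_lookalikes(cid).items():
        rows.append({"CustomerID": cid, "LookalikeID": other, "Score": round(score, 4)})
results = pd.DataFrame(rows)
print(results)
# results.to_csv("output/lookalike_results.csv", index=False)  # hypothetical output path
```

Standardizing first matters here. Without it, `TotalSpend` would dominate the dot product simply because it is measured on a larger scale.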

Customer Segmentation Results 🎯

5 Customer Segments Identified

| Cluster | Description | Characteristics | Recommendation |
|---|---|---|---|
| Cluster 0 | High-Frequency Buyers | Medium spending, active engagement | Loyalty programs, personalized offers |
| Cluster 1 | Occasional Buyers | Low frequency, low spending | Basic incentives, re-engagement campaigns |
| Cluster 2 | High-Value Customers | Significant spending, moderate engagement | Premium services, upselling strategies |
| Cluster 3 | Inactive Customers | Very little activity | Reactivation campaigns, personalized offers |
| Cluster 4 | Moderate Spenders | Frequent purchases, consistent engagement | Increase basket size, bundled products |
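This README does not say how the segment descriptions were derived. A common approach is to compare per-cluster feature averages and sizes, as in the sketch below. The values and cluster labels are made up, and K-Means cluster IDs are arbitrary, so a given ID carries no meaning until it is profiled this way.

```python
import pandas as pd

# Toy customer features with K-Means labels attached (hypothetical values).
df = pd.DataFrame({
    "TotalSpend": [5200.0, 800.0, 6100.0, 300.0, 950.0, 5900.0],
    "TransactionCount": [6, 2, 5, 1, 2, 7],
    "Cluster": [2, 1, 2, 3, 1, 2],
})

# Cluster size and feature means: the raw material for naming each segment.
profile = df.groupby("Cluster").agg(
    Customers=("TotalSpend", "size"),
    MeanTotalSpend=("TotalSpend", "mean"),
    MeanTransactions=("TransactionCount", "mean"),
)
print(profile.sort_values("MeanTotalSpend", ascending=False))
```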

Clustering Methodology

  • Algorithm: K-Means (chosen for efficiency and interpretability)
  • Features: Total Spend, Avg Spend, Transaction Count, Avg Quantity
  • Optimization: Davies-Bouldin Index (lower = better)
  • Visualization: PCA for 2D representation
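The PCA view projects the four scaled features onto two principal components and colors each point by its cluster. The sketch below runs on synthetic `make_blobs` data rather than the project's features. It also prints the share of variance the two components retain, which is worth checking before reading cluster separation off a 2D plot.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled 4-feature customer matrix.
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)

# Project onto the first two principal components for plotting.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

# Fraction of total variance kept by the 2D view; a low value means the plot
# can hide or exaggerate how well the clusters are separated.
print(pca.explained_variance_ratio_.sum())
# With matplotlib: plt.scatter(coords[:, 0], coords[:, 1], c=labels)
```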

Key Findings 📊

Dataset Overview

  • Customers: 200 unique customers
  • Transactions: 1,000 transaction records
  • Products: Multiple product categories
  • Regions: Multiple geographic regions

Segmentation Quality

  • Optimal Clusters: 5 segments
  • Davies-Bouldin Index: 1.05 (acceptable quality)
  • Clear Separation: PCA visualization shows distinct clusters

Business Impact

  • Enables targeted marketing campaigns
  • Identifies high-value customer segments
  • Supports personalized customer engagement
  • Facilitates lookalike audience targeting

Future Enhancements 🔮

  • Hierarchical clustering for comparison
  • DBSCAN for density-based segmentation
  • Time-series analysis for customer lifetime value
  • Churn prediction modeling
  • Real-time recommendation system
  • Interactive dashboard (Streamlit/Dash)
  • A/B testing framework
  • Advanced feature engineering (RFM analysis)

Reports & Documentation 📄

Detailed analysis reports are available in the reports/ folder:

  • EDA Report (eda_report.pdf): Comprehensive exploratory analysis with visualizations and business insights
  • Clustering Report (clustering_report.pdf): Customer segmentation analysis with cluster descriptions and recommendations

Technologies Used 💻

  • Python: Core programming language
  • Pandas: Data manipulation and analysis
  • NumPy: Numerical computations
  • Scikit-learn: Machine learning algorithms
  • Matplotlib: Data visualization
  • Seaborn: Statistical visualizations

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.


Author 👨‍💻

Ritik Pratap Singh Patel


Acknowledgments 🙏

This project was developed as an assignment for Zeotap's Data Scientist position. It demonstrates practical application of data science techniques for customer analytics and segmentation.

Zeotap: https://github.com/zeotap


Quick Start 🚀

# Clone repository
git clone https://github.com/patelritiq/DS_Assignment_Zeotap.git
cd DS_Assignment_Zeotap

# Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn

# Run EDA
cd src
python eda.py

# Run Clustering
python clustering.py

# Run Lookalike Modeling
python lookalike.py

Transform customer data into actionable insights! 📊✨

About

Customer analytics and segmentation project using K-Means clustering, EDA, and lookalike modeling. Assignment for Zeotap Data Scientist position. Analyzes 200 customers and 1,000 transactions with 5-cluster segmentation (DB Index: 1.05).
