This project builds a machine learning model to predict Flipkart Customer Satisfaction (CSAT) before customers submit feedback. Using interaction data such as response time and agent details, the XGBoost model achieved 70 percent test accuracy and enables proactive customer support decisions.

vimalsolank1/project_ML_Classification_Flipkart_Customer_Service_Satisfaction

Predicting Customer Satisfaction Score Using Classification

Predicting Flipkart Customer Satisfaction (CSAT) Score Using Machine Learning

Business Problem Overview

In today’s fast-paced e-commerce environment, customer satisfaction is critical for business growth. Companies measure service performance using CSAT (Customer Satisfaction Score).

The challenge is that CSAT is collected after the customer interaction is completed. If the score is low, it is already too late to fix the experience.

This project aims to predict customer satisfaction before the feedback is given, enabling proactive support improvements.


About the Company

Flipkart is one of India’s largest e-commerce platforms, handling millions of customer interactions every month through call, chat, and email channels.

Understanding the drivers of customer satisfaction helps improve service quality, reduce churn, and increase long-term profitability.


Project Objective

The goal of this project is to build a Machine Learning classification model that predicts the CSAT score (1 to 5) from customer interaction details such as:

  • Communication Channel
  • Query Category and Sub-Category
  • Product Category
  • Agent Information
  • Response Time
  • Customer Remarks

Dataset Information

  • Total Records: 85,907
  • Total Features: 20
  • Time Period: 2022 to 2023
  • Target Variable: CSAT (1 to 5)

Important Columns

channel_name, customer_query_category, customer_query_sub_category, Customer_Remarks, Product_category, Item_price, Agent_name, Supervisor, Manager, Agent Shift, Tenure Bucket, CSAT

Missing Values

Several columns had a high share of missing values:

  • connected_handling_time: 99.72 percent (column dropped)
  • Customer_City: approximately 80 percent
  • Product_category: approximately 80 percent
  • Item_price: approximately 80 percent
  • Customer_Remarks: approximately 66 percent

No duplicate rows were found in the dataset.
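
A minimal sketch of how the missing-value shares and the duplicate check above could be computed with pandas. The small frame below is made-up data standing in for the real dataset, and the helper name missing_report is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, highest first."""
    return (df.isna().mean() * 100).round(2).sort_values(ascending=False)


# Made-up sample rows; the real dataset has 85,907 records and 20 features
sample = pd.DataFrame({
    "channel_name": ["Inbound", "Outcall", "Email", "Inbound"],
    "Item_price": [np.nan, 499.0, np.nan, np.nan],
    "CSAT": [5, 1, 4, 3],
})

report = missing_report(sample)
# Count fully identical rows (NaN values are treated as equal by pandas)
duplicate_rows = int(sample.duplicated().sum())

print(report)
print("duplicate rows:", duplicate_rows)
```

Columns near the top of such a report are candidates for dropping, while the rest can be imputed.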


Project Workflow

1. Data Understanding and Exploration

  • Checked data structure and column types
  • Identified missing values
  • Verified duplicates
  • Converted date columns to datetime format

2. Data Wrangling, EDA and Hypothesis Testing

Data Cleaning

  • Dropped columns with extremely high null values
  • Used mean, mode, and constant imputation
  • Created a new feature response_time_minutes
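
One way the response_time_minutes feature could be derived is sketched below, assuming the dataset holds one timestamp for when the issue was reported and one for when it was answered. The column names reported_at and responded_at are hypothetical placeholders, and the rows are made up.

```python
import pandas as pd

# Hypothetical column names and made-up rows; the real timestamp columns may differ
tickets = pd.DataFrame({
    "reported_at": ["2023-08-01 10:00", "2023-08-01 11:30", "2023-08-02 09:15"],
    "responded_at": ["2023-08-01 10:12", "2023-08-01 13:00", "2023-08-02 09:20"],
})

# Convert the text columns to datetime so they can be subtracted
for col in ("reported_at", "responded_at"):
    tickets[col] = pd.to_datetime(tickets[col], format="%Y-%m-%d %H:%M")

# Elapsed time between report and response, expressed in minutes
tickets["response_time_minutes"] = (
    (tickets["responded_at"] - tickets["reported_at"]).dt.total_seconds() / 60
)

print(tickets)
```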

Exploratory Data Analysis

Key insights:

  • Faster response time was associated with higher CSAT
  • App and Website related categories showed higher satisfaction
  • Gift Cards and Furniture categories showed lower satisfaction
  • Experienced agents and balanced workloads were associated with higher CSAT

Hypothesis Testing

  • One Sample T-Test
  • ANOVA Test

These tests checked whether CSAT differed significantly across communication channels and price segments.
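
The two tests can be sketched with SciPy as follows. The scores are made up, the grouping by channel is illustrative, and the reference mean of 4.0 for the one-sample test is an assumption, not a value from the project.

```python
from scipy import stats

# Made-up CSAT scores for three channels (illustration only)
inbound = [5, 5, 4, 5, 3, 5, 4, 5]
outcall = [4, 5, 3, 4, 5, 4, 2, 4]
email = [2, 3, 1, 4, 2, 3, 1, 2]

# One-way ANOVA: do the group means differ across channels?
f_stat, p_anova = stats.f_oneway(inbound, outcall, email)

# One-sample t-test: does one group's mean differ from a reference value?
REFERENCE_MEAN = 4.0  # assumed benchmark, not taken from the project
t_stat, p_ttest = stats.ttest_1samp(email, popmean=REFERENCE_MEAN)

ALPHA = 0.05  # conventional significance level
print("ANOVA rejects equal means:", p_anova < ALPHA)
print("t-test rejects the reference mean:", p_ttest < ALPHA)
```

A p-value below the chosen significance level is read as evidence that the means differ. It says nothing about how large the difference is.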


Feature Engineering and Preprocessing

  • Created features such as:

    • is_long_response
    • avg_csat_by_agent
    • agent_ticket_count
    • product_popularity
  • Handled outliers using IQR and percentile capping
  • Applied One-Hot Encoding, Label Encoding, and Ordinal Encoding
  • Selected important features using Random Forest Importance and SelectKBest
  • Applied MinMax Scaling
  • Used SMOTE to handle class imbalance
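
A rough sketch of IQR capping and three of the engineered features, using pandas on made-up rows. The 20-minute cutoff for is_long_response and the 1.5 × IQR fences are assumptions, since the source does not state the thresholds used.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset
df = pd.DataFrame({
    "Agent_name": ["A", "A", "B", "B", "B", "C"],
    "response_time_minutes": [5.0, 12.0, 8.0, 600.0, 15.0, 30.0],
    "CSAT": [5, 4, 5, 1, 4, 3],
})

# IQR capping: clip values that fall outside the 1.5 * IQR fences
q1, q3 = df["response_time_minutes"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["response_time_capped"] = df["response_time_minutes"].clip(lower=lower, upper=upper)

# Binary flag for slow responses (the cutoff is an assumed value)
LONG_RESPONSE_MINUTES = 20
df["is_long_response"] = (df["response_time_minutes"] > LONG_RESPONSE_MINUTES).astype(int)

# Per-agent aggregates broadcast back onto every ticket row
df["agent_ticket_count"] = df.groupby("Agent_name")["Agent_name"].transform("count")
df["avg_csat_by_agent"] = df.groupby("Agent_name")["CSAT"].transform("mean")

print(df)
```

Because avg_csat_by_agent is built from the target, computing it on training rows only and then mapping it onto the test rows avoids leaking CSAT into the evaluation.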


Model Implementation

Tested multiple algorithms:

  • Random Forest
  • CatBoost
  • XGBoost

Final Model: XGBoost

XGBoost was selected because it handled class imbalance effectively and provided the best balance between precision and recall.

Final Performance

  • Training Accuracy: 76 percent
  • Testing Accuracy: 70 percent
  • Weighted F1 Score: 0.63

The 6-point gap between training and testing accuracy indicates that the model generalized reasonably well, with only mild overfitting.
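
The weighted F1 score can sit below accuracy when minority classes are rarely predicted correctly. The sketch below shows this with scikit-learn on made-up labels. These are not the project's predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: class 5 dominates and the minority classes are mostly missed
y_true = [5, 5, 5, 5, 5, 5, 1, 1, 3, 2]
y_pred = [5, 5, 5, 5, 5, 5, 1, 5, 5, 5]

accuracy = accuracy_score(y_true, y_pred)
# "weighted" averages the per-class F1 scores using class support as weights
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"accuracy:    {accuracy:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

Classes that are never predicted contribute an F1 of zero, which pulls the weighted average down even when overall accuracy looks acceptable.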


Feature Importance Insights

The most important drivers of CSAT prediction were:

  • Response Time
  • Customer Sentiment
  • Product Popularity
  • Agent Workload
  • Agent Past Performance

Customers with longer response times and negative sentiment were more likely to give lower satisfaction scores.


Real-World Testing

The saved XGBoost model was tested on unseen data and performed consistently.

However, prediction performance for CSAT classes 2 and 3 was lower due to class imbalance. Future improvements can focus on collecting more balanced data to enhance performance.


Business Impact

This model enables Flipkart to:

  • Identify potentially dissatisfied customers before feedback submission
  • Speed up responses for high-risk cases
  • Assign better agents to critical tickets
  • Reduce churn
  • Improve customer retention
  • Increase overall profitability
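
As an illustration of the first point, tickets could be flagged when the predicted probability of a low score crosses a threshold. This sketch assumes the model outputs one probability per CSAT class, with columns ordered 1 to 5. The function name flag_at_risk, the "CSAT of 2 or lower" definition, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def flag_at_risk(class_probs, threshold=0.5):
    """Flag rows whose predicted probability of CSAT <= 2 reaches the threshold.

    class_probs: array of shape (n_tickets, 5), columns ordered CSAT 1..5.
    """
    probs = np.asarray(class_probs, dtype=float)
    low_score_prob = probs[:, :2].sum(axis=1)  # P(CSAT = 1) + P(CSAT = 2)
    return low_score_prob >= threshold


# Made-up probabilities for three tickets
example_probs = [
    [0.60, 0.10, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.20, 0.60],
    [0.25, 0.30, 0.20, 0.15, 0.10],
]

flags = flag_at_risk(example_probs)
print(flags)
```

Lowering the threshold catches more dissatisfied customers at the cost of more false alarms, so the value would need tuning against support capacity.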

Tech Stack

  • Python
  • Pandas and NumPy
  • Matplotlib and Seaborn
  • Scikit-Learn
  • XGBoost
  • CatBoost
  • SHAP
  • SMOTE (imbalanced-learn)

Conclusion

This project demonstrates how Machine Learning can transform customer support from reactive to proactive.

By predicting dissatisfaction early, businesses can take preventive actions and significantly improve overall customer experience.
