Dynamic Pricing Optimization: Predicting Ride Fares with Machine Learning

Introduction

Dynamic pricing is central to ride-hailing platforms such as Uber and Lyft. These platforms adjust ride costs with pricing algorithms that weigh factors such as supply-demand imbalances, location, trip duration, and customer loyalty. This project develops a machine learning model that predicts ride costs from these factors, with the goal of pricing that is fair and accurate for both customers and service providers.

The solution is a machine learning pipeline that covers data preprocessing, exploratory data analysis, feature engineering, model training, evaluation, and deployment as a REST API. The API serves real-time ride cost predictions.

Problem Description

The primary challenge in ride-hailing services is to dynamically and accurately determine the cost of a ride based on various factors while ensuring:

Customer Satisfaction: Pricing should reflect fairness and transparency to retain user trust.
Profitability for Providers: The model should balance supply and demand effectively to maximize revenue while maintaining affordability for customers.
Scalability: The solution must handle large-scale data inputs for real-time predictions.

In this project, we predict the cost of rides using features like:

Number of available drivers and riders.
Ratings and past ride history.
Supply-demand metrics and location categories.
Trip-specific factors such as ride duration and vehicle type.
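
This README does not define the supply-demand metric, so the following is only a minimal sketch. It assumes the metric is the ratio of ride requests to available drivers; the function name and the zero-driver fallback are hypothetical, and the project's actual definition may differ.

```python
def demand_supply_ratio(number_of_riders: int, number_of_drivers: int) -> float:
    """Return riders per available driver (assumed definition of the metric)."""
    if number_of_riders < 0 or number_of_drivers < 0:
        raise ValueError("counts must be non-negative")
    if number_of_drivers == 0:
        # Hypothetical convention: no available drivers means unbounded pressure.
        return float("inf")
    return number_of_riders / number_of_drivers


# A ratio above 1 means demand exceeds supply.
print(demand_supply_ratio(90, 45))  # 2.0
```

A ratio like this could be passed to the model as one more numeric feature alongside the raw rider and driver counts.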

The model is intended to:

Support pricing teams in optimizing ride costs.
Enhance transparency by providing an explainable model.
Offer real-time predictions for dynamic pricing strategies.

Why This Project?

Dynamic pricing is a critical operational component in the ride-hailing industry, and it serves multiple purposes:

Demand-Supply Management: Ensuring sufficient driver availability by incentivizing driver participation during high-demand periods.
Customer Retention: Offering affordable pricing options to customers in less competitive locations or periods.
Profit Optimization: Calculating optimal ride costs to maximize profitability while maintaining high utilization rates.

This prediction model can be implemented in:

Operational Dashboards: Enabling service providers to view real-time predictions and adjust strategies.
Customer Applications: Offering upfront cost estimates that align with dynamic pricing algorithms.

By leveraging machine learning, we aim to create a scalable, efficient, and accurate model for dynamic pricing.

Exploratory Data Analysis (EDA)

A critical step in the project was exploring the dataset to uncover insights, handle missing values, and prepare the data for modeling. Key analyses included:

Understanding Feature Distributions:
- Examined the distributions of numerical features such as number_of_riders, average_ratings, and expected_ride_duration using histograms, box plots, and density plots.
- Examined relationships between features and the target variable (ride_cost, listed as `Historical_Cost_of_Ride` in the Dataset section).
Target Variable Analysis:
- Investigated the distribution of ride costs, looking for outliers or skewness.
Feature Correlations:
- Used heatmaps to identify correlated features to reduce multicollinearity.
- Determined important features using feature importance scores from tree-based models.
Handling Missing Values:
- Checked for missing data in each feature.
- Imputed missing values based on statistical methods or domain knowledge.
Outlier Detection:
- Identified and handled outliers in numerical features to avoid biased predictions.
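
The missing-value and outlier steps above can be sketched with pandas. This is an illustration under assumptions, not the notebook's actual code: the sample values are synthetic, and median imputation and clipping to the 1.5 × IQR fences are just one common choice among the "statistical methods" mentioned.

```python
import pandas as pd

# Tiny synthetic sample; the real data lives in data/dynamic_pricing.csv.
df = pd.DataFrame({
    "number_of_riders": [50, 60, None, 55, 58],
    "expected_ride_duration": [30, 45, 40, 35, 600],
})

# 1. Count missing values per feature.
missing_counts = df.isna().sum()

# 2. Impute numeric gaps with the column median.
df = df.fillna(df.median(numeric_only=True))


# 3. Handle outliers by clipping values to the k * IQR fences.
def clip_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


df["expected_ride_duration"] = clip_iqr(df["expected_ride_duration"])
print(missing_counts)
print(df)
```

Clipping keeps every row, which matters on small datasets; dropping the flagged rows instead is an equally valid choice when outliers are likely data errors.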

Dataset

The dataset contains the following features:

| Feature Name | Description |
| --- | --- |
| `Number_of_Riders` | Total number of ride requests. |
| `Number_of_Drivers` | Total number of available drivers. |
| `Location_Category` | Type of location (e.g., Urban, Suburban, Rural). |
| `Customer_Loyalty_Status` | Customer loyalty level (e.g., Silver, Regular). |
| `Number_of_Past_Rides` | Total number of rides completed by the customer. |
| `Average_Ratings` | Customer’s average ride ratings. |
| `Time_of_Booking` | Time of day when the ride was booked (e.g., Night, Evening, Afternoon). |
| `Vehicle_Type` | Type of vehicle used (e.g., Premium, Economy). |
| `Expected_Ride_Duration` | Predicted duration of the ride in minutes. |
| `Historical_Cost_of_Ride` | Actual cost of the ride (target variable for prediction). |
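
The categorical columns above must be turned into numeric ones before training. The sketch below shows one way to do that with `pandas.get_dummies`. It assumes the column names are lowercased first, which would produce field names like `location_category_Urban` as seen in the API input example in this README; the sample rows are synthetic, and the notebook may encode the data differently.

```python
import pandas as pd

# Synthetic rows using a subset of the dataset's columns.
raw = pd.DataFrame({
    "Number_of_Riders": [50, 42],
    "Number_of_Drivers": [30, 31],
    "Location_Category": ["Urban", "Suburban"],
    "Customer_Loyalty_Status": ["Silver", "Regular"],
    "Historical_Cost_of_Ride": [300.0, 180.0],
})

# Lowercase the column names (assumed convention, matching the API fields).
df = raw.rename(columns=str.lower)

# One-hot encode the categorical columns as 0/1 integers.
categorical = ["location_category", "customer_loyalty_status"]
encoded = pd.get_dummies(df, columns=categorical, dtype=int)

# Separate the features from the target.
X = encoded.drop(columns=["historical_cost_of_ride"])
y = encoded["historical_cost_of_ride"]
print(sorted(X.columns))
```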

Tools and Frameworks

Tools:

Python: Data preprocessing, model training, and API development.
Jupyter Notebooks: For exploratory data analysis (EDA) and feature engineering.

Libraries:

Scikit-learn: Model development and evaluation.
Flask: Deployment of the prediction API.
Pandas/NumPy: Data manipulation.
Matplotlib/Seaborn: Visualization.

Other Tools:

Git LFS: For tracking large files in the repository.
Docker (Optional): For containerizing the API.

Directory Structure

BikeML-API/
│
├── data/
│   ├── dynamic_pricing.csv             
│   └── ...
│
├── imgs/                                       # (Optional) Visualization images
│   └── ...
│
├── models/
│   ├── gradient_boosting_model.pkl             # Trained model for predictions
│   └── ...
│
├── notebooks/
│   ├── DPML.ipynb                              # Data preprocessing and feature engineering
│   └── ...
│
├── app.py                                      # Flask API script
├── predict_request.ps1                         # Script for making POST requests to the API
├── requirements.txt                            # Python dependencies
└── README.md                                   # Project description and instructions


Model Training and Selection

Training Process

We trained multiple models to ensure robust performance and selected the best-performing one:

Trained Models:
- Linear Regression: Provided a baseline model for comparison.
- Decision Trees: Used for capturing non-linear relationships.
- Random Forests: Improved performance through ensemble learning.
- Gradient Boosting Regressor: Achieved the best overall performance with hyperparameter tuning.
Hyperparameter Tuning:
- Utilized GridSearchCV and RandomizedSearchCV to optimize parameters such as learning rate, tree depth, and the number of estimators.
Evaluation Metrics:
- Mean Absolute Error (MAE).
- Root Mean Squared Error (RMSE).
- R² Score.
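
The training process above can be sketched as follows. This is a minimal sketch on synthetic data, not the project's notebook: the data generator, the parameter grid values, the 80/20 split, and the `evaluate` helper are all assumptions chosen to keep the example small. It compares a linear baseline with a grid-searched Gradient Boosting Regressor using the three metrics listed.

```python
import math

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the ride data: 3 numeric features, non-linear target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 3))
y = 100 * X[:, 0] + 50 * np.sin(6 * X[:, 1]) + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def evaluate(model, X_eval, y_eval):
    """Return MAE, RMSE, and R² for a fitted model (hypothetical helper)."""
    pred = model.predict(X_eval)
    return {
        "mae": mean_absolute_error(y_eval, pred),
        "rmse": math.sqrt(mean_squared_error(y_eval, pred)),
        "r2": r2_score(y_eval, pred),
    }


# Baseline model for comparison.
baseline = LinearRegression().fit(X_train, y_train)

# Small illustrative grid over the parameters named in the README.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

results = {
    "linear_regression": evaluate(baseline, X_test, y_test),
    "gradient_boosting": evaluate(search.best_estimator_, X_test, y_test),
}
print(search.best_params_)
print(results)
```

`RandomizedSearchCV` takes the same estimator and scoring arguments but samples a fixed number of combinations, which tends to be the cheaper option once the grid grows.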

Final Model

The Gradient Boosting Regressor was chosen for its superior performance in capturing complex relationships between features and the target variable.
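
The directory structure lists the trained model as `models/gradient_boosting_model.pkl`. A minimal sketch of how such a file could be written and read back with `pickle` is shown below; the toy training data and the temporary directory are assumptions made so the example has no side effects, and the project may use a different persistence method.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import GradientBoostingRegressor

# Toy training data standing in for the engineered ride features.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [10.0, 20.0, 30.0, 40.0]
model = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

# In the repository the target would be models/gradient_boosting_model.pkl;
# a temporary directory is used here instead.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "gradient_boosting_model.pkl"
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    with model_path.open("rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = list(model.predict(X)) == list(restored.predict(X))
print(same_predictions)
```

Pickled estimators are generally only safe to load with the same scikit-learn version that created them, so pinning that version in requirements.txt is worth considering.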

Model Deployment

The trained model was deployed as a REST API using Flask:

Endpoints:
- /: Returns a welcome message.
- /predict: Accepts JSON input and returns a predicted cost.
Input Example:

{
  "number_of_riders": 50,
  "number_of_drivers": 30,
  "number_of_past_rides": 15,
  "average_ratings": 4.5,
  "expected_ride_duration": 60,
  "location_category_Urban": 1,
  "location_category_Suburban": 0,
  "customer_loyalty_status_Regular": 0,
  "demand_supply_ratio": 1.0
}

Output Example:

{
  "predicted_cost": 652.77
}
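
For orientation, the two endpoints could be implemented along these lines. This is a sketch under assumptions, not the contents of app.py: the `create_app` factory, the `StubModel` placeholder, the welcome text, the error responses, and the feature order are all hypothetical. The real service would load the pickled model and must use the same column order as during training.

```python
from flask import Flask, jsonify, request

# Assumed feature order; it must match the columns used during training.
FEATURES = [
    "number_of_riders", "number_of_drivers", "number_of_past_rides",
    "average_ratings", "expected_ride_duration",
    "location_category_Urban", "location_category_Suburban",
    "customer_loyalty_status_Regular", "demand_supply_ratio",
]


class StubModel:
    """Placeholder with a scikit-learn-style predict(); not a real model."""

    def predict(self, rows):
        return [sum(row) for row in rows]


def create_app(model):
    app = Flask(__name__)

    @app.route("/")
    def index():
        return jsonify({"message": "Welcome to the dynamic pricing API"})

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            return jsonify({"error": "request body must be a JSON object"}), 400
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (TypeError, ValueError):
            return jsonify({"error": "all fields must be numeric"}), 400
        cost = float(model.predict([row])[0])
        return jsonify({"predicted_cost": round(cost, 2)})

    return app


# Swap StubModel() for the unpickled Gradient Boosting model in practice.
app = create_app(StubModel())
```

Calling `app.run(host="0.0.0.0", port=5000)` would start the development server on the port used in the Docker instructions. Passing the model into a factory function keeps the routes testable without the .pkl file.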

Reproducibility

To ensure the project can be reproduced:

Data:
- The dataset is provided in the repository (see data/dynamic_pricing.csv in the directory structure); otherwise, instructions for downloading it are included.
Scripts:
- Separate Python scripts are available for training and testing the model.
Instructions:
- A step-by-step guide is included to run the project.

Dependency and Environment Management

Dependencies:
- Listed in requirements.txt for easy installation.
- Key dependencies: Flask, scikit-learn, pandas, numpy.

Virtual Environment:

Instructions to set up the environment:

```
python -m venv env
source env/bin/activate  # Linux/Mac
env\Scripts\activate  # Windows
pip install -r requirements.txt
```

How to Run

Local Setup

Clone the repository:
```
git clone <repo-url>
```
Navigate to the project directory and set up dependencies.
Start the Flask API:
```
python app.py
```

Docker Setup

Build the image:
```
docker build -t dynamic-pricing-api .
```

Run the container:
```
docker run -p 5000:5000 dynamic-pricing-api
```

Testing the API

Use a tool such as Postman, the included predict_request.ps1 script, or a Python script to send POST requests to the /predict endpoint.
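
A Python test script could look like the sketch below, which uses only the standard library. The URL is an assumption (localhost with port 5000, matching the `docker run` port mapping), and the function names are hypothetical; the payload is the input example from the Model Deployment section.

```python
import json
import urllib.request

# Assumed local address of the running API.
API_URL = "http://localhost:5000/predict"

payload = {
    "number_of_riders": 50,
    "number_of_drivers": 30,
    "number_of_past_rides": 15,
    "average_ratings": 4.5,
    "expected_ride_duration": 60,
    "location_category_Urban": 1,
    "location_category_Suburban": 0,
    "customer_loyalty_status_Regular": 0,
    "demand_supply_ratio": 1.0,
}


def build_predict_request(data: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_cost(data: dict, url: str = API_URL, timeout: float = 5.0) -> float:
    """Send the request and return the predicted cost from the JSON response."""
    request = build_predict_request(data, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predicted_cost"]
```

With the API running, `print(predict_cost(payload))` would print the predicted cost returned by the server.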

Acknowledgments

This project was developed as part of the ML Zoomcamp course. Special thanks to the course team for providing a structured learning environment and clear evaluation criteria.

AFARNOOD/Dynamic-Pricing-ML
