Lyft Price Prediction

Author: Ajay Varma
Date: APR-2025
Project Directory: /Users/ajayvarma/Documents/VS Code/Workspace/Data Science/Projects/Lyft/
Type: Portfolio Project

Project Overview

Objective

The objective of this project is to build a machine learning model that predicts the price of Lyft rides. The model takes into account various features such as ride distance, passenger count, time of day, and ride type. By providing accurate pricing predictions, this model helps improve pricing transparency, customer satisfaction, and operational efficiency for Lyft.

Lyft VSCode Setup

Dataset

The dataset consists of historical Lyft ride data and includes the following key features:

distance: Ride distance in miles.
passenger_count: Number of passengers in the ride.
time_of_day: Time of day when the ride was requested.
price: Ride price (target variable).
Categorical Features: Ride type (name_Lux Black XL, name_Shared), weather conditions, and source/destination locations (source_North End, destination_Back Bay).

Methodology

The project follows a structured approach to machine learning model development:

Data Preprocessing:
- Cleaned the data by removing duplicates, handling missing values, and encoding categorical variables using one-hot encoding.
Outlier Treatment:
- Applied the Interquartile Range (IQR) method to detect and remove outliers from the price and distance columns, ensuring better model accuracy.
Feature Engineering:
- Selected and directly used features such as distance, passenger_count, and time_of_day for model training. Categorical features like ride types and weather conditions were encoded for use in machine learning models.
Model Development:
- Built and compared multiple regression models: Linear Regression and Random Forest Regressor. The Random Forest model outperformed Linear Regression due to its ability to capture non-linear relationships and complex feature interactions.
Model Evaluation:
- Evaluated models using Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) to measure accuracy and predictive power.
Feature Importance:
- Conducted feature importance analysis using the Random Forest model to identify the most influential features. The ride type (name_Lux Black XL) and distance were found to be the most important features for predicting price.
Model Saving:
- The best-performing model, Random Forest, was saved using joblib for future use and deployment.

Model Performance Comparison

Model	MAE (Mean Absolute Error)	MSE (Mean Squared Error)	R-squared
Linear Regression	1.63	4.52	0.95
Random Forest	0.95	2.07	0.98

Insights & Recommendations

Key Insights

Random Forest Outperforms Linear Regression:
- Lower MAE (0.95 vs. 1.63): Random Forest delivers predictions that are much closer to actual values.
- Lower MSE (2.07 vs. 4.52): The Random Forest model exhibits smaller squared errors on average, improving the prediction quality.
- Higher R² (0.98 vs. 0.95): Random Forest explains 98% of the variance in the data, providing a more accurate prediction of the ride prices.
Feature Importance:
- The most important features for price prediction were:
  - Ride Type (name_Lux Black XL)
  - Distance
- Weather conditions and source/destination features were less influential in the model’s predictions.

Recommendations

Model Deployment:
- The Random Forest model is recommended for deployment due to its superior performance. It can be used for real-time ride price prediction in Lyft's production environment.
Model Tuning:
- Hyperparameter tuning: Consider adjusting parameters such as the number of trees and tree depth in the Random Forest to improve model performance.
- Cross-validation: Implement cross-validation techniques to ensure the model generalizes well across different data subsets.
Exploring Advanced Models:
- Experiment with advanced models like Gradient Boosting Machines (GBM), XGBoost, or LightGBM to explore further performance improvements.
Feature Engineering:
- Consider adding new features or transforming existing ones to further enhance model accuracy.
- Feature selection could be useful to remove irrelevant or highly correlated features, improving the model's efficiency.
Outlier Handling:
- Although IQR outlier removal was effective, additional domain knowledge or custom outlier detection techniques could be applied to refine the model.
Future Work:
- Model Monitoring and Maintenance: Monitor the model's performance over time and retrain it as new data becomes available to maintain prediction accuracy.
- Scalability: Ensure the model is scalable to handle larger datasets as Lyft expands.

Conclusion

The Random Forest model is the recommended choice for predicting Lyft ride prices due to its exceptional performance across key metrics (MAE, MSE, R²).
While Linear Regression provides a simpler alternative, Random Forest outperforms it in capturing complex relationships in the data.
Hyperparameter tuning, exploring new models, and continuous monitoring are recommended for ongoing improvements in the model's predictive performance.

Project Structure

Notebooks/: Contains Jupyter notebooks used for exploratory data analysis, data preprocessing, and model training.
Models/: Contains the trained machine learning models, including the saved Random Forest model (random_forest_model.joblib).
Data/: Contains datasets and other related files, such as data used for training and testing.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
Data		Data
Models		Models
Notebooks		Notebooks
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lyft Price Prediction

Project Overview

Objective

Lyft VSCode Setup

Dataset

Methodology

Model Performance Comparison

Insights & Recommendations

Key Insights

Recommendations

Conclusion

Project Structure

About

Releases

Packages

Languages

ajayvarmaco/Lyft

Folders and files

Latest commit

History

Repository files navigation

Lyft Price Prediction

Project Overview

Objective

Lyft VSCode Setup

Dataset

Methodology

Model Performance Comparison

Insights & Recommendations

Key Insights

Recommendations

Conclusion

Project Structure

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages