This repository contains the final project for the IST 707 Applied Machine Learning course at Syracuse University. The project aims to forecast pothole development in Syracuse, NY, from factors such as weather, pavement ratings, and reported maintenance requests.
Marina Mitiaeva, mmitiaev@syr.edu
Cathryn Lee Shelton, clshelto@syr.edu
Abhi Chakraborty, abchakra@syr.edu
Edward Joseph Cogan II, ejcogani@syr.edu
Pothole development occurs due to:
- Poor paving materials.
- Extreme temperature changes.
- Traffic load over time.
Together, these factors cause road deterioration, reducing road safety and increasing maintenance costs.
The goal of this project is to build a predictive model that:
- Forecasts the count of potholes.
- Helps city maintenance departments plan maintenance operations efficiently.
- Reduces costs and improves road safety.
We utilized several public datasets relevant to road conditions, weather, and maintenance requests:
- Pavement rating data provided by the Syracuse Metropolitan Transportation Council across multiple years.
- Weather data from NASA, capturing climate conditions such as temperature fluctuations and precipitation that affect road quality.
- Public maintenance requests, including pothole reports, submitted by Syracuse citizens via SeeClickFix.
- Detailed street information from Syracuse's open data portal, providing road classifications and usage patterns that help identify pothole-prone areas.
Data collected from 2021-2023.
EDA (Exploratory Data Analysis):
- Correlation matrices and visual insights.
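A minimal sketch of this EDA step. The dataframe below is synthetic stand-in data with placeholder column names (`avg_temp`, `pavement_rating`, `pothole_count`, etc.); the real merged dataset and features live in the project notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; in the project this would be the merged
# weather / pavement / maintenance-request dataset loaded from data/.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "avg_temp": rng.normal(10, 8, 300),        # placeholder feature names
    "precipitation": rng.gamma(2.0, 1.5, 300),
    "pavement_rating": rng.integers(1, 11, 300),
    "pothole_count": rng.poisson(2.0, 300),
})

# Correlation matrix over numeric features.
corr = df.corr(numeric_only=True)
print(corr)

# Simple heatmap of the correlations.
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson correlation")
plt.tight_layout()
plt.show()
```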
Model:
- Linear Regression (Baseline)
- Loss function: Mean Squared Error (MSE)
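A minimal sketch of the baseline and its MSE loss. The synthetic `X`/`y` below are stand-ins for the project's preprocessed features and pothole counts:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data; in the project, X/y come from the preprocessed datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))  # e.g. weather and pavement features
y = X @ np.array([0.4, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.3, size=500)

# Baseline: ordinary least-squares linear regression.
baseline = LinearRegression().fit(X, y)
preds = baseline.predict(X)

# MSE is the loss used to compare all models in this project.
print("Baseline MSE:", mean_squared_error(y, preds))
```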
Data Preprocessing:
- Numerical Features: Imputed with the mean and scaled.
- Categorical Features: Imputed with the most frequent value, one-hot encoded.
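A sketch of this preprocessing with scikit-learn; the column names below are placeholders, not the project's actual feature names.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder column lists; replace with the project's actual features.
numeric_cols = ["avg_temp", "precipitation", "pavement_rating"]
categorical_cols = ["road_class", "neighborhood"]

preprocessor = ColumnTransformer(
    transformers=[
        # Numerical features: mean imputation, then standard scaling.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="mean")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        # Categorical features: most-frequent imputation, then one-hot encoding.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ]
)
# preprocessor is typically chained with a regressor in a single Pipeline.
```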
Data Split:
- Train: 60%
- Validation: 20%
- Test: 20%
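The 60/20/20 split can be reproduced with two `train_test_split` calls, as in this sketch (synthetic data stands in for the preprocessed features and pothole counts):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in; in the project X/y come from the preprocessed data.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = rng.poisson(lam=2.0, size=1000)  # pothole counts are non-negative integers

# Hold out 20% for testing, then take 25% of the remaining 80% for validation,
# which yields a 60/20/20 train/validation/test split overall.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```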
- /data/: Contains raw and processed datasets.
- /notebooks/: Jupyter notebook with code and experiments.
- /models/: Saved model weights and checkpoints.
- /predictions/: Generated predictions and model outputs for analysis and evaluation.
- /presentation/: Final project presentation in PPT format.
Our analysis compared the Mean Squared Error (MSE) across multiple models, incorporating various feature engineering and transformation techniques. The key takeaways from the results are:
- The baseline model (linear regression on all features) achieved an MSE of 0.1451, while feature engineering improved performance slightly (0.1299).
- Polynomial transformations combined with different regression techniques (ridge, lasso, elastic net, random forest, gradient boosting, stacking, and voting) led to varying degrees of improvement, with the best-performing models reaching roughly 0.1189-0.1251 MSE.
- Stacking, by contrast, showed a higher MSE (0.1894), indicating potential overfitting or poor generalization.
- The best model on validation data achieved an MSE of 0.1189, and the final model tested on unseen data reached an MSE of 0.0339, demonstrating strong predictive performance.
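For reference, a sketch of how such a comparison can be run. The models, hyperparameters, and synthetic data below are illustrative stand-ins for the project's preprocessed pothole dataset, so the resulting numbers will not match those above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data with some nonlinear structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))
y = 0.5 * X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.3, size=800)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

# Candidate models: baseline plus polynomial features with regularized
# regressions and tree ensembles (stacking/voting omitted for brevity).
models = {
    "linear": LinearRegression(),
    "poly+ridge": make_pipeline(PolynomialFeatures(2), StandardScaler(), Ridge(alpha=1.0)),
    "poly+lasso": make_pipeline(PolynomialFeatures(2), StandardScaler(), Lasso(alpha=0.01, max_iter=10000)),
    "poly+elasticnet": make_pipeline(PolynomialFeatures(2), StandardScaler(), ElasticNet(alpha=0.01, max_iter=10000)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name:>18}: validation MSE = {mse:.4f}")
```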
To set up the environment, install the dependencies with:

```
pip install -r requirements.txt
```