TimeFlight — Regional Flight Delay Forecasting

TL;DR

TimeFlight is a regional flight-delay forecasting system built on 21 years of U.S. aviation data.
Using historical delay minutes from the Bureau of Transportation Statistics, the project shows that:

Flight delays are not uniform across the U.S. and must be modeled region-by-region
Seasonality is the dominant signal driving delay risk nationwide
The South East and West experience the highest and most volatile delay pressure
Classical time-series models, when applied correctly, can reliably flag high-risk months in advance
Seasonal Auto-ARIMA consistently produces the most stable and operationally useful forecasts

The output is not a perfect point forecast, but a directional risk signal that supports staffing, scheduling, and contingency planning

Problem Context

Flight delays impose tens of billions of dollars in annual losses across airlines, airports, travelers, and the wider economy.
Despite this, many operational decisions are still based on heuristics like “summer is bad” rather than quantified, forward-looking risk.

This project asks a focused question:
Can we use historical delay behavior alone to generate early, region-specific signals of future delay pressure that are useful for real planning decisions?

Key Results

The analysis shows clear and persistent regional structure in delay behavior.

The South East and West consistently exhibit higher average delays and larger volatility
The North East and South West are comparatively stable and serve as useful baselines
All regions show strong annual seasonality, with predictable summer peaks
Structural shocks, especially COVID-19, materially shift baseline behavior and must be handled explicitly

Across regions and historical cutoffs, Seasonal Auto-ARIMA best balances accuracy, stability, and interpretability, particularly in high-volatility regions.

Dataset

TimeFlight uses the Airline On-Time Performance / Cause of Delay dataset from the
U.S. Bureau of Transportation Statistics (BTS).

Time period: 2004–2024
Granularity: Monthly, airport-level observations
Size: ~400,000 rows across all U.S. airports

The raw data is aggregated into five U.S. regions (North East, Mid West, South East, South West, West) to improve signal-to-noise and support regional planning.

Dataset source:
https://www.transtats.bts.gov/

Approach (High Level)

The project follows a deliberately lightweight and transparent pipeline:

Ingest and clean monthly delay data from BTS
Map airports into five geographic regions
Aggregate airport-level delays into region-level time series
Perform exploratory analysis to understand trend, volatility, and seasonality
Forecast each region independently using classical univariate models:
- Double ETS
- Triple ETS
- ARIMA(1,0,1)
- Auto-ARIMA (seasonal and non-seasonal)
Evaluate models across multiple historical cutoffs to test robustness

The focus is on stable, interpretable signals, not overfitting.

Repository Contents

TimeFlight Final Code
End-to-end analysis: data preparation, EDA, modeling, and evaluation.
TimeFlight Final Report
Detailed write-up of assumptions, model comparisons, and findings.
TimeFlight Presentation
Executive-level framing focused on operational and policy impact.

Each component reinforces a different layer of understanding: code, analysis, and decision context.

How to Use

This repository is designed for exploration and review.

Start with the presentation to understand the problem and outcomes
Read the report for deeper reasoning and model comparison
Review the code to see how the pipeline is implemented end-to-end
Extend the analysis by adding exogenous variables or airport-level modeling

Why This Matters

TimeFlight demonstrates that simple, well-designed models can deliver meaningful operational value when paired with strong data engineering and evaluation discipline.

Rather than chasing complexity, the project shows how classical forecasting methods can still play a central role in real-world decision support systems.

Next Steps

Planned extensions include:

Incorporating weather, traffic volume, and scheduling data
Moving from regional to airport-level forecasting
Supporting additional delay metrics beyond total delay minutes
Packaging forecasts into dashboards or APIs for operational use

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
TimeFlight Code.ipynb		TimeFlight Code.ipynb
TimeFlight Final Report.pdf		TimeFlight Final Report.pdf
TimeFlight Presentation.pdf		TimeFlight Presentation.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TimeFlight — Regional Flight Delay Forecasting

TL;DR

Problem Context

Key Results

Dataset

Approach (High Level)

Repository Contents

How to Use

Why This Matters

Next Steps

About

Uh oh!

Releases

Packages

Languages

adarsh-rai-secure/flight-delay-forecasting

Folders and files

Latest commit

History

Repository files navigation

TimeFlight — Regional Flight Delay Forecasting

TL;DR

Problem Context

Key Results

Dataset

Approach (High Level)

Repository Contents

How to Use

Why This Matters

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages