A modular Python library for time series forecasting with SARIMAX, symbolic entropy rate estimation via Lempel-Ziv methods, and geospatial analysis using the haversine distance. Includes support for endogenous weekly patterns, exogenous drivers, and flexible data preprocessing.

CoMuNeLab/forecasting

Forecasting

A modular Python package for time series forecasting, entropy-rate estimation, and geospatial distance computation.

This project includes:

  • Lempel-Ziv-based entropy rate estimators (for strings or symbolic sequences)
  • SARIMAX forecasting with optional endogenous (weekly patterns) and exogenous drivers
  • Haversine formula for computing great-circle distances between geographic coordinates

Installation

pip install -e .

This installs the forecasting package from the src/ directory in editable mode (requires Python ≥ 3.8).


Package Structure

src/forecasting/
├── entropy.py       # Entropy rate estimators
├── forecasting.py   # SARIMAX-based time series forecasting
├── geo.py           # Haversine distance utility
├── substring.py     # Substring/pattern match utilities
└── __init__.py

Features

1. Entropy Rate Estimation

from forecasting.entropy import get_entropy_rate_str, get_entropy_rate_fast, get_entropy_rate_lz

seq = 'abcabcabcabc'
rate = get_entropy_rate_str(seq)

sym_seq = ['1', '3', '5', '5', '0', '10', '27']
rate_fast = get_entropy_rate_fast(sym_seq)
rate_lz = get_entropy_rate_lz(sym_seq)
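The README does not show how the estimators work internally. As a rough illustration of what a Lempel-Ziv-style entropy rate estimate looks like, here is a minimal sketch of one common match-length estimator, n·log2(n) / ΣΛ_i, where Λ_i is the length of the shortest run starting at position i that has not appeared earlier in the sequence. The name `lz_entropy_rate` and the handling of runs that reach the end of the sequence are assumptions made for this example, not the package's implementation.

```python
import math
from typing import Sequence


def _occurs(history: list, pattern: list) -> bool:
    """Return True if `pattern` appears as a contiguous run inside `history`."""
    k = len(pattern)
    return any(history[j:j + k] == pattern for j in range(len(history) - k + 1))


def lz_entropy_rate(seq: Sequence) -> float:
    """Hypothetical sketch of a match-length entropy rate estimate (bits/symbol).

    Works on strings or on lists of hashable symbols.
    """
    symbols = list(seq)
    n = len(symbols)
    if n < 2:
        raise ValueError("need at least two symbols")

    total_match_length = 0
    for i in range(n):
        history = symbols[:i]
        k = 1
        # Grow the candidate run until it no longer appears in the history
        # or until it would run past the end of the sequence.
        while i + k <= n and _occurs(history, symbols[i:i + k]):
            k += 1
        total_match_length += k

    return n * math.log2(n) / total_match_length


constant = lz_entropy_rate("aaaaaaaa")      # highly repetitive input
all_distinct = lz_entropy_rate("abcdefgh")  # no repeated symbol
symbolic = lz_entropy_rate(['1', '3', '5', '5', '0', '10', '27'])
```

Repetitive input yields long matches and therefore a lower estimate than input with no repeats. This direct version is quadratic or worse in the sequence length, so it is only suitable for short sequences.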

2. SARIMAX Forecasting Pipeline

run_sarimax_pipeline forecasts a time series with SARIMAX, optionally adding endogenous (weekly) and exogenous drivers.

from forecasting.forecasting import run_sarimax_pipeline

results = run_sarimax_pipeline(
    file_name="data1.csv",
    dt="00:10:00",
    dt_string="10min",
    int_pred="02:00:00",
    int_pred_string="2h",
    endo_drivers="Weekly",       # or "No"
    ex_drivers="data_weather.csv"  # or "No"
)

The input CSV (file_name) must contain the columns value, date, and time:

value,date,time
39.976242,2007-11-30,14:34:51
39.976243,2007-11-30,14:34:52
...

Optional exogenous drivers: the file passed as ex_drivers (data_weather.csv in the example above) must follow the same format.
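To make the expected layout concrete, the following standard-library sketch reads rows in this value,date,time format into time-ordered (timestamp, value) pairs. The helper name `read_series` is hypothetical and the package may load its data differently (for instance with pandas); the sample rows are the two shown above.

```python
import csv
import io
from datetime import datetime
from typing import List, TextIO, Tuple


def read_series(handle: TextIO) -> List[Tuple[datetime, float]]:
    """Hypothetical helper: parse value,date,time rows into (timestamp, value)."""
    rows = []
    for record in csv.DictReader(handle):
        # Combine the separate date and time columns into one timestamp.
        timestamp = datetime.strptime(
            f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M:%S"
        )
        rows.append((timestamp, float(record["value"])))
    rows.sort(key=lambda row: row[0])  # ensure chronological order
    return rows


sample = io.StringIO(
    "value,date,time\n"
    "39.976242,2007-11-30,14:34:51\n"
    "39.976243,2007-11-30,14:34:52\n"
)
rows = read_series(sample)
```

For a file on disk, pass an open file object instead of the in-memory sample, for example `with open("data1.csv", newline="") as f: rows = read_series(f)`.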

The model:

  • Bins the time series at resolution dt and interpolates missing values
  • Splits the data into train and test sets based on int_pred
  • Optionally models weekly patterns and external influences
  • Searches for the best SARIMAX parameters by minimizing AIC
  • Saves the prediction to a CSV file and, if plot=True, to a PNG plot
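The first two steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code: it assumes values within a bin are averaged, that empty bins are filled by linear interpolation, and that the last int_pred worth of bins forms the test set. All function names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def parse_duration(text: str) -> timedelta:
    """Convert an "HH:MM:SS" string such as "00:10:00" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def bin_and_interpolate(
    rows: List[Tuple[datetime, float]], dt: timedelta
) -> Tuple[List[datetime], List[float]]:
    """Average values per dt-wide bin, then fill empty bins linearly (assumed)."""
    start = min(ts for ts, _ in rows)
    sums, counts = {}, {}
    for ts, value in rows:
        idx = (ts - start) // dt  # integer index of the bin holding this row
        sums[idx] = sums.get(idx, 0.0) + value
        counts[idx] = counts.get(idx, 0) + 1

    n_bins = max(sums) + 1
    binned: List[Optional[float]] = [
        sums[i] / counts[i] if i in sums else None for i in range(n_bins)
    ]
    # The first and last bins always hold data, so every gap has two neighbours.
    known = [i for i, v in enumerate(binned) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            binned[i] = binned[left] + frac * (binned[right] - binned[left])
    return [start + i * dt for i in range(n_bins)], binned


def split_train_test(values: List[float], dt: timedelta, int_pred: timedelta):
    """Hold out the last int_pred worth of bins as the test set (assumed)."""
    n_test = int_pred // dt
    if n_test <= 0 or n_test >= len(values):
        raise ValueError("int_pred must cover between 1 and len(values) - 1 bins")
    return values[:-n_test], values[-n_test:]


day = datetime(2007, 11, 30)
rows = [
    (day, 1.0),
    (day + timedelta(minutes=5), 3.0),   # same 10-minute bin as the first row
    (day + timedelta(minutes=30), 8.0),  # leaves two empty bins in between
]
dt = parse_duration("00:10:00")
times, values = bin_and_interpolate(rows, dt)
train, test = split_train_test(values, dt, parse_duration("00:20:00"))
```

In this toy input the first bin averages to 2.0, the two empty bins are interpolated toward 8.0, and the final two bins become the test set.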

3. Geospatial Distance

from forecasting.geo import haversine

dist = haversine(lon1=12.49, lat1=41.89, lon2=2.29, lat2=48.85)  # meters
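For reference, the haversine formula itself is short. The sketch below mirrors the argument order of the call above (longitude first, degrees in, meters out), but the function name `haversine_sketch` and the mean Earth radius of 6,371,000 m are assumptions for this example; the package's own constant may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def haversine_sketch(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


rome_paris = haversine_sketch(lon1=12.49, lat1=41.89, lon2=2.29, lat2=48.85)
quarter_equator = haversine_sketch(lon1=0.0, lat1=0.0, lon2=90.0, lat2=0.0)
```

Because the formula treats the Earth as a sphere, results can differ from ellipsoidal geodesic distances by a fraction of a percent.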

License

This project is licensed under the MIT License.


Author

Developed by Valeria D'Andrea. Refactored and modularized for packaging and reuse.
