A modular Python package for time series forecasting, entropy-rate estimation, and geospatial distance computation.
This project includes:
- Lempel-Ziv-based entropy rate estimators (for strings or symbolic sequences)
- SARIMAX forecasting with optional endogenous (weekly patterns) and exogenous drivers
- Haversine formula for computing great-circle (spherical-Earth) distances
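The haversine formula gives the great-circle distance between two points on a sphere. With latitudes φ, longitudes λ (in radians) and radius R, it computes a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2) and then d = 2R · arcsin(√a). The following is a minimal standard-library sketch of that formula, not the package's own geo.py. The name haversine_sketch and the mean Earth radius of 6,371,008.8 m are assumptions, and the package's haversine may use a different radius.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters


def haversine_sketch(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Central angle; min() guards against rounding pushing a slightly above 1
    c = 2 * math.asin(math.sqrt(min(1.0, a)))
    return EARTH_RADIUS_M * c


# Rome to Paris, same coordinates as the usage example below, printed in km
print(round(haversine_sketch(12.49, 41.89, 2.29, 48.85) / 1000))
```

Because the sphere is an approximation, results can differ by a fraction of a percent from ellipsoidal geodesic distances.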
```bash
pip install -e .
```

This installs the forecasting package from the src/ directory in editable mode (requires Python ≥ 3.8).
```text
src/forecasting/
├── entropy.py        # Entropy rate estimators
├── forecasting.py    # SARIMAX-based time series forecasting
├── geo.py            # Haversine distance utility
├── substring.py      # Substring/pattern match utilities
└── __init__.py
```
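The estimators in entropy.py are described only as Lempel-Ziv based. For intuition, here is a naive sketch of one common estimator of that family: for each position i, Λ_i is the length of the shortest substring starting at i that does not occur anywhere in the preceding part of the sequence, and the entropy rate is n · log2(n) / ΣΛ_i bits per symbol. The name lz_entropy_rate_sketch and the end-of-sequence convention are assumptions; the package's functions may differ in normalization, log base, or speed.

```python
import math
from typing import Hashable, Sequence


def _occurs_in(pattern: list, history: list) -> bool:
    """True if pattern appears as a contiguous run inside history."""
    m = len(pattern)
    return any(history[j:j + m] == pattern for j in range(len(history) - m + 1))


def lz_entropy_rate_sketch(seq: Sequence[Hashable]) -> float:
    """Lempel-Ziv entropy-rate estimate in bits per symbol (naive, roughly O(n^3))."""
    s = list(seq)  # works for strings (characters) and lists of tokens
    n = len(s)
    if n < 2:
        raise ValueError("need at least two symbols")
    total = 0
    for i in range(n):
        history = s[:i]
        k = 1
        # Grow the candidate substring until it has not been seen before.
        # If the match runs to the end of the sequence, k stops at n - i + 1.
        while i + k <= n and _occurs_in(s[i:i + k], history):
            k += 1
        total += k  # Lambda_i
    return n * math.log2(n) / total


print(lz_entropy_rate_sketch("abcabcabcabc"))
print(lz_entropy_rate_sketch(['1', '3', '5', '5', '0', '10', '27']))
```

With this convention, a sequence of all-distinct symbols scores exactly log2(n), while periodic input such as "ab" repeated scores much lower, since each new position extends a long match into the past.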
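Estimating entropy rates from a string or a symbolic sequence:

```python
from forecasting.entropy import get_entropy_rate_str, get_entropy_rate_fast, get_entropy_rate_lz

# Entropy rate of a plain character string
seq = 'abcabcabcabc'
rate = get_entropy_rate_str(seq)

# Symbolic sequence given as a list of string tokens
sym_seq = ['1', '3', '5', '5', '0', '10', '27']
rate_fast = get_entropy_rate_fast(sym_seq)
rate_lz = get_entropy_rate_lz(sym_seq)
```

run_sarimax_pipeline forecasts a time series using SARIMAX, with optional drivers: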
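```python
from forecasting.forecasting import run_sarimax_pipeline

results = run_sarimax_pipeline(
    file_name="data1.csv",
    dt="00:10:00",                  # time step used for binning (HH:MM:SS)
    dt_string="10min",              # same time step as a frequency string
    int_pred="02:00:00",            # forecast horizon (HH:MM:SS)
    int_pred_string="2h",           # same horizon as a frequency string
    endo_drivers="Weekly",          # or "No"
    ex_drivers="data_weather.csv",  # or "No"
)
```

The input CSV (file_name) must contain the columns value, date, and time: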
```text
value,date,time
39.976242,2007-11-30,14:34:51
39.976243,2007-11-30,14:34:52
...
```
Optional exogenous drivers: the file passed as ex_drivers (data_weather.csv in the example above) must follow the same value,date,time format.
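Because both files share the value,date,time layout, a single loader can read either one. The following is a minimal standard-library sketch, not the package's own reader. The name load_series is hypothetical, and it assumes dates are YYYY-MM-DD and times are HH:MM:SS, as in the sample rows.

```python
import csv
import io
from datetime import datetime
from typing import List, TextIO, Tuple


def load_series(handle: TextIO) -> List[Tuple[datetime, float]]:
    """Parse value,date,time rows into (timestamp, value) pairs sorted by time."""
    rows = []
    for row in csv.DictReader(handle):
        ts = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S")
        rows.append((ts, float(row["value"])))
    rows.sort(key=lambda r: r[0])  # ensure chronological order
    return rows


# The two sample rows shown above, read from an in-memory buffer
sample = io.StringIO(
    "value,date,time\n"
    "39.976242,2007-11-30,14:34:51\n"
    "39.976243,2007-11-30,14:34:52\n"
)
series = load_series(sample)
```

To read a file on disk instead, open it with newline="" (for example, open("data1.csv", newline="")) and pass the handle to load_series.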
The pipeline:
- Bins and interpolates the time series data
- Splits the data into train and test sets based on int_pred
- Optionally models weekly patterns (endo_drivers) and external influences (ex_drivers)
- Searches for the best SARIMAX parameters by minimizing AIC
- Saves the prediction to CSV and, if plot=True, to a PNG plot
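The first two pipeline steps can be illustrated with the standard library alone. This sketch rests on assumptions, not on the package's code: samples falling in the same dt-wide bin are averaged, empty bins are filled by linear interpolation, and the final int_pred span of the binned series is held out as the test set. The function names bin_and_interpolate and train_test_split are hypothetical, and the toy samples are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Series = List[Tuple[datetime, float]]


def bin_and_interpolate(samples: Series, dt: timedelta) -> Series:
    """Average samples into dt-wide bins, then linearly fill empty bins."""
    start = min(t for t, _ in samples)
    end = max(t for t, _ in samples)
    n_bins = (end - start) // dt + 1          # timedelta // timedelta -> int
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in samples:
        k = (t - start) // dt                 # index of the bin containing t
        sums[k] += v
        counts[k] += 1
    values: List[Optional[float]] = [s / c if c else None for s, c in zip(sums, counts)]
    # The first and last bins always hold a sample, so every gap is interior.
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            values[i] = values[a] + frac * (values[b] - values[a])
    return [(start + i * dt, v) for i, v in enumerate(values)]


def train_test_split(series: Series, horizon: timedelta) -> Tuple[Series, Series]:
    """Hold out the final horizon of the binned series as the test set."""
    cutoff = series[-1][0] - horizon
    train = [p for p in series if p[0] <= cutoff]
    test = [p for p in series if p[0] > cutoff]
    return train, test


# Toy samples: two in the first 10-minute bin, none in the next two, one at 30 min
t0 = datetime(2007, 11, 30, 14, 0, 0)
raw = [(t0, 1.0), (t0 + timedelta(minutes=3), 3.0), (t0 + timedelta(minutes=30), 8.0)]
grid = bin_and_interpolate(raw, timedelta(minutes=10))
train, test = train_test_split(grid, timedelta(minutes=10))
```

Under these assumptions, the example call's dt="00:10:00" and int_pred="02:00:00" would correspond to timedelta(minutes=10) and timedelta(hours=2). The SARIMAX fit and the AIC-based parameter search are not sketched here.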
```python
from forecasting.geo import haversine

# Rome to Paris; the result is in meters
dist = haversine(lon1=12.49, lat1=41.89, lon2=2.29, lat2=48.85)
```

This project is licensed under the MIT License.
Developed by Valeria D'Andrea. Refactored and modularized for packaging and reuse.