This is the official PyTorch implementation of the paper *FlexTSF: A Flexible Forecasting Model for Time Series with Variable Regularities*.
Recently, there has been significant research on universal time series forecasting models, which can be applied directly to various domains after pre-training. Alongside these broader applications, however, a key challenge arises: temporal irregularity, i.e., missing values, uneven time intervals, and variable sequence lengths. We illustrate temporal irregularity in the following figure: (a) shows regularly sampled data; (b) depicts missing data due to events such as holidays; (c) presents blood pressure measurements that become denser as a patient's condition worsens; (d) shows irregular satellite observations influenced by atmospheric conditions such as clouds and fog.
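In code, such irregular series cannot be stored as plain fixed-step arrays. A common representation, which we assume in the illustrative snippets below, pairs each observation with its timestamp and uses a mask to mark missing values:

```python
import numpy as np

# A hypothetical irregular series with two variables: observation times are
# uneven, and the mask records which variables were actually measured.
times = np.array([0.0, 0.4, 0.5, 2.1, 2.2])   # uneven time intervals
values = np.array([[1.2, 0.0],
                   [1.3, 7.1],
                   [0.0, 7.4],
                   [1.9, 0.0],
                   [2.0, 8.0]])
mask = np.array([[1, 0],                       # 1 = observed, 0 = missing
                 [1, 1],
                 [0, 1],
                 [1, 0],
                 [1, 1]])
```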
We propose FlexTSF, the first universal forecasting model built from the perspective of breaking data regularity constraints. FlexTSF not only performs well on data with temporal irregularity but is also broadly applicable across domains with various temporal granularities. As shown in the following figure, FlexTSF employs a decoder-only architecture, where time series input data is organized into patches. Previously observed patches attend to the generation of future patches, which are then transformed into forecasts. Built on this backbone, FlexTSF introduces a novel patching module and a domain self-adaptation mechanism.
Specifically:
- We introduce IVP Patcher, a continuous-time patching module, to handle irregular time series, overcoming limitations of traditional fixed-size patching methods.
- We propose a timestamp normalization scheme and a forefront computing node, enabling domain-aware adaptation and improving cross-domain generalization.
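FlexTSF's exact normalization is defined in the source code; as a rough, hypothetical sketch, one way to make timestamps comparable across domains with different granularities is to rescale each sequence's time axis by its own typical sampling gap:

```python
import numpy as np

def normalize_timestamps(times: np.ndarray) -> np.ndarray:
    """Hypothetical sketch, not FlexTSF's actual scheme: shift times to start
    at 0 and rescale by the median inter-observation gap, so hourly and daily
    series end up on comparable scales."""
    t = times - times[0]                # every sequence starts at time 0
    gaps = np.diff(t)
    scale = np.median(gaps) if gaps.size and np.median(gaps) > 0 else 1.0
    return t / scale
```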
Our evaluation on 16 benchmark datasets demonstrates that FlexTSF achieves the lowest MSE in 22 of 24 irregular-forecasting tasks (8 datasets × 3 horizons), exhibits better robustness across varying missing rates, and significantly outperforms state-of-the-art baselines in the zero-shot setting. Ablations further confirm the contribution of each component and the benefits of using random patch lengths.
FlexTSF has been tested with Python 3.10 using the Conda environment management tool.
To ensure consistent library versions, you can install the required dependencies for this project by running the following command:
```bash
conda env create -f environment.yml
```
Because some libraries are updated frequently, we have included pinned versions of two dependencies (torchdiffeq and stribor) in the "libs" folder to ensure the code runs successfully.
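The bundled torchdiffeq provides initial value problem (IVP) solvers of the kind the IVP Patcher builds on. As a quick, standalone sanity check that the environment works (a minimal example, not part of FlexTSF):

```python
import torch
from torchdiffeq import odeint

def f(t, y):
    # dy/dt = -y has the closed-form solution y(t) = y0 * exp(-t)
    return -y

y0 = torch.tensor([1.0])
t = torch.linspace(0.0, 1.0, 5)
y = odeint(f, y0, t)        # solve the IVP on the time grid t
print(y.squeeze()[-1])      # ≈ exp(-1) ≈ 0.3679
```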
We conduct three stages of experiments using two non-overlapping groups of datasets: the pre-training datasets $\mathcal{D}_{p}$ and the held-out datasets $\mathcal{D}_{h}$.
- Obtain: Our pre-training dataset group $\mathcal{D}_{p}$ consists of datasets from the Monash Time Series Forecasting Archive and the UCR & UEA Time Series Classification Archive.
- Preprocess: The preprocessing programs can be found in the folder "preprocess/pre_monash_tsc".
After processing,
- Obtain: They are from the Long Time Series Forecasting Benchmark.
- Preprocess: The illness dataset was extended (preprocess/pre_ltf) so that it supports the same input/output lengths as the other datasets. No preprocessing is required for the others; the data can be read directly by the function in the file "experiments/data_ltf.py".
- Obtain: This dataset can be downloaded from https://zenodo.org/records/5712933.
- Preprocess: The preprocessing programs can be found in the folder "preprocess/pre_satsm".
- Obtain: This dataset can be downloaded from https://github.com/liyaguang/DCRNN.
- Preprocess: The preprocessing programs can be found in the folder "preprocess/pre_traffic".
- Obtain: They are from the UEA Time Series Classification Archive and have been removed from the pre-training set to ensure no overlap between the pre-training and held-out sets.
- Preprocess: No preprocessing is required; the data can be read directly by the function in the file "experiments/data_ucruea.py".
- Obtain: We used the dataset Localization Data for Person Activity.
- Preprocess: The data downloading and reading programs can be found in the file "experiments/data_harw4imu.py".
- Obtain: We used eICU v2.0, which can be downloaded from https://physionet.org/content/eicu-crd/2.0/.
- Preprocess: The preprocessing programs can be found in the folder "preprocess/pre_eicu". They were adapted from a previous work.
- Obtain: We used PhysioNet 2012 v1.0, which can be downloaded from https://physionet.org/content/challenge-2012/1.0.0/.
- Preprocess: The automatic downloading and preprocessing code is in the file "experiments/data_physionet12.py", which was built upon a previous program.
- Obtain: We used MIMIC-IV v1.0, which can be downloaded from https://physionet.org/content/mimiciv/1.0/.
- Preprocess: The preprocessing programs can be found in the folder "preprocess/pre_mimic4".
In the first stage, we perform classic training-validation-testing experiments to demonstrate the effectiveness of FlexTSF. Next, we pre-train FlexTSF, resulting in a model with 63 million parameters. This model is initially used for zero-shot forecasting to evaluate its potential as a universal model, and is then fine-tuned for time series forecasting to assess its adaptability to new domains in few-shot scenarios.
Each dataset in $\mathcal{D}_{h}$ is split into training, validation, and test sets.
For regular datasets, we use a fixed input length of 96 and a forecasting horizon of 96. For irregular datasets, it is impractical to define fixed input and output lengths across all cases. Instead, we adopt a forecast-to-input ratio and vary it across {0.25, 0.5, 1.0} to evaluate performance under different forecasting horizons.
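Under this setting, a ratio of 0.25 over a series of 100 observations uses the first 80 points as input and the last 20 as the forecast target. A sketch of this split (our reading of the ratio; the repository's actual data pipeline may differ):

```python
import numpy as np

def split_by_ratio(times, values, ratio=0.25):
    """Split one irregular series so that len(target) / len(input) ≈ ratio.
    Hypothetical helper for illustration, not taken from the repository."""
    n = len(times)
    n_fore = int(round(n * ratio / (1.0 + ratio)))   # e.g. 20 of 100 at 0.25
    n_in = n - n_fore
    return (times[:n_in], values[:n_in]), (times[n_in:], values[n_in:])
```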
You can use the following commands to run the program. VS Code users can also check out the file .vscode/launch.json, which may be more convenient for trying out the programs.
Run FlexTSF on a specific dataset:
```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --patch_seg random --data_name eICU
```
Run FlexTSF on all irregular datasets:
```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --patch_seg random --data_group irregular
```
Run FlexTSF on all regular datasets with a 20% missing rate:

```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --patch_seg random --data_group regular --ddr 0.2
```
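Here `--ddr 0.2` corresponds to the 20% missing rate. Conceptually, injecting missingness into regular data amounts to randomly masking a fraction of the observed values (an illustrative sketch, not the repository's exact procedure):

```python
import numpy as np

def apply_missing_rate(values, mask, rate=0.2, seed=0):
    """Illustrative sketch: flip `rate` of the currently observed entries to
    missing. Hypothetical helper, not the actual code behind --ddr."""
    rng = np.random.default_rng(seed)
    drop = (rng.random(mask.shape) < rate) & (mask == 1)
    new_values = np.where(drop, 0.0, values)   # zero out dropped values
    new_mask = np.where(drop, 0, mask)
    return new_values, new_mask
```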
Pre-train FlexTSF:
```bash
python main.py --base_model flextsf --data_name monash --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --ml_task uni_pretrain --value_norm --time_norm --weight_decay 0.1 --epochs_max 20 --dev_mode run
```
Deploy pre-trained FlexTSF in zero-shot settings:
```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --train_setting zero --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --pre_random_seed 1 --zeroshot_epoch 5 --fore_len 0.25
```
- `--zeroshot_epoch 5`: we use the model that has been pre-trained for 6 epochs (checkpoint index 5, counting from 0).
Deploy pre-trained FlexTSF in the few-shot setting with 50 fine-tuning samples:
```bash
python main.py --base_model flextsf --model_type reconstruct --ml_task forecast --value_norm --time_norm --train_setting few --pre_model {patch_ckpt} --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --pre_random_seed 1 --few_shot_config 50 --fore_len 0.25
```
- `{patch_ckpt}`: the path of the pre-trained checkpoint. We used the model that had been trained for 20 epochs.
Pre-train the model without the IVP Patcher (`--patch_module none` with a patch length of 1):

```bash
python main.py --base_model flextsf --data_name monash --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --ml_task uni_pretrain --value_norm --time_norm --patch_module none --patch_len_pretrain 1 --batch_size 16 --weight_decay 0.1 --epochs_max 20
```
Run zero-shot experiments:

```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --train_setting zero --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --patch_module none --patch_seg given --patch_len 1 --pre_random_seed 1 --zeroshot_epoch 5 --fore_len 0.25
```
Pre-train the model without timestamp normalization (omitting `--time_norm`):

```bash
python main.py --base_model flextsf --data_name monash --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --ml_task uni_pretrain --value_norm --weight_decay 0.1 --epochs_max 20
```
Run zero-shot experiments:

```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --train_setting zero --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --pre_random_seed 1 --zeroshot_epoch 5 --fore_len 0.25
```
Pre-train the model with the `--leader_node` variant:

```bash
python main.py --base_model flextsf --data_name monash --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --ml_task uni_pretrain --value_norm --time_norm --leader_node --weight_decay 0.1 --epochs_max 20
```
Run zero-shot experiments:

```bash
python main.py --base_model flextsf --ml_task forecast --value_norm --time_norm --train_setting zero --attn_layers 6 --nhead 12 --dim_attn_internal 768 --dim_patch_ts 768 --dim_ivp_hidden 768 --pre_random_seed 1 --leader_node --zeroshot_epoch 5 --fore_len 0.25
```