Implementation of Frequency Pretraining (FPT) as described in:
Grieger N, Mehrkanoon S, Bialonski S. Data-Efficient Sleep Staging with Synthetic Time Series Pretraining. Algorithms. 2025; 18(9):580. https://doi.org/10.3390/a18090580
Part of our work was presented at the ICLR 2024 Workshop on Learning from Time Series for Health (the code for this version of the paper can be found in the `ts4h-ext-abstract-version` branch):
Niklas Grieger, Siamak Mehrkanoon and Stephan Bialonski. Pretraining Sleep Staging Models without Patient Data. In ICLR 2024 Workshop on Learning from Time Series for Health, Vienna, Austria, 2024. URL https://openreview.net/forum?id=xOchS6sthY
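To give a rough intuition for the pretraining scheme, the sketch below generates synthetic time series labeled by which frequency bins they contain. This is a hypothetical illustration only, not the implementation in `base/`: the function name `make_synthetic_batch`, the frequency ranges, and the multi-hot label encoding are all our own assumptions; see the paper and the code in `base/` for the actual scheme.

```python
import numpy as np

def make_synthetic_batch(n_samples, n_bins=5, sig_len=3000, fs=100, rng=None):
    """Generate synthetic signals labeled by the frequency bins they contain.

    Each signal is a sum of sinusoids whose frequencies are drawn from a
    random subset of `n_bins` disjoint frequency bins; the multi-hot label
    marks which bins are present. All numeric choices are illustrative.
    """
    rng = np.random.default_rng(rng)
    edges = np.linspace(0.5, 35.0, n_bins + 1)  # illustrative bin edges in Hz
    t = np.arange(sig_len) / fs
    x = np.zeros((n_samples, sig_len))
    y = np.zeros((n_samples, n_bins), dtype=np.int64)
    for i in range(n_samples):
        present = rng.random(n_bins) < 0.5           # random subset of bins
        for b in np.flatnonzero(present):
            f = rng.uniform(edges[b], edges[b + 1])  # frequency inside bin b
            phase = rng.uniform(0.0, 2.0 * np.pi)
            x[i] += np.sin(2.0 * np.pi * f * t + phase)
        y[i] = present
    x += 0.1 * rng.standard_normal(x.shape)          # additive noise
    return x, y
```

A model pretrained to predict `y` from `x` never sees patient data, which is the point of the FPT scheme.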
The project is set up as follows:
- `base/`: contains the Python implementations of the FPT scheme
- `cache/dod_o_h/`: contains the preprocessed data, as produced by the `prepare_dodh.py` and `prepare_dodo.py` scripts
- `config/`: contains the configurations of the experiments, configuring how to train or evaluate the model; configurations are based on the hydra framework
  - `data/`: base configurations around dataloading and data splits
  - `exp001/`: experiments are groups of sub-experiments; e.g., `exp001` contains the four sub-experiments `exp001a`, `exp001b`, `exp001c`, and `exp001d`
    - `exp001a.yaml`, `exp001b.yaml`, `exp001c.yaml`, `exp001d.yaml`: configurations for the four sub-experiments
    - `fpt_config.yaml`: base configuration for the `exp001` experiment group
    - `manual.md`: manual for the `exp001` experiment group, which describes how to run the experiments and evaluate the models; the manual is used to reproduce the results of the paper
  - `exp002/`, ...: similar to `exp001`, but for different sets of experiments
  - `launcher/`: base config around the launcher that is used by hydra to launch runs
  - `sweeper/`: base config around the sweeper that is used by hydra to sweep over parameter ranges in multiruns
  - `base_config.yaml`: base configuration for all experiments; describes the rough outline of experiment configurations
  - `experiments.md`: describes the existing experiments and where to find the results of training and evaluation
- `logs/`: contains the logs of the training and evaluation runs; the folder structure is described in the `config/experiments.md` file
- `models/`: contains the trained model checkpoints
- `preprocessing/`: contains scripts used to preprocess the data
- `scripts/`: contains training and evaluation scripts, which are used for model training and subsequent evaluation in experiments (as configured within the `config` directory); also contains scripts used to create the figures in our paper
On Linux and Windows, the project can be used by running the following commands to clone the repository and install the required dependencies:

```shell
git clone https://github.com/dslaborg/frequency-pretraining.git
cd frequency-pretraining
# create and activate a conda environment
conda create -n fpt python=3.10
conda activate fpt
# install dependencies
pip install -r requirements.txt
```

To reproduce the results described in the paper, you need to (i) download the data, (ii) preprocess the data, (iii) pretrain/fine-tune the models according to exp001a-exp001d, and (iv) evaluate the models:
- Download the data:
- DODO/H datasets:
- Download the EEG signals using the download_data.py script.
- Download the annotations from the dreem-learning-evaluation repository.
- Sleep-EDFx dataset:
- Download the data from the PhysioNet website.
- ISRUC dataset:
- Download the data from their website.
- Preprocessing of the data:
- DODO/H datasets:
- Preprocess the data using the prepare_dodh.py and prepare_dodo.py scripts (see below for details on the parameters of the scripts).
- Copy all preprocessed data files to the `cache/dod_o_h` directory.
- Sleep-EDFx dataset:
- Preprocess the data using the prepare_sleepedfx.py script (see below for details on the parameters of the script).
- ISRUC dataset:
- Preprocess the data using the prepare_isruc.py script (see below for details on the parameters of the script).
- Follow the instructions given in the manual files of the experiments to pretrain and fine-tune the models.
- For Figure 2, you need the results of exp001-exp006. The manual for exp001 can be found at config/exp001/manual.md. The manuals of other experiments follow the same structure.
- For Figure 3, you need the results of exp004-exp006. The manual for exp004 can be found at config/exp004/manual.md. The manuals of other experiments follow the same structure.
- For Figure 4, you need the results of exp001 and exp007. The manual for exp007 can be found at config/exp007/manual.md.
- Evaluation on the test set is also described in the manual files.
The preprocessing directory contains scripts used to preprocess the data (in our case, the dreem (DODO/H),
sleepedfx, and isruc datasets).
The `download_data.py` script downloads the EEG signals of the Dreem dataset to `~/data/dreem`.
Sample call: python preprocessing/dreem/download_data.py
`prepare_dodh.py`: preprocessing script for the DODH dataset.
Arguments:
- `-s` or `--signals_dir`: path to the directory containing the EEG signals
- `-a` or `--annotations_dir`: path to the directory containing the annotations
- `-o` or `--output_dir`: path to the directory where the preprocessed data should be saved; default is `cache/dodh`
Sample call: python preprocessing/dreem/prepare_dodh.py -s ~/data/dreem -a ~/data/dreem -o cache/dodh
`prepare_dodo.py`: preprocessing script for the DODO dataset.
Arguments:
- `-s` or `--signals_dir`: path to the directory containing the EEG signals
- `-a` or `--annotations_dir`: path to the directory containing the annotations
- `-o` or `--output_dir`: path to the directory where the preprocessed data should be saved; default is `cache/dodo`
Sample call: python preprocessing/dreem/prepare_dodo.py -s ~/data/dreem -a ~/data/dreem -o cache/dodo
`prepare_sleepedfx.py`: preprocessing script for the Sleep-EDFx dataset.
Arguments:
- `-d` or `--data_dir`: path to the directory containing the EEG signals (`*PSG.edf`) and the corresponding annotations (`*Hypnogram.edf`)
- `-o` or `--output_dir`: path to the directory where the preprocessed data should be saved; default is `cache/sleep-edfx`
Sample call:
python preprocessing/sleepedfx/prepare_sleepedfx.py -d ~/data/sleepedfx -o cache/sleep-edfx
`prepare_isruc.py`: preprocessing script for the ISRUC dataset.
Arguments:
- `-s` or `--signals_dir`: path to the directory containing the folder structure from the ISRUC website with the EEG signals (`.rec`) and the corresponding annotations (`.txt`)
- `-o` or `--output_dir`: path to the directory where the preprocessed data should be saved; default is `cache/isruc`
Sample call: python preprocessing/isruc/prepare_isruc.py -s ~/data/isruc -o cache/isruc
All training and evaluation scripts can be found in the scripts directory.
The scripts require configuration files, which are expected to be located in the `config` directory (see section "Configuration" for details).
`pretrain.py`: performs pretraining as specified in the corresponding configuration file, writes its log to the console, and saves a log file and results to a result directory in the `logs` directory. Model checkpoints are written to the `models` directory.
Arguments:
- this is a hydra-based script, which means that any configuration can be overwritten using command line arguments (see section "Configuration" for details)
- `-m`: sets the script to the `multirun` mode (see section "Configuration" for details)
- `-cn=<experiment group>/<sub-experiment>`: name of the experiment to run, for which a `<sub-experiment>.yaml` file has to exist in the `config/<experiment group>` directory
Sample call (single run): python scripts/pretrain.py -cn=exp001/exp001b
`fine-tune.py`: performs fine-tuning as specified in the corresponding configuration file, writes its log to the console, and saves a log file and results to a result directory in the `logs` directory. Model checkpoints are written to the `models` directory.
Arguments:
- this is a hydra-based script, which means that any configuration can be overwritten using command line arguments (see section "Configuration" for details)
- `-m`: sets the script to the `multirun` mode (see section "Configuration" for details)
- `-cn=<experiment group>/<sub-experiment>`: name of the experiment to run, for which a `<sub-experiment>.yaml` file has to exist in the `config/<experiment group>` directory
Sample call (single run): python scripts/fine-tune.py -cn=exp001/exp001a
`pretrain_and_fine-tune.py`: first performs pretraining and then fine-tuning as specified in the corresponding configuration file, writes its log to the console, and saves a log file and results to a result directory in the `logs` directory. Model checkpoints are written to the `models` directory.
Arguments:
- this is a hydra-based script, which means that any configuration can be overwritten using command line arguments (see section "Configuration" for details)
- `-m`: sets the script to the `multirun` mode (see section "Configuration" for details)
- `-cn=<experiment group>/<sub-experiment>`: name of the experiment to run, for which a `<sub-experiment>.yaml` file has to exist in the `config/<experiment group>` directory
Sample call (single run): python scripts/pretrain_and_fine-tune.py -cn=exp001/exp001b
`eval_fine-tuned.py`: evaluates a model as specified in the corresponding configuration file, writes its log to the console, and saves a log file and results to a result directory in the `logs` directory.
Arguments:
- this is a hydra-based script, which means that any configuration can be overwritten using command line arguments (see section "Configuration" for details)
- `-m`: sets the script to the `multirun` mode (see section "Configuration" for details)
- `-cn=<experiment group>/<sub-experiment>`: name of the experiment to run, for which a `<sub-experiment>.yaml` file has to exist in the `config/<experiment group>` directory
Sample call (single run): python scripts/eval_fine-tuned.py -cn=exp001/exp001a +model.downstream.path='exp001b-base_fe_clas-2023-10-13_14-21-17-final.pth' +training.downstream.trainer.evaluators.test='${evaluators.downstream.test}' model.downstream.feature_extractor.path=null
Explanation of the sample call: The +model.downstream.path parameter specifies the path to the model checkpoint that should be evaluated.
The +training.downstream.trainer.evaluators.test parameter specifies the evaluator that should be used for evaluation.
In this case, we want to evaluate on the test set and use the test evaluator that was defined
in exp001/fpt_config.yaml under the key evaluators.downstream.test.
Since both the model path and the evaluator weren't part of the configuration before, we add them using the + prefix.
The last parameter model.downstream.feature_extractor.path=null is used to overwrite the feature extractor path, which
is not needed for evaluation because we always load the full model.
The scripts/visualization directory contains scripts used to create the figures used in our paper.
`visualize_matrix_nepochs_vs_nsubjects_testdata_cv.py`: creates the plot used in Figure 3 of the paper by reading the results of experiment groups exp004-exp006 from the `logs` directory.
Sample call: python scripts/visualization/visualize_matrix_nepochs_vs_nsubjects_testdata_cv.py
`visualize_metrics_pretraining.py`: creates the plot used in Figure 4 of the paper by reading the results of experiment groups exp001 and exp007 from the `logs` directory.
Sample call: python scripts/visualization/visualize_metrics_pretraining.py
`visualize_nsubjects_vs_mf1_testdata_cv.py`: creates the plot used in Figure 2 of the paper by reading the results of experiment groups exp001-exp006 from the `logs` directory.
Sample call: python scripts/visualization/visualize_nsubjects_vs_mf1_testdata_cv.py
The configuration of an experiment is implemented using the hydra framework, which is based on YAML files. If you are not familiar with hydra, the official documentation provides a good introduction and tutorial. This repository makes use of hydra's object instantiation feature, which makes it possible to instantiate objects at runtime based on the configuration files (see here for more details).
All configuration files must be placed within the config directory.
The configuration files are organized in a hierarchical structure, where the base configuration is defined in
config/base_config.yaml, the experiment-specific configurations are defined in the experiment folders (e.g.
config/exp001/fpt_config.yaml for exp001), and the
sub-experiment-specific configurations are defined in the folders of the sub-experiments (e.g.
config/exp001/exp001a.yaml for exp001a).
Configuration files that are lower in the hierarchy can overwrite parameters defined in higher-level configuration
files.
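As an illustration of this override mechanism (the parameter names below are made up, not taken from this repository's configs), a value defined in `base_config.yaml` can be overwritten by an experiment-level file while all other values are inherited:

```yaml
# config/base_config.yaml (illustrative)
training:
  lr: 0.001
  epochs: 10

# config/exp001/fpt_config.yaml (illustrative): overwrites one parameter
training:
  epochs: 50

# resulting configuration for exp001: lr is inherited, epochs is overwritten
# training: {lr: 0.001, epochs: 50}
```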
All configurations can be overwritten using command line arguments when running a script.
Any parameters or groups of parameters that should be None have to be configured as either `null` or `Null`, following the YAML specification.
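For instance, assuming PyYAML is available, both spellings parse to Python's `None` (the keys below are illustrative):

```python
import yaml  # PyYAML; assumed to be installed

# both spellings of YAML's null literal become Python's None
cfg = yaml.safe_load("""
feature_extractor:
  path: null
resume: Null
""")
print(cfg)  # {'feature_extractor': {'path': None}, 'resume': None}
```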
The available parameters are described in the existing configuration files and the doc-strings of the classes.
To get an overview of the final configuration of a run, it might be helpful to look into the .hydra folder in
the logs directory after running a script.