Discovering State Variables Hidden in Experimental Data
Boyuan Chen, Kuang Huang, Sunand Raghupathi, Ishaan Chandratreya, Qiang Du, Hod Lipson
Columbia University
This repo contains the PyTorch implementation for the paper "Discovering State Variables Hidden in Experimental Data".
If you find our paper or codebase helpful, please consider citing:
@article{chen2021discover,
title={Discovering State Variables Hidden in Experimental Data},
author={Chen, Boyuan and Huang, Kuang and Raghupathi, Sunand and Chandratreya, Ishaan and Du, Qiang and Lipson, Hod},
journal={arXiv preprint arXiv:2112.10755},
year={2021}
}
- Installation
- Logging
- Data Preparation
- Training and Testing
- Intrinsic Dimension Estimation
- Long-term Prediction and Stability Evaluation
- Evaluation and Analysis
- License
Installation
The installation has been tested on Ubuntu 18.04 with CUDA 11.0. All experiments were performed on a single NVIDIA GeForce RTX 2080 Ti GPU.
Create a Python virtual environment and install the dependencies.
virtualenv -p /usr/bin/python3.6 env3.6
source env3.6/bin/activate
pip install -r requirements.txt
Note: You may need to install MATLAB to use some of the data collectors and intrinsic dimension estimation algorithms.
Logging
We first introduce the naming convention for saved files, so that it is clear what is saved and where.
- Log folder naming convention:
logs_{dataset}_{model_name}_{seed}
- Inside the logs folder, the structure and contents are:
\logs_{dataset}_{model_name}_{seed}
    \lightning_logs
        \checkpoints          [saved checkpoint]
        \version_0            [training stats]
        \version_1            [testing stats]
    \predictions              [testing predicted images]
    \prediction_long_term     [long-term predicted images]
    \variables                [file id and latent vectors on testing data]
    \variables_train          [file id and latent vectors on training data]
    \variables_val            [file id and latent vectors on validation data]
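To make the naming convention concrete, here is a minimal sketch that builds the paths of one log folder. It is not part of the repository: `log_dir`, `log_paths` and the descriptive keys are hypothetical names, and it assumes that `checkpoints`, `version_0` and `version_1` live inside `lightning_logs` while the other folders sit directly in the log folder, as in the layout above.

```python
from pathlib import Path

# Subfolder layout inside one log folder, following the structure listed above.
# Assumption: checkpoints/, version_0/ and version_1/ sit inside lightning_logs/,
# while the remaining folders sit directly inside the log folder.
LOG_SUBDIRS = {
    "checkpoints": Path("lightning_logs", "checkpoints"),    # saved checkpoint
    "train_stats": Path("lightning_logs", "version_0"),      # training stats
    "test_stats": Path("lightning_logs", "version_1"),       # testing stats
    "predictions": Path("predictions"),                      # testing predicted images
    "prediction_long_term": Path("prediction_long_term"),    # long-term predicted images
    "variables": Path("variables"),                          # latents on testing data
    "variables_train": Path("variables_train"),              # latents on training data
    "variables_val": Path("variables_val"),                  # latents on validation data
}


def log_dir(root, dataset, model_name, seed):
    """Build the folder name logs_{dataset}_{model_name}_{seed} under root."""
    return Path(root) / f"logs_{dataset}_{model_name}_{seed}"


def log_paths(root, dataset, model_name, seed):
    """Map each descriptive key (hypothetical names) to its full path."""
    base = log_dir(root, dataset, model_name, seed)
    return {key: base / sub for key, sub in LOG_SUBDIRS.items()}
```

For example, `log_paths("scripts", "single_pendulum", "encoder-decoder", 1)["variables"]` points at the `variables` folder of `logs_single_pendulum_encoder-decoder_1`; whether the on-disk `model_name` matches the model labels used in this README exactly is an assumption.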
Data Preparation
We provide nine datasets, each with its own download link, listed below.
- circular_motion (circular motion system)
- reaction_diffusion (reaction diffusion system)
- single_pendulum (single pendulum system)
- double_pendulum (rigid double pendulum system)
- elastic_pendulum (elastic double pendulum system)
- swingstick_non_magnetic (swing stick system)
- air_dancer (air dancer system)
- lava_lamp (lava lamp system)
- fire (fire system)
Save each downloaded dataset as data/{dataset_name}, where data is your customized dataset folder. Make sure that data is an absolute path, and change the data_filepath item in the config.yaml files under configs so that it points to your customized dataset folder.
Please refer to the datainfo folder for more details about the data structure and the dataset collection process.
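Editing every config.yaml by hand is error-prone, so here is a minimal sketch that rewrites the data_filepath item in all config.yaml files under configs. It assumes (not confirmed by this README) that the item sits on a single line of the form `data_filepath: <path>`; `set_data_filepath` is a hypothetical helper, and it uses plain text substitution rather than a YAML parser so that it needs only the standard library.

```python
import os
import re
from pathlib import Path

# Assumption: the dataset folder is stored on one line "data_filepath: <path>".
_DATA_LINE = re.compile(r"^([ \t]*data_filepath[ \t]*:[ \t]*)[^\r\n]*", re.MULTILINE)


def set_data_filepath(configs_dir, data_dir):
    """Point data_filepath in every config.yaml under configs_dir to data_dir.

    Returns the list of files that were modified.
    """
    # The README requires the dataset folder to be an absolute path.
    if not os.path.isabs(data_dir):
        raise ValueError(f"data folder must be an absolute path, got {data_dir!r}")
    changed = []
    for cfg in sorted(Path(configs_dir).rglob("config.yaml")):
        text = cfg.read_text()
        # A function replacement keeps backslashes in data_dir literal.
        new_text, count = _DATA_LINE.subn(lambda m: m.group(1) + data_dir, text)
        if count and new_text != text:
            cfg.write_text(new_text)
            changed.append(cfg)
    return changed
```

Running it once after downloading, e.g. `set_data_filepath("configs", "/abs/path/to/data")`, returns the files it touched, which makes it easy to check that no config was missed.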
Training and Testing
Our approach involves three models:
- dynamics predictive model (encoder-decoder / encoder-decoder-64)
- latent reconstruction model (refine-64)
- neural latent dynamics model (latentpred)
- Navigate to the scripts folder:
  cd scripts
- Train the dynamics predictive models (encoder-decoder and encoder-decoder-64), then save the high-dimensional latent vectors from the testing data:
  ./encoder_decoder_64_train.sh {dataset_name} {gpu no.}
  ./encoder_decoder_train.sh {dataset_name} {gpu no.}
  ./encoder_decoder_64_eval.sh {dataset_name} {gpu no.}
  ./encoder_decoder_eval.sh {dataset_name} {gpu no.}
- Run a forward pass with the trained encoder-decoder and encoder-decoder-64 models to save the high-dimensional latent vectors from the training and validation data. These saved latent vectors serve as the training and validation data for the latent reconstruction model:
  ./encoder_decoder_eval_gather.sh {dataset_name} {gpu no.}
  ./encoder_decoder_64_eval_gather.sh {dataset_name} {gpu no.}
- Before you proceed with this step, refer to the next section (Intrinsic Dimension Estimation) to obtain the system's intrinsic dimension, then come back to this step.
  Train the latent reconstruction model (refine-64) with the saved 64-dim latent vectors from the previous steps, then save the obtained Neural State Variables from both the training and testing data:
  ./refine_64_train.sh {dataset_name} {gpu no.}
  ./refine_64_eval.sh {dataset_name} {gpu no.}
  ./refine_64_eval_gather.sh {dataset_name} {gpu no.}
- Train the neural latent dynamics model (latentpred) using the trained models from the previous steps:
  ./latentpred_train.sh single_pendulum {dataset_name} {gpu no.}
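The training steps above have a strict order, and refine-64 can only be trained after the intrinsic dimension is known. The sketch below encodes that order as data so it can be printed (dry run) or executed stage by stage from the scripts folder. `training_stages` and `run_stage` are hypothetical helpers, not repository code; they assume every script takes exactly `{dataset_name} {gpu no.}` as listed above, and they deliberately leave out the latentpred step and the intrinsic dimension estimation, which you run as documented.

```python
import shlex
import subprocess

# The nine dataset names listed in the Data Preparation section.
DATASETS = (
    "circular_motion", "reaction_diffusion", "single_pendulum",
    "double_pendulum", "elastic_pendulum", "swingstick_non_magnetic",
    "air_dancer", "lava_lamp", "fire",
)


def training_stages(dataset, gpu):
    """Return (stage name, commands) pairs in the order documented above."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")

    def cmds(*scripts):
        # Each script listed above takes {dataset_name} {gpu no.} as arguments.
        return [[f"./{name}.sh", dataset, str(gpu)] for name in scripts]

    return [
        ("dynamics predictive models", cmds(
            "encoder_decoder_64_train", "encoder_decoder_train",
            "encoder_decoder_64_eval", "encoder_decoder_eval")),
        ("gather training/validation latents", cmds(
            "encoder_decoder_eval_gather", "encoder_decoder_64_eval_gather")),
        # Run only after the intrinsic dimension has been estimated.
        ("latent reconstruction model", cmds(
            "refine_64_train", "refine_64_eval", "refine_64_eval_gather")),
    ]


def run_stage(commands, cwd="scripts", dry_run=True):
    """Print each command; execute it from cwd only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
```

Keeping the stages separate, rather than chaining all scripts, leaves room to stop after the second stage, estimate the intrinsic dimension, and only then launch the refine-64 stage with `run_stage(commands, dry_run=False)`.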
Intrinsic Dimension Estimation
With the trained dynamics predictive model, our approach provides subroutines to estimate the system's intrinsic dimension (ID) using manifold learning algorithms. The estimated ID determines the number of Neural State Variables and is used to design the latent reconstruction model. Only after completing this step can you train the latent reconstruction model to obtain Neural State Variables.
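The repository runs its own estimators through the scripts below (some of them require MATLAB). To illustrate what a manifold-learning ID estimate does with a set of latent vectors, here is a minimal sketch of a different, well-known estimator swapped in purely for illustration: the Levina-Bickel maximum-likelihood estimator, with per-point inverse estimates averaged as suggested by MacKay and Ghahramani. It is not the repository's implementation, and `mle_intrinsic_dimension` is a hypothetical name.

```python
import numpy as np


def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel MLE of intrinsic dimension, pooled MacKay-Ghahramani style.

    points: (n, D) array of samples, e.g. latent vectors.
    k: number of nearest neighbours used per point.
    """
    x = np.asarray(points, dtype=float)
    n = x.shape[0]
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of points")

    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b.
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)    # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)   # a point is not its own neighbour

    # Distances T_1 <= ... <= T_k to the k nearest neighbours of each point.
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])

    # Per point: (1/(k-1)) * sum_{j<k} log(T_k / T_j), an estimate of 1/d.
    inv_d = np.mean(np.log(knn[:, -1:] / knn[:, :-1]), axis=1)

    # Average the inverse estimates over points, then invert once.
    return float(1.0 / np.mean(inv_d))
```

For points lying on a 2-D plane embedded in a 10-D space, this returns a value close to 2. Averaging inverses instead of averaging per-point dimensions avoids the upward bias of the plain per-point estimate; duplicate points (zero distances) are not handled and should be removed first.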
- Navigate to the scripts folder, which is the default directory where all models' log folders are saved:
  cd scripts
- Estimate the intrinsic dimension from the saved latent vectors of the encoder-decoder models for all random seeds:
  ./encoder_decoder_estimate_dimension.sh {dataset_name}
- Compute the final intrinsic dimension estimate (mean and standard deviation):
  python ../utils/dimension.py {dataset_name}
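The final step reduces several per-seed estimates to a mean and a standard deviation. A minimal sketch of that reduction follows; `summarize_id` is a hypothetical helper, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption that may differ from what `utils/dimension.py` actually does.

```python
import statistics


def summarize_id(estimates):
    """Return (mean, standard deviation) of per-seed intrinsic dimension estimates.

    Uses the population standard deviation (statistics.pstdev); this convention
    is an assumption and may differ from utils/dimension.py.
    """
    values = [float(v) for v in estimates]
    if not values:
        raise ValueError("need at least one estimate")
    return statistics.mean(values), statistics.pstdev(values)
```

Switching to `statistics.stdev` would give the sample standard deviation instead, which is larger for a small number of seeds.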
Long-term Prediction and Stability Evaluation
With all of the trained models above, our approach supports long-term prediction of the system through model rollouts, as well as stability evaluation of those long-term predictions.
- Navigate to the scripts folder:
  cd scripts
- Long-term prediction with single model rollouts:
  ./encoder_decoder_long_term_model_rollout.sh {dataset_name} {gpu no.}
  ./encoder_decoder_64_long_term_model_rollout.sh {dataset_name} {gpu no.}
  ./refine_64_long_term_model_rollout.sh {dataset_name} {gpu no.}
  The predictions are saved in the prediction_long_term subfolder of each model's log folder.
- Long-term prediction with hybrid model rollouts:
  ./refine_64_long_term_hybrid_rollout.sh {dataset_name} {gpu no.} {step}
  where step is the number of model rollouts via 64-dim latent vectors before a model rollout via Neural State Variables. The predictions are saved in the prediction_long_term subfolder of the refine-64 model's log folder.
- Long-term prediction with single model rollouts from perturbed initial frames:
  ./encoder_decoder_long_term_model_rollout_perturb_all.sh {dataset_name} {gpu no.}
  ./encoder_decoder_64_long_term_model_rollout_perturb_all.sh {dataset_name} {gpu no.}
  ./refine_64_long_term_model_rollout_perturb_all.sh {dataset_name} {gpu no.}
  The predictions are saved in the prediction_long_term subfolder of each model's log folder.
- Stability evaluation on long-term predictions with single model rollouts:
  ./long_term_eval_stability.sh {dataset_name} {gpu no.}
  and on long-term predictions with hybrid model rollouts:
  ./long_term_eval_stability_hybrid.sh {dataset_name} {gpu no.} {step}
  where step has the same meaning as for hybrid model rollouts above. The latent space errors that measure prediction stability are saved in the stability.npy file under the respective prediction_long_term folders.
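A long-term prediction is an autoregressive rollout: each predicted state is fed back as the next input. The sketch below illustrates this, together with one reading of the hybrid schedule, where `step` rollouts via the 64-dim latent vectors are followed by one rollout via Neural State Variables and the pattern repeats (the repetition is an assumption). `hybrid_schedule`, `hybrid_rollout` and the two model callables are hypothetical stand-ins, not the repository's API.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")


def hybrid_schedule(n_steps: int, step: int) -> List[str]:
    """Label each rollout: `step` via 64-dim latents, then one via Neural State
    Variables, repeated (the repetition is an assumption)."""
    if n_steps < 0 or step < 0:
        raise ValueError("n_steps and step must be non-negative")
    return ["nsv" if (i + 1) % (step + 1) == 0 else "latent64" for i in range(n_steps)]


def hybrid_rollout(latent64_fn: Callable[[State], State],
                   nsv_fn: Callable[[State], State],
                   x0: State, n_steps: int, step: int) -> List[State]:
    """Autoregressive rollout: each prediction is fed back as the next input."""
    preds = []
    x = x0
    for which in hybrid_schedule(n_steps, step):
        # Pick the model for this rollout according to the schedule.
        x = nsv_fn(x) if which == "nsv" else latent64_fn(x)
        preds.append(x)
    return preds
```

Passing the same function as both `latent64_fn` and `nsv_fn` reduces this to a plain single model rollout, which is why one helper covers both cases.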
Note: The default long-term prediction length is 60 frames. To use a different prediction length, you will need to modify the scripts.
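After the stability scripts finish, the stability.npy files are spread across several log folders. Here is a minimal sketch that gathers them for comparison. `collect_stability` and `mean_error` are hypothetical helpers; because the array layout of stability.npy is not documented here, the loader assumes no particular shape, and the mean is only an illustrative one-number summary. The `**` in the glob pattern covers both a file directly in prediction_long_term and one nested in a subfolder, since the exact location for hybrid rollouts is not specified above.

```python
from pathlib import Path

import numpy as np


def collect_stability(log_root):
    """Load every stability.npy found under logs_*/prediction_long_term/.

    Returns {path relative to log_root: loaded array}. No particular array
    shape is assumed.
    """
    root = Path(log_root)
    pattern = "logs_*/prediction_long_term/**/stability.npy"
    return {f.relative_to(root): np.load(str(f)) for f in sorted(root.glob(pattern))}


def mean_error(stability):
    """Hypothetical one-number summary: mean over all stored error values."""
    return {name: float(np.mean(arr)) for name, arr in stability.items()}
```

Calling `mean_error(collect_stability("scripts"))` from the repository root gives one value per evaluated rollout, which is convenient for a quick side-by-side look before a more careful analysis.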
Evaluation and Analysis
Please refer to the analysis folder for detailed instructions on physical evaluation and analysis.
License
This repository is released under the MIT license. See LICENSE for additional details.