Meta Flow Matching (MFM) is a practical approach to integrating along vector fields on the Wasserstein manifold by amortizing the flow model over the initial distributions. Current flow-based models are limited to a single initial distribution/population and a set of predefined conditions which describe different dynamics.
In the natural sciences, many processes can be represented as vector fields on the Wasserstein manifold of probability densities, i.e., the change of the population at any moment in time depends on the population itself due to interactions between samples/particles. One application domain is personalized medicine, where the development of diseases and the response to treatments depend on the microenvironment of cells specific to each patient.
In MFM, we jointly train a vector field model
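As a rough illustration of what amortizing a flow model over initial populations can look like, the sketch below evaluates a flow matching loss in which the vector field receives an embedding of the whole initial population as an extra input. Everything here is an assumption made for brevity: `embed_population` uses simple mean/std pooling as a stand-in for the embedding model in this repo, `vector_field` is a toy linear map rather than a neural network, and source and target samples are paired by index.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_population(x0):
    """Hypothetical population embedding: per-feature mean and std (a stand-in only)."""
    return np.concatenate([x0.mean(axis=0), x0.std(axis=0)])

def vector_field(params, x, t, emb):
    """Toy linear vector field v(x, t | emb); a real model would be a neural network."""
    n = x.shape[0]
    feats = np.concatenate([x, t, np.tile(emb, (n, 1)), np.ones((n, 1))], axis=1)
    return feats @ params

def flow_matching_loss(params, x0, x1):
    """Flow matching loss along straight-line paths from x0 to x1 (paired by index)."""
    n = x0.shape[0]
    t = rng.uniform(size=(n, 1))
    xt = (1.0 - t) * x0 + t * x1    # point on the interpolation path at time t
    target = x1 - x0                # velocity of the straight-line path
    emb = embed_population(x0)      # condition on the whole initial population
    pred = vector_field(params, xt, t, emb)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Two toy populations whose dynamics depend on where the population starts.
d = 2
populations = []
for shift in (-2.0, 2.0):
    x0 = rng.normal(loc=shift, size=(128, d))
    x1 = x0 + shift
    populations.append((x0, x1))

n_feats = d + 1 + 2 * d + 1         # x, t, embedding (mean and std), bias
params = np.zeros((n_feats, d))     # one set of parameters shared by all populations
losses = [flow_matching_loss(params, x0, x1) for x0, x1 in populations]
```

The point of the sketch is that a single `params` is evaluated on every population, with the embedding telling the model which population it is currently transporting. Training would minimize the sum of these losses over populations.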
This repo contains everything needed to reproduce our results. The paper is available on arXiv (arXiv:2408.14608).
The preprocessed data can be downloaded here: Preprocessed organoid data
The raw data can be downloaded here: Raw organoid data. For usability, we provide the notebooks trellis_data_replica_splits.ipynb and trellis_data_3_pdo_splits.ipynb which contain further dataset details, code for the data preprocessing, and code for generating the replica and patient splits.
If you find this code useful in your research, please cite our work.
@article{atanackovic2024meta,
title={Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold},
author={Lazar Atanackovic and Xi Zhang and Brandon Amos and Mathieu Blanchette and Leo J. Lee and Yoshua Bengio and Alexander Tong and Kirill Neklyudov},
year={2024},
eprint={2408.14608},
archivePrefix={arXiv},
}
Install dependencies
# clone project
git clone https://github.com/lazaratan/meta-flow-matching.git
cd meta-flow-matching
# [OPTIONAL] create conda environment
conda create -n mfm python=3.9
conda activate mfm
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
Train a model with a chosen experiment configuration from src.conf/experiment/
python train.py experiment=experiment_name.yaml
You can override any parameter from the command line like this
python train.py experiment=experiment_name.yaml trainer.max_epochs=1234 seed=42
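To make the override syntax concrete, here is a minimal sketch of how dotted `key=value` arguments can be read as updates to a nested configuration. It assumes Hydra-style semantics, which the command syntax resembles. The `apply_overrides` helper and the default values are hypothetical and are not how `train.py` actually parses its arguments.

```python
import ast

def apply_overrides(config, overrides):
    """Apply `dotted.key=value` strings to a nested dict (illustrative only)."""
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)   # numbers and other Python literals
        except (ValueError, SyntaxError):
            value = raw                     # anything else stays a plain string
        node = config
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})  # walk or create nested sections
        node[leaf] = value
    return config

# Hypothetical defaults; the real values live in the repo's config files.
config = {"trainer": {"max_epochs": 100}, "seed": 0}
apply_overrides(
    config,
    ["experiment=experiment_name.yaml", "trainer.max_epochs=1234", "seed=42"],
)
```

After the call, `config["trainer"]["max_epochs"]` holds the overridden value and `seed` is replaced at the top level, mirroring how `trainer.max_epochs=1234` targets a nested field.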
To train a model via MFM on the synthetic letters setting, use
python train.py experiment=letters_mfm.yaml
To run the biological experiments, first download the preprocessed organoid data (see the download link above). Then, as with the synthetic letters experiment, executing
python train.py experiment=trellis_mfm.yaml
will train 1 seed of an MFM model on the organoid drug-screen dataset.
To replicate an experiment, for example, the last row of Table 1 (in the paper), you can use the multi-run feature:
python train.py -m experiment=letters_mfm.yaml seed=1,2,3
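The multi-run command can be thought of as shorthand for several single runs. The sketch below expands comma-separated override values into one command per combination, assuming the `-m` flag behaves like a basic Hydra sweep (one job per value, Cartesian product across swept keys). The `expand_sweep` helper is hypothetical and only illustrates the expansion; it does not launch anything.

```python
from itertools import product

def expand_sweep(overrides):
    """Expand comma-separated override values into one override list per run."""
    keys, choices = [], []
    for item in overrides:
        key, _, raw = item.partition("=")
        keys.append(key)
        choices.append(raw.split(","))      # a single value yields a one-item list
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)      # Cartesian product over swept keys
    ]

runs = expand_sweep(["experiment=letters_mfm.yaml", "seed=1,2,3"])
commands = ["python train.py " + " ".join(run) for run in runs]
for cmd in commands:
    print(cmd)
```

Under this assumption, the three seeds correspond to three independent training runs of the same experiment configuration, whose results can then be aggregated.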
Have a question? Found a bug? Missing a specific feature? Feel free to open a new issue, discussion, or PR with a descriptive title and description.
Before making an issue, please verify that:
- The problem still exists on the current main branch.
- Your python dependencies are updated to recent versions.
Suggestions for improvements are always welcome!