PyTorch implementation of the Generative Multisensory Network (GMN) from our paper:
Jae Hyun Lim, Pedro O. Pinheiro, Negar Rostamzadeh, Christopher Pal, Sungjin Ahn, Neural Multisensory Scene Inference (2019)
Please check out our project website!
- python>=3.6
- pytorch==0.4.x
- tensorflow (for tensorboardX)
- tensorboardX
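A minimal environment setup might look like the following. This is a sketch based on the requirements above, not a command from the original repository; the environment name `gmn` is arbitrary and the exact PyTorch 0.4.x build should match your CUDA/Python setup.

```bash
# Assumed setup based on the requirements list above
conda create -n gmn python=3.6
conda activate gmn
pip install torch==0.4.1 tensorflow tensorboardX
```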
Download the data from MESE.
- `data`: data folder
- `datasets`: dataloader definitions
- `models`: model definitions
- `utils`: miscellaneous functions
- `cache`: temporary files
- `eval`: a set of python codes for evaluation / visualization
- `scripts`: scripts for experiments
  - `eval`: eval/visualization scripts are here
  - `train`: training scripts are here
  - `train_missing_modalities`: training with missing modalities are here (`m5`, `m8`, `m14`)
- `main_multimodal.py`: main function to train models
- For example, you can train an APoE model for vision and haptic data (# of modalities = 2) as follows:

```bash
python main_multimodal.py \
    --dataset haptix-shepard_metzler_5_parts \
    --model conv-apoe-multimodal-cgqn-v4 \
    --train-batch-size 12 --eval-batch-size 4 \
    --lr 0.0001 \
    --clip 0.25 \
    --add-opposite \
    --epochs 10 \
    --log-interval 100 \
    --exp-num 1 \
    --cache experiments/haptix-m2
```

For more information, please find example scripts in the `scripts/train` folder.
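Since tensorboardX is listed as a dependency, training progress can be monitored with TensorBoard. The log directory below is an assumption based on the `--cache` argument used above; adjust it to wherever your run writes its event files.

```bash
# Assumed log location: the experiment cache directory from the command above
tensorboard --logdir experiments/haptix-m2
```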
- An example command to run classification with a learned model on held-out data, using the additional Shepard-Metzler objects with 4 or 6 parts (10-way classification):

```bash
python eval/clsinf_multimodal_m2.py \
    --dataset haptix-shepard_metzler_46_parts \
    --model conv-apoe-multimodal-cgqn-v4 \
    --train-batch-size 10 --eval-batch-size 10 \
    --vis-interval 1 \
    --num-z-samples 50 \
    --mod-step 1 \
    --mask-step 1 \
    --cache clsinf.m2.s50/rgb/46_parts \
    --path <path-to-your-model>
```

For more information, please find example scripts in the `scripts/eval` folder.
- If you would like to train an APoE model with missing modalities (here, the 14-modality setting), run the following:

```bash
python main_multimodal.py \
    --dataset haptix-shepard_metzler_5_parts-48-ul-lr-rgb-half-intrapol1114 \
    --model conv-apoe-multimodal-cgqn-v4 \
    --train-batch-size 9 --eval-batch-size 4 \
    --lr 0.0001 \
    --clip 0.25 \
    --add-opposite \
    --epochs 10 \
    --log-interval 100 \
    --cache experiments/haptix-m14-intrapol1114
```

For more information, please find example scripts in the `scripts/train_missing_modalities` folder.
For questions and comments, feel free to contact Jae Hyun Lim.
MIT License
@article{jaehyun2019gmn,
title = {Neural Multisensory Scene Inference},
author = {Jae Hyun Lim and
Pedro O. Pinheiro and
Negar Rostamzadeh and
Christopher J. Pal and
Sungjin Ahn},
journal = {arXiv preprint arXiv:1910.02344},
year = {2019},
}