Data Splits:
--train [str,str,...]: comma-separated data splits to use for training.
--valid [str,str,...]: comma-separated data splits to use for validation.
--test [str,str,...]: comma-separated data splits to use for testing.
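As a minimal sketch of how comma-separated split arguments like these could be parsed (this is an illustration with Python's standard argparse, not the project's actual parser; the helper `split_list` and the split names are hypothetical):

```python
import argparse
from typing import List


def split_list(value: str) -> List[str]:
    """Turn 'a,b,c' into ['a', 'b', 'c'], dropping empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


parser = argparse.ArgumentParser()
for flag in ("--train", "--valid", "--test"):
    # An omitted flag yields an empty list of splits.
    parser.add_argument(flag, type=split_list, default=[])

# The split names below are placeholders, not names defined by this project.
args = parser.parse_args(["--train", "splitA,splitB", "--valid", "splitC"])
print(args.train, args.valid, args.test)
```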
Model Architecture:
--llayers [int]: number of layers in language encoder.
--xlayers [int]: number of layers in cross-modality encoder.
--rlayers [int]: number of layers in object relationship encoder.
Load Weights:
--load [str='path/to/saved_model']: load the fine-tuned model from path/to/saved_model.pth.
--loadLXMERT [str='path/to/saved_model']: load the pre-trained model without answer heads from path/to/saved_model_LXRT.pth.
--loadLXMERTQA [str='path/to/saved_model']: load the pre-trained model with the answer head from path/to/saved_model_LXRT.pth.
--fromScratch: re-initialize the language encoder instead of loading the pre-trained BERT weights.
By default, when none of the loading options above is set, the pre-trained BERT weights are loaded.
As we promised to the EMNLP reviewers, this one-line argument makes it possible to test the performance without BERT weights.
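The loading options take a path prefix rather than a full file name. The sketch below only restates the suffix rules and the default initialization described above; the function names are hypothetical and nothing here reads a real checkpoint.

```python
# Suffix appended to the path prefix by each loading option,
# taken from the option descriptions above.
CHECKPOINT_SUFFIX = {
    "load": ".pth",
    "loadLXMERT": "_LXRT.pth",
    "loadLXMERTQA": "_LXRT.pth",
}


def checkpoint_file(option: str, prefix: str) -> str:
    """Return the file name that a loading option reads for a given prefix."""
    return prefix + CHECKPOINT_SUFFIX[option]


def default_language_encoder_init(from_scratch: bool) -> str:
    """Describe the language-encoder start point when no loading option is set."""
    return "re-initialized" if from_scratch else "pre-trained BERT weights"


print(checkpoint_file("load", "path/to/saved_model"))
print(checkpoint_file("loadLXMERT", "path/to/saved_model"))
print(default_language_encoder_init(from_scratch=True))
```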
Training Hyper Parameters:
--batchSize [int]: batch size.
--optim [str]: optimizer to use.
--lr [float]: peak learning rate.
--epochs [int]: training epochs.
--tiny: load 512 images for each data split. (Note: the number of images may vary depending on the dataset.)
--fast: load 5000 images for each data split. (Note: the number of images may vary depending on the dataset.)
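The effect of --tiny and --fast can be pictured as a cap on the number of images per split. The following is a rough sketch under two assumptions that the text above does not state: that the cap keeps the first images in order, and that --tiny takes precedence if both flags are given. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def image_cap(tiny: bool = False, fast: bool = False) -> Optional[int]:
    """Return the per-split image limit, or None for the full split."""
    if tiny:  # assumption: --tiny wins when both flags are set
        return 512
    if fast:
        return 5000
    return None


def limit_images(images: Sequence, tiny: bool = False, fast: bool = False) -> List:
    """Keep at most `image_cap` images (assumption: the first ones in order)."""
    cap = image_cap(tiny, fast)
    return list(images) if cap is None else list(images[:cap])


split = list(range(6000))  # stand-in for 6000 image records
print(len(limit_images(split, tiny=True)))
print(len(limit_images(split, fast=True)))
print(len(limit_images(split)))
```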
Pre-training Tasks:
--taskMaskLM: use the masked language model task.
--taskObjPredict: use the masked object prediction task.
--taskMatched: use the cross-modality matched task.
--taskQA: use the image QA task.
Visual Pre-training Losses (Tasks):
--visualLosses [str,str,...]: comma-separated sub-tasks used to pre-train the visual modality. Each must be one of 'obj', 'attr', 'feat'.
obj: detected-object-label classification.
attr: detected-object-attribute classification.
feat: RoI-feature regression.
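A value such as 'obj,attr,feat' could be checked against the three allowed sub-task names before pre-training starts. This is only an illustrative sketch; `parse_visual_losses` is a hypothetical helper, not a function from the project.

```python
from typing import List

# The three sub-task names listed above.
VISUAL_LOSSES = ("obj", "attr", "feat")


def parse_visual_losses(value: str) -> List[str]:
    """Split a comma-separated value and reject unknown sub-task names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in VISUAL_LOSSES]
    if unknown:
        raise ValueError(
            f"unknown visual loss(es) {unknown}; expected any of {list(VISUAL_LOSSES)}"
        )
    return names


print(parse_visual_losses("obj,attr,feat"))
print(parse_visual_losses("feat"))
```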
Mask Rate in Pre-training:
--wordMaskRate [float]: the probability of masking a word.
--objMaskRate [float]: the probability of masking an object.
--fromScratch: by default, the pre-trained BERT weights are loaded into the model.
As we promised to the EMNLP reviewers, this option re-initializes the language encoder instead.
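For the mask-rate options (--wordMaskRate, --objMaskRate), a rate can be read as an independent per-item probability. The sketch below is deliberately simplified and assumes every selected word is replaced by a "[MASK]" token; the project's actual masking procedure may differ (for example, BERT-style masking sometimes keeps or randomly replaces a selected word). The name `mask_words` is hypothetical.

```python
import random
from typing import List, Sequence

MASK = "[MASK]"  # placeholder mask token used for this illustration


def mask_words(words: Sequence[str], rate: float, rng: random.Random) -> List[str]:
    """Replace each word by MASK independently with probability `rate`."""
    # rng.random() is uniform in [0, 1), so rate=0.0 masks nothing
    # and rate=1.0 masks every word.
    return [MASK if rng.random() < rate else word for word in words]


rng = random.Random(0)  # fixed seed so repeated runs give the same masks
sentence = "what color is the cat on the sofa".split()
print(mask_words(sentence, rate=0.15, rng=rng))
```

The same per-item draw could be applied to detected objects with --objMaskRate, again as an assumption about how the rate is used.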