PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.
- To download and extract VQA v2, GloVe, and the pretrained visual features:
bash scripts/download_extract.sh
- To prepare data for training:
python scripts/preproc.py
- The structure of the data/ directory should look like this:
  - data/
    - zips/
      - v2_XXX...zip
      - ...
      - glove...zip
      - trainval_36.zip
    - glove/
      - glove...txt
      - ...
    - v2_XXX.json
    - ...
    - trainval_resnet...tsv

    (The above are files created after executing scripts/download_extract.sh)
    - tokenizers/
      - ...
    - dict_ans.pkl
    - dict_q.pkl
    - glove_pretrained_300.npy
    - train_qa.pkl
    - val_qa.pkl
    - train_vfeats.pkl
    - val_vfeats.pkl

    (The above are files created after executing scripts/preproc.py)
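As a quick sanity check before training, the listing above can be turned into a small script that reports which preprocessing outputs are missing. This is only a sketch: the helper name `missing_preproc_outputs` is hypothetical and not part of this repo, and it assumes the seven files sit directly under data/ as shown in the listing.

```python
from pathlib import Path

# File names copied from the directory listing above.
# The helper below is hypothetical and is not part of this repo.
PREPROC_OUTPUTS = [
    "dict_ans.pkl",
    "dict_q.pkl",
    "glove_pretrained_300.npy",
    "train_qa.pkl",
    "val_qa.pkl",
    "train_vfeats.pkl",
    "val_vfeats.pkl",
]


def missing_preproc_outputs(data_dir="data"):
    """Return the expected preprocessing outputs that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in PREPROC_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_preproc_outputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All preprocessing outputs found.")
```

If any file is reported missing, re-running scripts/preproc.py is the first thing to try.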
- To train with the default parameters:
bash scripts/train.sh
- Huge refactor (especially data preprocessing), tested with PyTorch 0.4.1 and Python 3.6
- Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation)
- After all the preprocessing, the data/ directory may be up to 38G+
- Some of preproc.py and utils.py are based on this repo
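Since the data/ directory may grow to 38G+ as noted above, it can be worth checking free disk space before downloading and preprocessing. A minimal sketch using only the standard library; the helper `has_enough_space` is hypothetical, and the 38 GiB threshold is taken from the note above as a lower bound, not a measured requirement.

```python
import shutil

# Figure quoted in the notes above; actual usage may be higher ("38G+").
REQUIRED_GIB = 38


def has_enough_space(path=".", required_gib=REQUIRED_GIB):
    """Return True if the filesystem holding `path` has at least required_gib GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024 ** 3


if __name__ == "__main__":
    free_gib = shutil.disk_usage(".").free / 1024 ** 3
    print(f"Free space: {free_gib:.1f} GiB")
    if not has_enough_space():
        print(f"Warning: less than {REQUIRED_GIB} GiB free; preprocessing may fail.")
```

Run it from the repository root so that the check applies to the filesystem where data/ will be created.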