Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
Instructions for training and testing the "SMem-VQA Two-Hop" model:
- Download the provided caffe folder and install Caffe following the instructions at http://caffe.berkeleyvision.org/installation.html
- Download the MSCOCO images and the VQA annotations and questions:
cd example/data/
./get_image.sh
- Generate the HDF5 data for training and testing:
cd example/
python ./data/generate_h5_data/generate_h5_data.py
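The HDF5 generation step presumably encodes each question as a fixed-length array of vocabulary indices. A minimal, hypothetical sketch of that kind of encoding (the vocabulary handling, padding scheme, and max length are assumptions, not the actual logic of generate_h5_data.py):

```python
def build_vocab(questions):
    """Map each distinct word to an integer index; 0 is reserved for padding/unknown."""
    vocab = {}
    for q in questions:
        for w in q.lower().rstrip("?").split():
            vocab.setdefault(w, len(vocab) + 1)
    return vocab

def encode(question, vocab, max_len=15):
    """Encode a question as a fixed-length list of word indices, zero-padded."""
    idx = [vocab.get(w, 0) for w in question.lower().rstrip("?").split()]
    return (idx + [0] * max_len)[:max_len]

vocab = build_vocab(["What color is the cat?", "Where is the dog?"])
print(encode("What color is the dog?", vocab))
# -> [1, 2, 3, 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Arrays in this shape (questions, answer labels, image indices) are what the training code would read back from the HDF5 file.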
- Train the model:
cd example/
./train/train_mm.sh
- Pre-trained model on the VQA dataset: SMem-VQA
- Predict the answers for the images and questions in the VQA test-dev dataset:
cd example/
python ./prediction/predict_json.py
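The VQA evaluation server expects a results file containing a JSON list of {question_id, answer} records for test-dev. A minimal sketch of writing such a file (the output filename and the example entries are placeholders, not output of predict_json.py):

```python
import json

# Each entry pairs a question_id from the test-dev questions with the
# model's predicted answer string (standard VQA results format).
results = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 2, "answer": "2"},
]

with open("vqa_results.json", "w") as f:
    json.dump(results, f)

# Reload to verify the file round-trips.
with open("vqa_results.json") as f:
    loaded = json.load(f)
print(len(loaded), loaded[0]["answer"])
# -> 2 yes
```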
@inproceedings{xu2016ask,
title = {Ask, attend and answer: Exploring question-guided spatial attention for visual question answering},
author = {Xu, Huijuan and Saenko, Kate},
booktitle = {European Conference on Computer Vision},
year = {2016}
}
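For readers skimming the code, the question-guided two-hop attention the paper describes can be sketched schematically in plain NumPy. This is a simplification: the learned projection matrices and word-level attention of the actual SMem-VQA model are omitted, and the feature dimensions are made up for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def two_hop_attention(visual, q):
    """visual: (L, d) features for L spatial locations; q: (d,) question embedding.
    Returns the representation that would feed the answer classifier."""
    # Hop 1: attend over the L spatial locations, guided by the question.
    att1 = softmax(visual @ q)      # (L,) attention weights
    evidence1 = att1 @ visual       # (d,) attended visual evidence
    q2 = q + evidence1              # refined query for the second hop
    # Hop 2: re-attend with the refined query to gather sharper evidence.
    att2 = softmax(visual @ q2)
    evidence2 = att2 @ visual
    return q2 + evidence2

rng = np.random.default_rng(0)
visual = rng.standard_normal((49, 8))   # e.g. a 7x7 spatial grid of 8-d features
q = rng.standard_normal(8)
out = two_hop_attention(visual, q)
print(out.shape)
# -> (8,)
```

The second hop is what distinguishes the "Two-Hop" model trained above: the evidence gathered in hop one refines the query before attention is recomputed.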