Strong baseline for visual question answering

This is a PyTorch re-implementation of Vahid Kazemi and Ali Elqursh's paper Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering.

The paper shows that a relatively simple model, built only from common deep learning building blocks, can achieve higher accuracy than most previously published work on the popular VQA v1 dataset.

A fully trained model (convergence shown below) is available for download.

Note that the model in my other VQA repo performs better than the model implemented here.

This project uses the code provided here
