Visual Question Answering (VQA) - Abstract Scenes
This project implements a Visual Question Answering (VQA) system using the VQA Abstract Scenes Dataset. It takes a cartoon-style image and a natural language question about the image, and predicts an answer.
🔍 Dataset
Name: VQA Abstract Scenes Dataset (v2)
Source: Official VQA dataset release (https://visualqa.org/)
The dataset contains synthetic scene images built from clipart objects, designed to test reasoning over structured visual scenes.
🧠 Model Architecture
The model combines the following components (a minimal sketch follows this list):
Image Encoder: a frozen ResNet50 (224x224 input) with global average pooling and a dense projection layer to extract image features.
LSTM-based Question Encoder: pretrained word embeddings fed into stacked LSTMs to produce a question context vector.
Cross-Modal Attention: multi-head cross-modal attention aligns the image and question representations.
Fusion Layer: concatenates the attended image features, the global image features, and the question summary.
Classifier: fully connected dense layers (512 → 256) followed by a softmax over 1,000 answer classes.
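A minimal Keras sketch of this architecture is shown below. The pieces named in the list (frozen ResNet50 at 224x224, stacked LSTMs over pretrained embeddings, multi-head cross-modal attention, the 512 → 256 classifier head, and the 1,000 answer classes) come from this README; the vocabulary size, question length, embedding dimension, LSTM width, attention-head settings, and 512-d projections are illustrative assumptions.

```python
# Minimal sketch of the architecture described above (tf.keras functional API).
# VOCAB_SIZE, MAX_Q_LEN, EMBED_DIM, the LSTM width, the attention settings and the
# 512-d projections are assumptions; only the 224x224 input, the frozen ResNet50,
# the 512 -> 256 head and the 1,000 answer classes come from this README.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10_000   # assumed question vocabulary size
MAX_Q_LEN = 26        # assumed maximum question length (tokens)
EMBED_DIM = 300       # assumed pretrained word-embedding dimension
NUM_ANSWERS = 1_000   # top-1,000 answer classes

# --- Image encoder: frozen ResNet50 -> global average pooling -> dense projection
image_in = layers.Input(shape=(224, 224, 3), name="image")  # assumes preprocessed pixels
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet")
backbone.trainable = False                                  # frozen, as stated above
feature_map = backbone(image_in)                            # (7, 7, 2048) feature grid
img_global = layers.GlobalAveragePooling2D()(feature_map)
img_global = layers.Dense(512, activation="relu")(img_global)

# Treat the 7x7 grid as 49 region features so attention has regions to attend over
img_regions = layers.Reshape((49, 2048))(feature_map)
img_regions = layers.Dense(512)(img_regions)

# --- Question encoder: pretrained word embeddings -> stacked LSTMs
question_in = layers.Input(shape=(MAX_Q_LEN,), name="question")
q_embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(question_in)  # load pretrained weights here
q_seq = layers.LSTM(512, return_sequences=True)(q_embed)
q_seq = layers.LSTM(512, return_sequences=True)(q_seq)
q_summary = layers.GlobalAveragePooling1D()(q_seq)              # question context vector

# --- Cross-modal attention: question tokens attend over image regions
attended = layers.MultiHeadAttention(num_heads=8, key_dim=64)(
    query=q_seq, value=img_regions, key=img_regions
)
attended = layers.GlobalAveragePooling1D()(attended)

# --- Fusion: attended image features + global image features + question summary
fused = layers.Concatenate()([attended, img_global, q_summary])

# --- Classifier head: 512 -> 256 -> softmax over the answer vocabulary
x = layers.Dense(512, activation="relu")(fused)
x = layers.Dense(256, activation="relu")(x)
answer_out = layers.Dense(NUM_ANSWERS, activation="softmax", name="answer")(x)

vqa_model = Model(inputs=[image_in, question_in], outputs=answer_out)
```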
The model is trained with categorical cross-entropy loss and the Adam optimizer, and a checkpoint saves the model weights after every epoch (see the training sketch after the hyperparameters list below).
⚙️ Hyperparameters
Loss Function: Categorical Crossentropy
Optimizer: Adam
Evaluation Metric: Accuracy
Batch Size: 128 (initial), 32 (for fine-tuning)
Learning Rate: 0.00001
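The sketch below shows how these settings could be wired together with tf.keras, continuing from the model sketch above. The loss, optimizer, metric, learning rate, batch sizes, epoch count, and per-epoch checkpointing come from this README; the checkpoint path and the train_ds / val_ds pipelines are placeholders.

```python
# Training configuration sketch, continuing from the model above. The checkpoint
# path and the train_ds / val_ds tf.data pipelines (yielding ((image, question),
# one_hot_answer) examples) are placeholders.
import tensorflow as tf

vqa_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),   # learning rate from the list above
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Save the weights after each epoch, as described in the architecture section
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="checkpoints/vqa_epoch_{epoch:02d}.weights.h5",  # placeholder path
    save_weights_only=True,
)

history = vqa_model.fit(
    train_ds.batch(128),             # 128 for the initial run; 32 when fine-tuning
    validation_data=val_ds.batch(128),
    epochs=30,
    callbacks=[checkpoint_cb],
)
```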
📊 Best Model Performance
Best Epoch: 24 (out of 30 total)
Exact Match: 27.55%
Partial Match: 41.21%
Validation Accuracy: 51.8%
Validation Loss: 1.7154
Training Accuracy: 54.76%
Training Loss: 1.5631