LRP ImageCaptioning Pytorch

Paper: Explain and improve: LRP-inference fine-tuning for image captioning models (Link)

This is a PyTorch implementation of the latest version of Understanding Image Captioning Model beyond Visualizing Attention.

What can we do with this repo

To train image captioning models with two kinds of attention mechanisms: adaptive attention and multi-head attention.
To get both image explanations and linguistic explanations for a predicted word using LRP, Grad-CAM, Guided Grad-CAM, and Guided Backpropagation.
To fine-tune a pre-trained image captioning model with LRP-inference fine-tuning to improve the mAP of frequent object words.

Requirements

python >= 3.6
pytorch = 1.4.0

Dataset Preparation

Flickr30K

We prepare Flickr30K following the Karpathy split.

MSCOCO2017

We select 110000 images from the training set for training and 5000 images from the training set for validation. The original validation set is used for testing.

The vocabulary is built on the training set for both datasets. Each caption is encoded with a <start> token at the beginning and an <end> token at the end. Words that appear fewer than 3 times in Flickr30K and fewer than 4 times in MSCOCO2017 are encoded with an <unk> token.

To build the vocabulary and encode the reference captions, please refer to preparedataset.py.
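The encoding scheme described above can be sketched in a few lines. This is a minimal illustration, not the code in preparedataset.py: the function names `build_vocab` and `encode_caption`, the token-to-id ordering, and the toy captions are all assumptions made for this example.

```python
from collections import Counter

START, END, UNK = "<start>", "<end>", "<unk>"


def build_vocab(train_captions, min_count):
    """Map every word seen at least `min_count` times to an integer id."""
    counts = Counter(word for caption in train_captions for word in caption)
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    # Special tokens take the first ids (an arbitrary choice for this sketch).
    return {token: idx for idx, token in enumerate([START, END, UNK] + kept)}


def encode_caption(caption, vocab):
    """Wrap a tokenized caption in <start>/<end>; rare words become <unk>."""
    unk_id = vocab[UNK]
    body = [vocab.get(word, unk_id) for word in caption]
    return [vocab[START]] + body + [vocab[END]]


# Toy training captions (hypothetical data, already tokenized).
captions = [
    ["a", "dog", "runs"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "runs"],
]
vocab = build_vocab(captions, min_count=2)
encoded = encode_caption(["a", "cat", "runs"], vocab)
print(encoded)
```

With `min_count=2`, "cat" and "sleeps" fall below the threshold, so "cat" is mapped to the <unk> id in the encoded caption. For the real datasets the threshold would be the per-dataset value given above.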

Feature extraction

This repo experiments with both CNN features and bottom-up features. The CNN features are extracted from a VGG16 pre-trained on ImageNet. We follow py-bottom-up-attention to extract 36 bottom-up features per image for training.

To Train Models From Scratch

We train the image captioning models with two attention mechanisms, the adaptive attention with an LSTM layer as the predictor and multi-head attention with an FC layer as the predictor. The two models are defined in gridTDmodel.py and aoamodel.py respectively.

Pre-trained Models

Our pre-trained models can be downloaded here. Please email sunjiamei.hit@gmail.com if you cannot access them.

To Evaluate the Image Captioning Model

We evaluate the image captioning models using the BLEU, SPICE, ROUGE, METEOR, and CIDEr metrics. We also use BERTScore. To generate these evaluations, download the pycocoevalcap tools and copy the folders of the different metrics under ./pycocoevalcap. The bert folder is already provided.

We provide three decoding methods:

greedy search
beam search
diverse beam search
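To illustrate how the first two decoding methods differ, here is a minimal sketch on a toy next-token distribution. It is not the repository's decoding code: `toy_next_log_probs` is a hypothetical stand-in for one decoder step, and the probabilities are made up. Diverse beam search, which additionally penalizes similarity between groups of beams, is not shown.

```python
import math

END = "<end>"


def toy_next_log_probs(prefix):
    """Hypothetical decoder step: log-probabilities of the next token."""
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.55, "cat": 0.45},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    probs = table.get(tuple(prefix), {END: 1.0})
    return {token: math.log(p) for token, p in probs.items()}


def greedy_search(step_fn, max_len=10):
    """Pick the single most likely token at every step."""
    seq = []
    for _ in range(max_len):
        log_probs = step_fn(seq)
        token = max(log_probs, key=log_probs.get)
        if token == END:
            break
        seq.append(token)
    return seq


def beam_search(step_fn, beam_size=2, max_len=10):
    """Keep the `beam_size` highest-scoring partial captions at every step."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, log_p in step_fn(seq).items():
                if token == END:
                    finished.append((seq, score + log_p))
                else:
                    candidates.append((seq + [token], score + log_p))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    # Fall back to unfinished beams if nothing reached <end> within max_len.
    pool = finished or beams
    return max(pool, key=lambda item: item[1])[0]


greedy_caption = greedy_search(toy_next_log_probs)
beam_caption = beam_search(toy_next_log_probs, beam_size=2)
print(greedy_caption, beam_caption)
```

In this toy setup greedy search commits to "a" because it is the most likely first word, while beam search keeps "the" alive and ends up with a caption of higher overall probability (0.36 versus 0.33).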

To Explain Image Captioning Models

We provide LRP, Grad-CAM, Guided Grad-CAM, and Guided Backpropagation to explain the image captioning models. These explanation methods are defined under the corresponding model files.

There are two stages of explanation. We first explain the decoder to get the explanation of each preceding word and the encoded image features. We then explain the image encoder to obtain the image explanations.
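The two-stage flow can be sketched with the generic LRP epsilon rule on a pair of bias-free linear layers. This is an illustration under strong simplifying assumptions, not the repository's implementation: the real models contain LSTM, attention, and convolutional layers that need their own propagation rules, and the names `lrp_epsilon_linear`, `w_enc`, and `w_dec` are hypothetical.

```python
def linear(inputs, weights):
    """Bias-free linear layer; weights[i][j] connects input i to output j."""
    n_out = len(weights[0])
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
            for j in range(n_out)]


def lrp_epsilon_linear(activations, weights, relevance_out, eps=1e-6):
    """Redistribute the relevance of a layer's outputs to its inputs."""
    z = linear(activations, weights)  # forward pre-activations z_j
    relevance_in = [0.0] * len(activations)
    for j, r_out in enumerate(relevance_out):
        # The epsilon term stabilizes the division when z_j is close to zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i, a in enumerate(activations):
            relevance_in[i] += a * weights[i][j] / denom * r_out
    return relevance_in


# Toy "encoder" (pixels -> image features) and "decoder" (features -> word scores).
pixels = [1.0, 2.0]
w_enc = [[1.0, 0.5], [0.5, 1.0]]
w_dec = [[0.5, -1.0], [1.0, 0.2]]
features = linear(pixels, w_enc)

# Explain only the first word score.
word_relevance = [1.0, 0.0]

# Stage 1: explain the decoder to get relevance of the encoded image features.
feature_relevance = lrp_epsilon_linear(features, w_dec, word_relevance)
# Stage 2: explain the encoder, starting from the feature relevance.
pixel_relevance = lrp_epsilon_linear(pixels, w_enc, feature_relevance)
print(feature_relevance, pixel_relevance)
```

Because the layers have no bias and epsilon is small, the total relevance is approximately conserved from the word score down to the input pixels.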

To Fine-tune the Model with LRP Inference

We provide three methods to fine-tune image captioning models that were pre-trained with cross-entropy loss:

--cider_tune: SCST optimization on a pre-trained model
--lrp_cider_tune: LRP-inference SCST optimization
--lrp_tune: LRP-inference fine-tuning with cross-entropy loss
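For readers unfamiliar with SCST (self-critical sequence training), the objective behind the first option can be sketched as follows. This is the generic formulation, in which the reward of a sampled caption is compared against the reward of the greedy caption, and is not taken from this repository's training code. The function name `scst_loss` and the reward values are assumptions, and the LRP-inference variants are not modeled here.

```python
def scst_loss(sample_log_probs, sample_reward, baseline_reward):
    """Self-critical loss for one caption.

    sample_log_probs: log-probabilities of the tokens in the sampled caption.
    sample_reward:    reward (e.g. a CIDEr score) of the sampled caption.
    baseline_reward:  reward of the greedily decoded caption.
    """
    advantage = sample_reward - baseline_reward
    # Minimizing this raises the probability of captions that beat the
    # greedy baseline and lowers the probability of captions that do not.
    return -advantage * sum(sample_log_probs)


# Hypothetical numbers: the sampled caption scores higher than the greedy one.
better = scst_loss([-0.5, -1.0, -0.25], sample_reward=1.2, baseline_reward=0.8)
# Hypothetical numbers: the sampled caption scores lower than the greedy one.
worse = scst_loss([-0.5, -1.0, -0.25], sample_reward=0.5, baseline_reward=0.8)
print(better, worse)
```

In a real training loop the log-probabilities would be tensors from the captioning model so that the loss can be backpropagated, and the loss would be averaged over a batch.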

To Evaluate the Explanations

Please refer to the examples in evaluation.py. This will generate the results of our ablation experiment and the correctness scores across the various explanation methods. The COCOvalEntities.json file must be downloaded to calculate the correctness scores.

More visualization examples

Acknowledgment

Many thanks to the works:

a-PyTorch-Tutorial-to-Image-Captioning

AoANet

py-bottom-up-attention

iNNvestigate

pycocoevalcap
