Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Add VQA V2.0 and Visual Dialog V0.9. #54

Merged
merged 9 commits into from
May 12, 2017
Merged

Add VQA V2.0 and Visual Dialog V0.9. #54

merged 9 commits into from
May 12, 2017

Conversation

jiasenlu
Copy link
Contributor

@jiasenlu jiasenlu commented May 8, 2017

This pull request contains
1: add vqa_v2 task loader.
2: add VisDial_v0.9 task loader.
3: disable download COCO image as default
4: move COCO image path to "--download_path"

To test the new added task

For VQA 2.0: python examples/display_data.py -t vqa_coco2014_v2 --download-path 'path_to_COCO_img'

For Visual Dialog: python examples/display_data.py -t visdial

Currently, the Visual Dialog inherit from the default "DialogTeacher" class. There is no placeholder for the additional image information. The "DialogData" class with the format [(x, y, r, c), new_episode?], could we extend this to [(x, y, r, c, i), new_episode?] where "i" is some optional additional information such as image_id? If so, I can send another pull request to modify that.

@facebook-github-bot
Copy link

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla - and if you have received this in error or have any questions, please drop us a line at cla@fb.com. Thanks!

If you are contributing on behalf of someone else (eg your employer): the individual CLA is not sufficient - use https://developers.facebook.com/opensource/cla?type=company instead. Contact cla@fb.com if you have any questions.

@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@alexholdenmiller
Copy link
Member

Thanks Jiasen!! I think we'd like to keep the dataset downloading the images automatically (to datapath)--I know it's large, but users should only need to do it once anyways. Everything in downloads is also automatically downloaded (currently just the Memnn github repo).

As far as DialogTeacher/DialogData, I think we want to write a new one instead that includes some of the data loading from your vqa_coco2014 (v1) and prepares us to handle the preprocessing options that we talked about before. We don't want image_id as part of the action/observation dictionary--just images themselves--so we want to include the image loading as part of that data class.

@@ -36,7 +35,7 @@ def _path(opt):
annotation_path = os.path.join(opt['datapath'], 'VQA-COCO2014',
annotation_suffix + '_annotations.json')

image_path = os.path.join(opt['datapath'], 'VQA-COCO2014', img_suffix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, we'd like to keep the images in datapath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then I think we might not want the image stay in 'VQA-COCO2014' folder, other task such as Visual Dialog may also use the COCO image. How about put the image under COCO-IMG ? Then, multiple task can share the image data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! Yeah giving it the most general name for the data file is perfect, and then multiple tasks can depend on it (and it won't rebuild it if it's already there).

anno = self.annotation['annotations'][self.episode_idx]
answers = [ans['answer'] for ans in anno['answers']]
else:
answers = ['fake_answer']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, just leave the labels field of the dict empty (accessing it gives KeyError, not None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks. will change this.

@facebook-github-bot
Copy link

@jiasenlu updated the pull request - view changes

Summary:

Fix the VQA, VisalDialog, VQA v2.0 based on last update of ParAI.

Test Plan:

python examples/display_data.py -t vqa_coco2014
python examples/display_data.py -t vqa_coco2014_v2
python examples/display_data.py -t visdial

Reviewers:

Subscribers:

Tasks:

Tags:

Blame Revision:
@facebook-github-bot
Copy link

@jiasenlu updated the pull request - view changes

@facebook-github-bot
Copy link

@jiasenlu updated the pull request - view changes

@facebook-github-bot
Copy link

@jiasenlu updated the pull request - view changes

Copy link
Member

@alexholdenmiller alexholdenmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really good overall! how are you testing if the images are loading properly?

would be awesome to get a model in next so that we can try to train it and see if we can reproduce the results from the paper

@@ -265,7 +265,7 @@ def get(self, episode_idx, entry_idx=0):
table['reward'] = entry[2]
if len(entry) > 3:
table['label_candidates'] = entry[3]
if len(entry) > 4 and not opt.get('no_images', False):
if len(entry) > 4 and not self.opt.get('no_images', False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

self.len = len(self.ques['questions'])

class DefaultTeacher(OeTeacher):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does v2 have a multiple-choice version?

Copy link
Contributor Author

@jiasenlu jiasenlu May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, VQA v2.0 doesn't have the multiple-choice now.

@jaseweston
Copy link
Contributor

jaseweston commented May 11, 2017 via email

@alexholdenmiller
Copy link
Member

alexholdenmiller commented May 11, 2017

simple example which prints ascii version of images (after pip install asciimatics), or we could write our own similar one

import sys
from asciimatics.renderers import ImageFile
img = ImageFile(sys.argv[1], height=30)
print(img)

maybe we want something like this in display_data?

@jiasenlu
Copy link
Contributor Author

Yeah, I think we should do that. Maybe we can add this in a separate pull request?

@alexholdenmiller alexholdenmiller merged commit 10453fd into facebookresearch:master May 12, 2017
@alexholdenmiller
Copy link
Member

Just pushed this to get that fix in

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants