
data preprocessing for VQA #56

Closed
runzeer opened this issue Mar 25, 2022 · 2 comments

runzeer commented Mar 25, 2022

When will you release the data preprocessing code for VQA-v2?

yangapku self-assigned this Mar 26, 2022

yangapku commented Mar 26, 2022

Hi, if you just want to run inference on your custom QA samples and try open-domain VQA (not restricted to the 3,129 candidate answers in the VQA-v2 dataset), I recommend referring to the open-domain VQA Colab we provided (link) and following how it pre-processes raw samples for inference. The open-domain VQA Colab uses the pretrained OFA (rather than a VQA-finetuned checkpoint) and is able to answer more open-domain visual questions beyond the 3,129 VQA-v2 candidate answers. Our VQA demo on Hugging Face Spaces is also based on the pretrained OFA.

If you would like to work on VQA-v2 or other VQA-style competition datasets, please refer to the section "1. Prepare the Dataset & Checkpoints" of finetuning on VQA in the readme. It shows how to organize the TSV data file used for VQA finetuning and for inference with the finetuned VQA-v2 checkpoint. Specifically, the question-id, image-id and the question text (lowercased) are taken directly from your original dataset. For training samples, the answer text (also lowercased) is taken from the original dataset and concatenated with its confidence using "|!+" (for example, 0.6|!+no). For inference samples that do not have ground-truth answers, just put a fake answer string as a placeholder (like 1.0|!+no). The object labels are not necessary and you can leave that column blank (otherwise, refer to the VinVL repo to see how to obtain labels on custom data; we just employed its released labels on COCO & VG images). To transform images into base64 strings, please use the following code:

from PIL import Image
from io import BytesIO
import base64

img = Image.open(fn)  # fn is the path to your image file
img_buffer = BytesIO()
img.save(img_buffer, format=img.format)  # re-encode the image into an in-memory buffer
byte_data = img_buffer.getvalue()  # raw image bytes
base64_str = base64.b64encode(byte_data)  # bytes
base64_str = base64_str.decode("utf-8")  # str

For each sample, the columns mentioned above are concatenated with '\t' into one line. You can refer to the example in the readme. For the already pre-processed VQA-v2 dataset, you can download it directly from datasets.md.
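To make this concrete, below is a minimal sketch of assembling one TSV line for a custom sample. The field values, the output filename vqa_custom.tsv and the image_to_base64 helper are hypothetical, and the column order simply follows the description above; please check the example in the readme for the exact order expected by the finetuning scripts.

from PIL import Image
from io import BytesIO
import base64

def image_to_base64(fn):
    # re-encode the image file into an in-memory buffer and return it as a base64 string
    img = Image.open(fn)
    buf = BytesIO()
    img.save(buf, format=img.format)
    return base64.b64encode(buf.getvalue()).decode("utf-8")

# hypothetical sample values; replace them with fields from your own dataset
question_id = "79459"
image_id = "79459"
question = "is this person wearing shorts?"  # lowercased question text
answer = "0.6|!+no"                          # confidence|!+answer; use a placeholder like 1.0|!+no for inference-only samples
object_labels = ""                           # optional, may be left blank
image_base64 = image_to_base64("path/to/image.jpg")

# one sample per line, columns joined by tabs
line = "\t".join([question_id, image_id, question, answer, object_labels, image_base64])
with open("vqa_custom.tsv", "a", encoding="utf-8") as f:
    f.write(line + "\n")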

If you have further questions, please do not hesitate to ask me.

@ricvolpi

> Hi, if you just want to run inference on your custom QA samples and try open-domain VQA (not restricted to the 3,129 candidate answers in the VQA-v2 dataset), I recommend referring to the open-domain VQA Colab we provided (link)

Hi, the Colab link seems broken. Is the notebook still up? Thanks.
