
More details about VQModel used in OFA? #396

Closed
YAOYI626 opened this issue Jun 5, 2023 · 4 comments
@YAOYI626

YAOYI626 commented Jun 5, 2023

Hi team,

Thanks for the amazing work on OFA! I'd like to know more about the VQ model it uses.

Is the same VQ model shared across different tasks, such as captioning and generation? How is the VQ model trained? @logicwong @JustinLin610

Thanks,
Xiaoyi

@logicwong
Member

@YAOYI626 Thanks for your interest.

  1. The VQ model is exclusively employed for image infilling and generation. We discretize the raw image into a sequence of codes using the VQ model, and OFA learns to generate the codes based on the text descriptions or masked images.
  2. For other tasks, like image captioning, we directly embed raw images into vectors via ResNet.
  3. We utilize the pre-trained VQ model from here.
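The discretization in point 1 boils down to a nearest-codebook lookup over the encoder's spatial features. A minimal sketch of that lookup, assuming NumPy and made-up names (this is generic VQ logic, not the actual OFA/VQGAN code):

```python
import numpy as np

def vq_tokenize(features, codebook):
    """Map each spatial feature vector to the index of its nearest codebook entry.

    features: (H, W, D) array of encoder outputs for one image.
    codebook: (K, D) array of learned code vectors.
    Returns a flat (H*W,) sequence of integer codes.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)  # (H*W, D)
    # Squared Euclidean distance from every feature vector to every code.
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # (H*W,) indices into the codebook

# Toy example: a 4x4 feature map against an 8-entry codebook.
rng = np.random.default_rng(0)
codes = vq_tokenize(rng.normal(size=(4, 4, 16)), rng.normal(size=(8, 16)))
print(codes.shape)  # (16,)
```

The resulting integer sequence is what a model like OFA would learn to generate autoregressively, conditioned on text or a masked image.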

@YAOYI626
Author

YAOYI626 commented Jun 9, 2023

Hey @logicwong, thanks for your reply!

Just curious: is there a specific reason for doing captioning without VQ? For instance, is there a big performance gap between captioning with VQ codes and captioning with ResNet embeddings?

Thanks
Xiaoyi

@logicwong
Member

@YAOYI626 There are two main reasons:

  1. Discretizing images with VQ loses information from the original image. In our preliminary experiments, using VQ led to a significant performance drop on the captioning and VQA tasks.
  2. We use a compression ratio of f8 to discretize images, so a 256x256 image becomes a 32x32 grid of codes, i.e. a sequence of length 1024. This increases the training cost.
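The f8 arithmetic in point 2 can be checked with a tiny helper (the function name is mine, not from the OFA code):

```python
def code_seq_len(resolution: int, ratio: int) -> int:
    """Length of the discrete code sequence for a square image.

    With a spatial compression ratio of f (e.g. f8), each side shrinks
    by that factor, so the code grid is (res // f) x (res // f).
    """
    side = resolution // ratio
    return side * side

print(code_seq_len(256, 8))   # 1024, matching the sequence length in the thread
print(code_seq_len(256, 16))  # 256 under a stronger f16 compression
```

This also shows why compression ratio matters for cost: halving the ratio quadruples the sequence length the transformer must process.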

@YAOYI626
Author

Thanks @logicwong for the helpful information. I'd like to close this issue.
