Codebase for AAAI 2024 conference paper Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning

UMass-Foundation-Model/VisualCoT

Code for the paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"

Overall framework

[Figure: overall framework of VisualCoT]

Preprocess datasets

  • Download the COCO 2014 and 2017 datasets
  • Download the OK-VQA and AOK-VQA datasets, following the PICa format
  • Run the preprocessing script (preprocess/preprocess_aokvqa.sh for AOK-VQA, preprocess/preprocess_okvqa.sh for OK-VQA)
  • Build the training object similarity file (object_similarity/object_similarity_aokvqa.sh for AOK-VQA, object_similarity/object_similarity_okvqa.sh for OK-VQA)
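
The two script-based steps above can be chained per dataset. The following is a minimal sketch, not part of the repository: the script paths are the ones listed above, while the helper names (`preprocessing_commands`, `run_preprocessing`), the use of `bash`, and running from the repository root are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Script paths are taken from the list above; running them with bash from the
# repository root is an assumption of this sketch.
PREPROCESS_STEPS = {
    "aokvqa": [
        "preprocess/preprocess_aokvqa.sh",
        "object_similarity/object_similarity_aokvqa.sh",
    ],
    "okvqa": [
        "preprocess/preprocess_okvqa.sh",
        "object_similarity/object_similarity_okvqa.sh",
    ],
}


def preprocessing_commands(dataset):
    """Return the commands for one dataset, in the order listed above."""
    key = dataset.lower().replace("-", "")  # accepts "AOK-VQA" or "aokvqa"
    if key not in PREPROCESS_STEPS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [["bash", script] for script in PREPROCESS_STEPS[key]]


def run_preprocessing(dataset, repo_root="."):
    """Run each step in order, stopping at the first failing script."""
    for command in preprocessing_commands(dataset):
        subprocess.run(command, cwd=Path(repo_root), check=True)
```

Calling `run_preprocessing("AOK-VQA")` would run the preprocessing script first and the object similarity script second. The scripts may need dataset paths edited inside them before they work, so check their contents first.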

Prepare Scene graph and captions

  • Before running experiments, VisualCoT also needs scene graphs and captions: three files for each input image, one under each of input_text/scene_graph_text/scene_graph_coco17, input_text/scene_graph_text/scene_graph_coco17_attr, and input_text/scene_graph_text/scene_graph_coco17_caption. We provide an example for image No. 57 in each directory. Please follow the format of these examples and generate scene graphs for all other images.
  • If you do not want to run a scene graph model yourself, we provide the scene graphs and captions we generated (additional processing is needed to match the format of the three examples above):
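
Before launching experiments, it can help to confirm that every image has a file in each of the three directories. The sketch below is illustrative only: the directory names come from the text above, but the function name `missing_scene_graph_dirs`, the `stem_format` parameter, and the rule that a file matches an image when its name without extension equals the formatted image id are all assumptions. Compare against the provided example for image No. 57 and adjust the format accordingly.

```python
from pathlib import Path

# Directory names come from the README; everything else is an assumption.
SCENE_GRAPH_DIRS = (
    "input_text/scene_graph_text/scene_graph_coco17",
    "input_text/scene_graph_text/scene_graph_coco17_attr",
    "input_text/scene_graph_text/scene_graph_coco17_caption",
)


def missing_scene_graph_dirs(repo_root, image_id, stem_format="{}"):
    """List the directories that contain no file for the given image.

    Assumption: a file belongs to an image when its name without extension
    equals stem_format.format(image_id). Change stem_format (for example
    "{:012d}") to match the naming of the provided example files.
    """
    stem = stem_format.format(image_id)
    missing = []
    for rel_dir in SCENE_GRAPH_DIRS:
        directory = Path(repo_root) / rel_dir
        found = directory.is_dir() and any(
            p.is_file() and p.stem == stem for p in directory.iterdir()
        )
        if not found:
            missing.append(rel_dir)
    return missing
```

An empty list from `missing_scene_graph_dirs(".", 57)` means all three files for that image were found under the assumed naming rule.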

Run experiments

  • run_aokvqa.sh for AOK-VQA
  • run_okvqa.sh for OK-VQA

Main Results

Backbone      OK-VQA test (DA)   AOK-VQA val (DA)   AOK-VQA test (DA)
OPT-66B       44.6               46.4               46.0
Llama-2-70B   54.9               50.5               54.4

Cite

arXiv version

@article{chen2023see,
  title={Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning},
  author={Chen, Zhenfang and Zhou, Qinhong and Shen, Yikang and Hong, Yining and Sun, Zhiqing and Gutfreund, Dan and Gan, Chuang},
  journal={arXiv preprint arXiv:2301.05226},
  year={2023}
}
