[CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".
Topics: knowledge, retrieval, vqa, cvpr, multimodal-learning, visual-question-answering, gradcam, rag, llm, large-language-model, mllm, llava, retrieval-augmented-generation, llava-next, cvpr2025, mllm-reasoning, multimodal-large-language-model, knowledge-based-visual-question-answering, kb-vqa
Updated Jun 16, 2025 - Python