Starred repositories
Analysis of tomato leaf disease identification techniques
Chinese medical multimodal large language model: Large Chinese Language-and-Vision Assistant for BioMedicine
Official Implementation of NeurIPS 2024 paper "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering"
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
[NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retrieval (Lerner et al., ECIR'24)
This repository contains the official implementation of the CVPR 2024 paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training"
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
A KBQA solution framework based on the agent-environment paradigm in the era of LLMs.
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
Codebase for the AAAI 2024 paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Localized Symbolic Knowledge Distillation for Visual Commonsense Models (NeurIPS 2023)
Official Code of "GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering"
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [ECCV'24]
[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
[arXiv 2024] Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models