Starred repositories of alice-cool

Agricultural datasets

R 124 39 Updated Dec 16, 2024

analysis of tomato leaf disease identification techniques

Python 2 Updated Apr 21, 2021
Python 11 1 Updated Nov 27, 2024

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Python 69 4 Updated May 22, 2024
Python 289 30 Updated Jan 10, 2024

Official Implementation of NeurIPS 2024 paper "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering"

Python 388 66 Updated Nov 15, 2024

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Python 280 23 Updated Apr 29, 2024
Python 3 Updated Nov 28, 2024

[NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Python 99 10 Updated Sep 18, 2024

Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retrieval (Lerner et al., ECIR'24)

Python 31 2 Updated Dec 19, 2024

This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024)

Python 817 58 Updated Nov 22, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 766 45 Updated Jul 29, 2024

A KBQA solution framework based on the agent-environment paradigm in the era of LLMs.

Python 19 2 Updated Aug 22, 2024
Python 7 Updated Dec 20, 2024

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Python 22 3 Updated Jan 1, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 7,160 500 Updated Feb 10, 2025

Official Implementation of our EMNLP 2024 Paper

Python 9 Updated Oct 24, 2024

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 1,812 65 Updated Jan 22, 2025

Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples

Python 20 1 Updated Nov 27, 2024

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

6 Updated Nov 29, 2024

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)

Python 85 6 Updated Apr 10, 2022

Codebase for AAAI 2024 conference paper Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning

Python 25 1 Updated Jul 21, 2024

[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Jupyter Notebook 130 6 Updated Jul 18, 2024

Localized Symbolic Knowledge Distillation for Visual Commonsense Models [NeurIPS 2023]

Jupyter Notebook 5 Updated Dec 12, 2023

Official Code of "GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering"

Python 10 Updated Oct 10, 2024

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge [ECCV'24]

Python 6 Updated Nov 20, 2024

[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

Python 14 1 Updated May 14, 2024

MC-CoT implementation code

Python 11 1 Updated Oct 31, 2024

[Arxiv 2024] Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Python 8 1 Updated Jul 23, 2024