- Hangzhou
Stars
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR24)
Images to inference with no labeling (use foundation models to train supervised models).
A collection of diffusion model papers categorized by subarea
This is the third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
LangGPT: Empowering everyone to become a prompt expert!🚀 Structured Prompt, Language of GPT, structured prompts, structured Prompt
A collection of awesome image inpainting studies.
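Llama Chinese Community: a Llama3 online demo and fine-tuned models are now available. The latest Llama3 learning resources are aggregated in real time, and all code has been updated to support Llama3. The goal is to build the best Chinese Llama large model, fully open source and commercially usable.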
[IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
[WACV 2025] Python implementation of Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
The Paper List of Large Multi-Modality Model, Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Collection of AWESOME vision-language models for vision tasks
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
The official codebase for Scene-Aware Label Graph Learning for Multi-Label Image Classification, ICCV 2023.
VMamba: Visual State Space Models; code is based on Mamba
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
A highly comprehensive paper list on Vision Transformer/Attention, including papers, code, and related websites
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision