Stars
Official implementation of "Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection"
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Official Code for 'TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction'
Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
A Survey on Benchmarks of Multimodal Large Language Models
Official repository of the paper "MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation"
Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Official code for "SRFormer: Permuted Self-Attention for Single Image Super-Resolution" (ICCV 2023) and SRFormerV2
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
[ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
EVA Series: Visual Representation Fantasies from BAAI
The official VOT Challenge evaluation and analysis toolkit
[ECCV'18] Long-term Tracking in the Wild: A Benchmark
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Visual tracking library based on PyTorch.
LAVIS - A One-stop Library for Language-Vision Intelligence
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception