Horizon Robotics, Nanjing, China
Stars
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention network from DeepMind, in PyTorch
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
The devkit of the nuScenes dataset.
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large models' multilingual text perception and comprehension capabilities across nine…
[CVPR 2024] The official PyTorch implementation of "A General and Efficient Training for Transformer via Token Expansion".
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
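Of the entries above, SimPO is the one whose core idea fits in a few lines: it drops DPO's reference model and uses the policy's length-normalized log-probability as the implicit reward, plus a target reward margin. A minimal per-pair sketch (the default `beta`/`gamma` values here are illustrative, not tuned settings from the paper):

```python
import math

def simpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """Reference-free SimPO objective for a single preference pair.

    avg_logp_*: length-normalized log-probability of the chosen / rejected
    response under the current policy (sum of token log-probs divided by
    response length). beta scales the implicit reward; gamma is the target
    reward margin between chosen and rejected responses.
    """
    margin = beta * avg_logp_chosen - beta * avg_logp_rejected - gamma
    # loss = -log(sigmoid(margin)); computed as softplus(-margin) for
    # numerical stability at large |margin|
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss shrinks as the chosen response gains probability mass over
# the rejected one:
easy = simpo_loss(avg_logp_chosen=-1.0, avg_logp_rejected=-3.0)
hard = simpo_loss(avg_logp_chosen=-3.0, avg_logp_rejected=-1.0)
```

In a training loop the two averaged log-probabilities would come from a forward pass over the chosen and rejected sequences; the length normalization is what removes the need for a frozen reference model.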