Horizon Robotics, Nanjing, China
Stars
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention network from DeepMind, in PyTorch
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
The devkit of the nuScenes dataset.
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large models' multilingual text perception and comprehension capabilities across nine…
[CVPR 2024] The official PyTorch implementation of "A General and Efficient Training for Transformer via Token Expansion".
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
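Of the entries above, SimPO is the one whose core idea fits in a few lines: it drops DPO's reference model and uses the policy's length-normalized log-probability as the implicit reward, plus a target reward margin. A minimal per-pair sketch (the default `beta`/`gamma` values here are illustrative, not tuned settings from the paper):

```python
import math

def simpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """Reference-free SimPO objective for a single preference pair.

    avg_logp_*: length-normalized log-probability of the chosen / rejected
    response under the current policy (sum of token log-probs divided by
    response length). beta scales the implicit reward; gamma is the target
    reward margin between chosen and rejected responses.
    """
    margin = beta * avg_logp_chosen - beta * avg_logp_rejected - gamma
    # loss = -log(sigmoid(margin)); computed as softplus(-margin) for
    # numerical stability at large |margin|
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss shrinks as the chosen response gains probability mass over
# the rejected one:
easy = simpo_loss(avg_logp_chosen=-1.0, avg_logp_rejected=-3.0)
hard = simpo_loss(avg_logp_chosen=-3.0, avg_logp_rejected=-1.0)
```

In a training loop the two averaged log-probabilities would come from a forward pass over the chosen and rejected sequences; the length normalization is what removes the need for a frozen reference model.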