  • horizon robotics
  • nanjing, china


Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Python 17,680 1,223 Updated Oct 31, 2024

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

Python 696 45 Updated Oct 29, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,875 170 Updated Oct 4, 2024

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Python 1,209 59 Updated Oct 18, 2022

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 2,829 173 Updated Nov 1, 2024

[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,817 128 Updated Oct 23, 2024

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Python 180 9 Updated Oct 7, 2024

The devkit of the nuScenes dataset.

Python 2,273 628 Updated Sep 30, 2024

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Python 811 80 Updated Sep 27, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 206 9 Updated Oct 22, 2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 114 1 Updated Sep 27, 2024

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

Python 40 Updated Aug 4, 2024
Jupyter Notebook 8 Updated Aug 2, 2024

[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models

Python 220 3 Updated Oct 2, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Python 357 13 Updated Oct 31, 2024

xLSTM as Generic Vision Backbone

Python 426 28 Updated Oct 16, 2024

Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Python 254 8 Updated Aug 19, 2024

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Python 828 60 Updated Jul 6, 2024

A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo

Python 31 3 Updated Aug 12, 2024

This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.

Python 11 Updated Feb 12, 2024

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Python 63 5 Updated Jan 30, 2024

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Python 201 9 Updated Oct 7, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingual text perception and comprehension capabilities across nine…

Python 44 1 Updated Sep 29, 2024

[CVPR 2024] The official pytorch implementation of "A General and Efficient Training for Transformer via Token Expansion".

Python 40 2 Updated Apr 22, 2024

A Language Agent for Autonomous Driving

Python 229 9 Updated Mar 21, 2024

[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving

Python 3,476 385 Updated Aug 28, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,739 113 Updated Oct 30, 2024

Learning to Drive with GPT

Python 235 12 Updated Feb 1, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 5,904 459 Updated Oct 29, 2024