✨✨Latest Advances on Multimodal Large Language Models
Updated Oct 29, 2024
A minimal codebase for fine-tuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v, etc. (a minimal fine-tuning sketch follows this list)
Gamified Adversarial Prompting (GAP): crowdsourcing AI-weakness-targeting data through gamification to boost model performance with community-driven, strategic data collection.
[EMNLP 2024] A Video Chat Agent with Temporal Prior
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Vistral-V: Visual Instruction Tuning for Vistral, a Vietnamese Large Vision-Language Model.
Mistral-assisted visual instruction data generation following the LLaVA recipe.
A collection of visual instruction tuning datasets.
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
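None of the repositories above publish this exact snippet; as a rough illustration of what LLaVA-style supervised fine-tuning looks like, here is a minimal sketch using the Hugging Face transformers API with the llava-hf/llava-1.5-7b-hf checkpoint. The image path, prompt, and hyperparameters are placeholder assumptions, not any listed codebase's actual entry point.

```python
# A minimal sketch of LLaVA-style supervised fine-tuning, assuming the
# Hugging Face llava-hf/llava-1.5-7b-hf checkpoint. The dataset, prompt,
# and hyperparameters below are illustrative placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One (image, conversation) pair stands in for a real instruction dataset.
image = Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nDescribe this image. ASSISTANT: A cat on a mat."
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
inputs["labels"] = inputs["input_ids"].clone()  # causal-LM style supervision

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
loss = model(**inputs).loss  # next-token loss over the conversation
loss.backward()
optimizer.step()
```

A real fine-tuning loop would additionally mask the loss on prompt tokens so that only the assistant response is supervised, batch over a dataset, and often use parameter-efficient methods such as LoRA to fit training on a single GPU.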