[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Updated May 31, 2025 - Python
[ICLR 2025] The official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
[ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
🔥[CVPR2025] EventGPT: Event Stream Understanding with Multimodal Large Language Models
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
[NeurIPS25 & ICML25 Workshop on Reliable and Responsible Foundation Models] A Simple Baseline Achieving Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1. Paper at: https://arxiv.org/abs/2503.10635
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
PDF parsing tool: a vLLM-accelerated implementation of GOT. MinerU performs layout detection and cropping, GOT parses tables and formulas, enabling PDF parsing for RAG.
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
[ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
On Path to Multimodal Generalist: General-Level and General-Bench
[NeurIPS'25] Backdoor Cleaning without External Guidance in MLLM Fine-tuning
[ICLR 2025] Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.
The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
XpertEval: All-in-One Evaluation Framework for Multimodal Large Models