[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Updated May 31, 2025 - Python
[ICLR 2025] The official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
[ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
🔥[CVPR2025] EventGPT: Event Stream Understanding with Multimodal Large Language Models
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
[NeurIPS25 & ICML25 Workshop on Reliable and Responsible Foundation Models] A Simple Baseline Achieving Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1. Paper at: https://arxiv.org/abs/2503.10635
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
PDF parsing tool: a vLLM-accelerated implementation of GOT. MinerU performs layout detection and cropping, GOT parses tables and formulas, enabling PDF parsing for RAG.
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
[ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
On Path to Multimodal Generalist: General-Level and General-Bench
[NeurIPS'25] Backdoor Cleaning without External Guidance in MLLM Fine-tuning
[ICLR 2025] Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models". A post-training framework that creates a cost-effective, self-iterative optimization loop.
The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
XpertEval: All-in-One Evaluation Framework for Multimodal Large Models