Awesome Unified Multimodal Models
Updated Mar 24, 2026
A curated list of foundation models for vision and language tasks
A comprehensive, frontier collection and survey of vision-language model papers and models (GitHub repository). Continuously updated.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
[CVPR2026] Scaling Spatial Intelligence with Multimodal Foundation Models
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
A curated list of Awesome Personalized Large Multimodal Models resources
Video Search with CLIP
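Video search with CLIP typically embeds each sampled frame and the text query into a shared space, then ranks frames by cosine similarity. A minimal stdlib sketch of the ranking step, assuming the CLIP embeddings have already been computed; the `rank_frames` helper and the toy vectors are illustrative, not taken from the repository:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_frames(query_emb, frame_embs):
    """Return (frame_index, score) pairs sorted by similarity to the query."""
    scores = [(i, cosine(query_emb, e)) for i, e in enumerate(frame_embs)]
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

In a real pipeline the embeddings would come from a CLIP image/text encoder and the ranking would use batched matrix products on the GPU, but the retrieval logic is the same.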
The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".
The official implementation of the paper "Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts" (ICLR 2026).
[CVPR2026] ConsistCompose: Unified Multimodal Layout Control for Image Composition
Implementation of the paper "Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning", arXiv, 2025
Multimodal Bi-Transformers (MMBT) in Biomedical Text/Image Classification
NanoOWL Detection System enables real-time open-vocabulary object detection in ROS 2 using a TensorRT-optimized OWL-ViT model. Describe objects in natural language and detect them instantly on panoramic images. Optimized for NVIDIA GPUs with .engine acceleration.
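Open-vocabulary detectors like OWL-ViT score each candidate box against every text query and keep boxes whose best score clears a threshold. A hedged stdlib sketch of that post-processing step, assuming the per-box, per-query similarity scores have already been produced by the model; `filter_detections` and the example values are illustrative, not the repository's API:

```python
def filter_detections(boxes, scores, queries, threshold=0.3):
    """Keep boxes whose best-matching text query passes the threshold.

    boxes    : list of box tuples, e.g. (x0, y0, x1, y1)
    scores   : scores[i][j] = similarity of box i to text query j
    queries  : list of natural-language query strings
    Returns a list of (box, best_query, score) tuples.
    """
    kept = []
    for box, row in zip(boxes, scores):
        best = max(range(len(queries)), key=lambda j: row[j])
        if row[best] >= threshold:
            kept.append((box, queries[best], row[best]))
    return kept
```

The TensorRT `.engine` acceleration mentioned above only changes how the scores are computed; the filtering step stays the same.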
A Multi-Agent GeoAI pipeline for Multimodal Disaster Perception, Restoration, Damage Recognition, and Reasoning
Model Mondays is a weekly livestreamed series on Microsoft Reactor that helps you make informed model choices with timely updates and model deep dives. Watch live for the content, and join Discord for the discussions.
Leverage VideoLLaMA 3's capabilities using LitServe.
Leverage Gemma 3's capabilities using LitServe.
Repository containing experiments related to enforcing temporal structure in latent spaces, performed as part of a project at EPFL in collaboration with the IDIAP Lab.