The Paper List of Large Multi-Modality Models (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, and Conventional Image-Text Matching for Preliminary Insight.
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
Project page for the paper "Neural Brain: A Neuroscience-inspired Framework for Embodied Agents".
SafeSora is a human preference dataset designed to support safety-alignment research in text-to-video generation, aiming to improve the helpfulness and harmlessness of Large Vision Models (LVMs).
The official implementation of "Diversity-Guided MLP Reduction for Efficient Large Vision Transformers"
These notes and resources are compiled from the crash course "Prompt Engineering for Vision Models" offered by DeepLearning.AI.
The official implementation of "Adaptive MLP Pruning for Large Vision Transformers"
Hiera is a hierarchical vision transformer architecture developed by Meta. It simplifies traditional vision transformers by removing complex modules and instead learns spatial biases through Masked Autoencoder (MAE) pretraining. A sketch of the MAE idea follows below.
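The MAE pretraining idea referenced above is straightforward to illustrate: mask most input patches, encode only the visible ones, and train the model to reconstruct the pixels at the masked positions. Below is a minimal, self-contained PyTorch sketch of that idea only (the hierarchical architecture is omitted); all module names and sizes are illustrative assumptions, not Meta's Hiera implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMAE(nn.Module):
    """Toy masked autoencoder: embed patches, keep a random subset visible,
    encode only those, then reconstruct pixels at the masked positions."""

    def __init__(self, patch_dim=48, embed_dim=64, num_patches=196, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.decoder = nn.Linear(embed_dim, patch_dim)  # predict raw patch pixels

    def forward(self, patches):  # patches: (B, N, patch_dim), e.g. flattened 16x16 crops
        B, N, _ = patches.shape
        keep = int(N * (1 - self.mask_ratio))
        # Random per-sample permutation; the first `keep` indices stay visible.
        idx = torch.rand(B, N, device=patches.device).argsort(dim=1)
        vis_idx, msk_idx = idx[:, :keep], idx[:, keep:]

        def gather(t, i):  # select rows of t (B, N, D) at indices i (B, K)
            return t.gather(1, i.unsqueeze(-1).expand(-1, -1, t.size(-1)))

        x = self.embed(patches) + self.pos
        enc = self.encoder(gather(x, vis_idx))  # encoder sees visible patches only
        # Mask tokens carry only the positional embedding of the patch they replace.
        mask_tok = self.mask_token + gather(self.pos.expand(B, -1, -1), msk_idx)
        pred = self.decoder(torch.cat([enc, mask_tok], dim=1))[:, keep:]
        return F.mse_loss(pred, gather(patches, msk_idx))  # reconstruction loss

# One training step on random data (2 images, 196 patches of 48 raw values each).
loss = ToyMAE()(torch.randn(2, 196, 48))
loss.backward()
```

Because the encoder processes only the visible ~25% of patches, this style of pretraining is cheap relative to supervised training on full images, which is part of why MAE-style objectives pair well with simplified backbones like Hiera.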