This repository collects and categorizes influential vision-language papers by approach and application, with a special focus on the CLIP model.
- Vision-Language Pre-training
- Prompt Learning for Vision-Language Models
- Feature Adapters for Vision-Language Models
- Regularization-Based Prompt Learning
- Test-Time Adaptation of Vision-Language Models
- CLIP-based Domain Generalization
- CLIP-based Object Detection
- CLIP-based Open-Vocabulary Segmentation
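
Since every category below builds on CLIP, here is a minimal zero-shot classification sketch using OpenAI's `clip` package; the image path and candidate prompts are placeholders, not taken from any listed paper.

```python
# Zero-shot classification with a frozen CLIP model.
# Install: pip install git+https://github.com/openai/CLIP.git
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and class prompts.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
prompts = [f"a photo of a {c}" for c in ("dog", "cat", "car")]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    # CLIP scores each (image, prompt) pair by embedding similarity.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)[0]

print({p: round(float(s), 3) for p, s in zip(prompts, probs)})
```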
- Learning Transferable Visual Models From Natural Language Supervision - CLIP (ICML 2021) [Paper][Code]
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision - ALIGN (ICML 2021) [Paper][Code]
- MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining (CVPR 2023) [Paper][Code]
- Scaling Language-Image Pre-training via Masking (CVPR 2023) [Paper][Code]
- Learning to Prompt for Vision-Language Models (IJCV 2022) [Paper][Code]
- Conditional Prompt Learning for Vision-Language Models (CVPR 2022) [Paper][Code]
- MaPLe: Multi-modal Prompt Learning (CVPR 2023) [Paper][Code]
- Fine-tuned CLIP Models are Efficient Video Learners (CVPR 2023) [Paper][Code]
- PLOT: Prompt Learning with Optimal Transport for Vision-Language Models (ICLR 2023) [Paper][Code]
- Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models (ICCV 2023) [Paper][Code]
- Meta-Adapter: An Online Few-shot Learner for Vision-Language Model (NeurIPS 2023) [Paper][Code]
- GalLoP: Learning Global and Local Prompts for Vision-Language Models (ECCV 2024) [Paper][Code]
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning (EMNLP 2024) [Paper][Code]
- Adversarial Prompt Tuning for Vision-Language Models (ECCV 2024) [Paper][Code]
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPR-W 2024) [Paper][Code]
- PromptKD: Unsupervised Prompt Distillation for Vision-Language Models (CVPR 2024) [Paper][Code]
- TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model (CVPR 2024) [Paper][Code]
- Quantized Prompt for Efficient Generalization of Vision-Language Models (ECCV 2024) [Paper][Code]
- Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models (ICML 2024) [Paper][Code]
- Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024) [Paper][Code]
- Prompt Learning with One-Shot Setting based Feature Space Analysis in Vision-and-Language Models (CVPR-W 2024) [Paper][Code]
- Cascade Prompt Learning for Vision-Language Model Adaptation (ECCV 2024) [Paper][Code]
- DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection (ICML 2024) [Paper][Code]
- AlignCLIP: Enhancing Stable Representations in Vision-Language Pretraining Models through Attention and Prediction Alignment (ICLR 2025) [Paper][Code]
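
Most prompt-learning entries above descend from CoOp (Learning to Prompt), which replaces the hand-written prompt with learnable context vectors while CLIP stays frozen. Below is a hedged sketch of that idea against OpenAI's `clip` internals; the class names, `n_ctx`, and initialization are illustrative choices, not the authors' exact code.

```python
# CoOp-style soft prompts over a frozen OpenAI CLIP model (simplified sketch).
import clip
import torch
import torch.nn as nn

class SoftPromptTextEncoder(nn.Module):
    """Learns n_ctx context vectors shared across classes (CoOp's
    'unified context'); the CLIP text tower itself stays frozen."""
    def __init__(self, clip_model, classnames, n_ctx=4):
        super().__init__()
        self.clip = clip_model
        for p in self.clip.parameters():
            p.requires_grad_(False)
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, ctx_dim, dtype=dtype))
        # Tokenize "X X ... X <classname>." so sequence lengths and the
        # EOT position match the final learned prompt.
        texts = [" ".join(["X"] * n_ctx) + " " + c + "." for c in classnames]
        self.register_buffer("tokens", clip.tokenize(texts))  # (n_cls, 77)
        with torch.no_grad():
            emb = clip_model.token_embedding(self.tokens).type(dtype)
        self.register_buffer("prefix", emb[:, :1])          # SOS embedding
        self.register_buffer("suffix", emb[:, 1 + n_ctx:])  # class name + EOT

    def forward(self):
        n_cls = self.prefix.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        x = torch.cat([self.prefix, ctx, self.suffix], dim=1)
        x = x + self.clip.positional_embedding.type(self.clip.dtype)
        x = self.clip.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.clip.ln_final(x).type(self.clip.dtype)
        eot = self.tokens.argmax(dim=-1)  # EOT has the largest token id
        idx = torch.arange(n_cls, device=x.device)
        return x[idx, eot] @ self.clip.text_projection
```

Few-shot training then reduces to cross-entropy between `logit_scale.exp() * image_feats @ text_feats.t()` and the labels, with only `ctx` receiving gradients.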
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters (IJCV 2024) [Paper][Code]
- MMA: Multi-Modal Adapter for Vision-Language Models (CVPR 2024) [Paper][Code]
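
In contrast to prompt tuning, CLIP-Adapter attaches a small bottleneck MLP on top of the frozen encoders and blends its output residually with the original feature. A minimal sketch of that design; the width, reduction factor, and blend ratio `alpha` are illustrative defaults.

```python
import torch.nn as nn

class CLIPAdapter(nn.Module):
    """Bottleneck adapter over a frozen CLIP feature, blended residually."""
    def __init__(self, dim=512, reduction=4, alpha=0.2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.ReLU(inplace=True),
        )
        self.alpha = alpha  # how much of the adapted feature to mix in

    def forward(self, feat):
        return self.alpha * self.fc(feat) + (1 - self.alpha) * feat
```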
- Self-regulating Prompts: Foundational Model Adaptation without Forgetting (ICCV 2023) [Paper][Code]
- Consistency-guided Prompt Learning for Vision-Language Models (ICLR 2024) [Paper][Code]
- Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models (WACV 2025) [Paper][Code]
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models (NeurIPS 2022) [Paper][Code]
- Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization (NeurIPS 2023) [Paper][Code]
- SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models (NeurIPS 2023) [Paper][Code]
- Efficient Test-Time Adaptation of Vision-Language Models (CVPR 2024) [Paper][Code]
- Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models (ICML 2024) [Paper][Code]
- BaFTA: Backprop-Free Test-Time Adaptation for Zero-Shot Vision-Language Models (submitted to ICLR 2025) [Paper][Code]
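
Many of the test-time adaptation entries above follow the recipe introduced by TPT (the NeurIPS 2022 entry): at inference, tune only the soft prompt to minimize the entropy of the prediction averaged over augmented views of a single test image. A hedged sketch, reusing the `SoftPromptTextEncoder` from the prompt-learning example; the confidence-based view filtering from the paper is omitted.

```python
import torch

def tpt_step(clip_model, prompt_encoder, views, optimizer):
    """One entropy-minimization step on a batch of augmented views
    (n_views, 3, H, W) of a single test image; only the soft prompt
    in `prompt_encoder` is trainable."""
    with torch.no_grad():  # image features do not depend on the prompt
        image_feats = clip_model.encode_image(views)
        image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = prompt_encoder()  # depends on the learnable context
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    logits = clip_model.logit_scale.exp() * image_feats @ text_feats.t()
    probs = logits.softmax(dim=-1).mean(dim=0)            # marginal over views
    loss = -(probs * probs.clamp_min(1e-8).log()).sum()   # entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical driver would build `optimizer = torch.optim.AdamW([prompt_encoder.ctx], lr=5e-3)` and call `tpt_step` once per test image before predicting.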
- STYLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization (WACV 2024) [Paper][Code]
- AD-CLIP: Adapting Domains in Prompt Space Using CLIP (ICCV-W 2023) [Paper][Code]
- Any-Shift Prompting for Generalization over Distributions (CVPR 2024) [Paper][Code]