Top Vision-Language Papers

This repository collects and categorizes top vision-language papers by approach and application, with a special focus on the CLIP model.

Contents

Vision-Language Pre-training

  • Learning Transferable Visual Models From Natural Language Supervision - CLIP (ICML 2021) [Paper][Code]

  • Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision - ALIGN (ICML 2021) [Paper][Code]

  • MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining (CVPR 2023) [Paper][Code]

  • Scaling Language-Image Pre-training via Masking (CVPR 2023) [Paper][Code]
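
All four papers above train a dual-encoder model with a symmetric image-text contrastive objective. Below is a minimal PyTorch sketch of that loss, not any single paper's implementation: random tensors stand in for the encoder outputs, and the temperature of 0.07 is illustrative.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings;
    matched pairs sit on the diagonal of the similarity matrix."""
    image_feats = F.normalize(image_feats, dim=-1)        # cosine-ready features
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature   # [B, B] similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Random tensors stand in for image/text encoder outputs.
print(clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)))
```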

Prompt Learning for Vision-Language Models

  • Learning to Prompt for Vision-Language Models (IJCV 2022) [Paper][Code]

  • Conditional Prompt Learning for Vision-Language Models (CVPR 2022) [Paper][Code]

  • MaPLe: Multi-modal Prompt Learning (CVPR 2023) [Paper][Code]

  • Fine-tuned CLIP Models are Efficient Video Learners (CVPR 2023) [Paper][Code]

  • PLOT: Prompt Learning with Optimal Transport for Vision-Language Models (ICLR 2023) [Paper][Code]

  • Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models (ICCV 2023) [Paper][Code]

  • Meta-Adapter: An Online Few-shot Learner for Vision-Language Model (NeurIPS 2023) [Paper][Code]

  • GalLoP: Learning Global and Local Prompts for Vision-Language Models (ECCV 2024) [Paper][Code]

  • IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning (EMNLP 2024) [Paper][Code]

  • Adversarial Prompt Tuning for Vision-Language Models (ECCV 2024) [Paper][Code]

  • AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPR-W 2024) [Paper][Code]

  • PromptKD: Unsupervised Prompt Distillation for Vision-Language Models (CVPR 2024) [Paper][Code]

  • TCP: Textual-based Class-aware Prompt Tuning for Visual-Language Model (CVPR 2024) [Paper][Code]

  • DePT: Decoupled Prompt Tuning (CVPR 2024) [Paper][Code]

  • Quantized Prompt for Efficient Generalization of Vision-Language Models (ECCV 2024) [Paper][Code]

  • Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models (ICML 2024) [Paper][Code]

  • Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024) [Paper][Code]

  • Prompt Learning with One-Shot Setting based Feature Space Analysis in Vision-and-Language Models (CVPR-W 2024) [Paper][Code]

  • Cascade Prompt Learning for Vision-Language Model Adaptation (ECCV 2024) [Paper][Code]

  • DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection (ICML 2024) [Paper][Code]

  • AlignCLIP: Enhancing Stable Representations in Vision-Language Pretraining Models through Attention and Prediction Alignment (ICLR 2025) [Paper][Code]
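
Starting with CoOp (the first entry above), these methods replace hand-written prompt text with learnable context vectors that pass through a frozen text encoder. A minimal sketch of that setup follows, with a toy mean-pooling module standing in for CLIP's transformer text encoder; the context length, dimensions, and temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanPoolEncoder(nn.Module):
    """Toy stand-in for CLIP's transformer text encoder."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                 # x: [C, L, dim]
        return self.proj(x.mean(dim=1))                   # [C, dim]

class SoftPromptClassifier(nn.Module):
    """CoOp-style prompt learner: only the context vectors are trained;
    the text encoder and class-name token embeddings stay frozen."""
    def __init__(self, text_encoder, class_token_embeds, n_ctx=16, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable context
        self.text_encoder = text_encoder.eval()
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)
        self.register_buffer("cls_embeds", class_token_embeds)   # [C, L, dim]

    def forward(self, image_features, temperature=0.01):
        C = self.cls_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(C, -1, -1)             # [C, n_ctx, dim]
        prompts = torch.cat([ctx, self.cls_embeds], dim=1)        # prepend context
        text_feats = F.normalize(self.text_encoder(prompts), dim=-1)
        img = F.normalize(image_features, dim=-1)
        return img @ text_feats.t() / temperature                 # [B, C] logits

clf = SoftPromptClassifier(MeanPoolEncoder(512), torch.randn(10, 4, 512))
print(clf(torch.randn(8, 512)).shape)  # torch.Size([8, 10])
```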

Feature Adapters for Vision-Language Models

  • CLIP-Adapter: Better Vision-Language Models with Feature Adapters (IJCV 2024) [Paper][Code]

  • MMA: Multi-Modal Adapter for Vision-Language Models (CVPR 2024) [Paper][Code]
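
Adapters leave the prompts untouched and instead train a small bottleneck MLP on top of frozen CLIP features, blending the output back residually. Below is a sketch in the spirit of the CLIP-Adapter recipe; the reduction factor and residual ratio are illustrative defaults, not the papers' tuned values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPAdapter(nn.Module):
    """Bottleneck MLP on frozen CLIP features, blended back residually;
    alpha is the residual ratio."""

    def __init__(self, dim=512, reduction=4, alpha=0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, feats):                             # frozen CLIP features
        out = self.alpha * self.mlp(feats) + (1 - self.alpha) * feats
        return F.normalize(out, dim=-1)

# Only the adapter trains; CLIP's encoders stay frozen.
print(CLIPAdapter()(torch.randn(8, 512)).shape)
```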

Regularization-Based Prompt Learning

  • Self-regulating Prompts: Foundational Model Adaptation without Forgetting (ICCV 2023) [Paper][Code]

  • Consistency-guided Prompt Learning for Vision-Language Models (ICLR 2024) [Paper][Code]

  • Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models (WACV 2025) [Paper][Code]
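
These methods constrain learned prompts so the model does not drift from CLIP's zero-shot behavior, typically by penalizing the distance between prompted features and the frozen model's hand-crafted-prompt features. A sketch of one such anchor term follows; the listed papers differ in the exact penalty and weighting, and the L1 distance here is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_regularizer(prompted_text, frozen_text, prompted_img, frozen_img):
    """L1 anchor between learned-prompt features and frozen zero-shot
    CLIP features, applied on both branches."""
    t = F.l1_loss(F.normalize(prompted_text, dim=-1), F.normalize(frozen_text, dim=-1))
    i = F.l1_loss(F.normalize(prompted_img, dim=-1), F.normalize(frozen_img, dim=-1))
    return t + i

# Used as: total = cross_entropy(logits, labels) + lam * consistency_regularizer(...)
print(consistency_regularizer(*[torch.randn(8, 512) for _ in range(4)]))
```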

Test-Time Adaptation of Vision-Language Models

  • Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models (NeurIPS 2022) [Paper][Code]

  • Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization (NeurIPS 2023) [Paper][Code]

  • SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models (NeurIPS 2023) [Paper][Code]

  • Efficient Test-Time Adaptation of Vision-Language Models (CVPR 2024) [Paper][Code]

  • Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models (ICML 2024) [Paper][Code]

  • BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models (submitted to ICLR 2025) [Paper][Code]
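
The first entry above (TPT) set the template for this line of work: at test time, only the prompt is updated, by minimizing the entropy of the prediction averaged over augmented views of a single unlabeled image. A minimal sketch of that marginal-entropy objective follows; TPT additionally discards low-confidence views, which is omitted here.

```python
import torch

def marginal_entropy(logits_per_view):
    """Entropy of the class distribution averaged over augmented views
    of one test image. logits_per_view: [n_views, n_classes]."""
    probs = logits_per_view.softmax(dim=-1).mean(dim=0)   # marginal over views
    return -(probs * probs.clamp_min(1e-12).log()).sum()

# One adaptation step would update only the prompt parameters, e.g.
#   loss = marginal_entropy(model(augmented_views)); loss.backward(); opt.step()
print(marginal_entropy(torch.randn(32, 10)))
```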

CLIP-based Domain Generalization

  • STYLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization (WACV 2024) [Paper][Code]

  • AD-CLIP: Adapting Domains in Prompt Space Using CLIP (ICCV-W 2023) [Paper][Code]

  • Any-Shift Prompting for Generalization over Distributions (CVPR 2024) [Paper][Code]
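
A recurring idea in these papers (STYLIP in particular) is conditioning the prompt on instance style statistics so it transfers across domains. Below is a rough sketch of that conditioning step only, not any paper's exact design; the mean/std style descriptor (as in AdaIN) and the single linear mapping are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StyleConditionedPrompt(nn.Module):
    """Shared learnable context plus a per-instance offset predicted from
    style statistics (channel-wise mean/std over patch tokens)."""

    def __init__(self, n_ctx=4, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        self.style_net = nn.Linear(2 * dim, n_ctx * dim)  # style -> prompt offset

    def forward(self, patch_feats):                       # [B, L, dim] patch tokens
        style = torch.cat([patch_feats.mean(1), patch_feats.std(1)], dim=-1)
        offset = self.style_net(style).view(-1, *self.ctx.shape)
        return self.ctx.unsqueeze(0) + offset             # [B, n_ctx, dim]

print(StyleConditionedPrompt()(torch.randn(8, 196, 512)).shape)
```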

CLIP-based Object Detection

  • Revisiting Few-Shot Object Detection with Vision-Language Models (NeurIPS 2024) [Paper][Code]

  • CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023) [Paper][Code]
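
A common starting point for open-vocabulary detection is scoring class-agnostic region proposals against text embeddings of the category names. A minimal sketch of that scoring step, assuming region features (e.g., CLIP-encoded crops or RoI-pooled features) have already been extracted:

```python
import torch
import torch.nn.functional as F

def classify_regions(region_feats, text_feats, temperature=0.01):
    """Per-region class probabilities from cosine similarity to class
    embeddings. region_feats: [R, dim]; text_feats: [C, dim]."""
    sim = F.normalize(region_feats, dim=-1) @ F.normalize(text_feats, dim=-1).t()
    return (sim / temperature).softmax(dim=-1)            # [R, C]

print(classify_regions(torch.randn(4, 512), torch.randn(3, 512)).shape)
```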

CLIP-based Open-Vocabulary Segmentation

  • ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation (ECCV 2024) [Paper][Code]

  • Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation (Under review) [Paper][Code]
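
Training-free open-vocabulary segmentation of this kind labels each ViT patch token with its nearest text embedding; the papers above differ mainly in how those patch features are extracted and calibrated (e.g., proxy or self-calibrated attention). A bare-bones sketch of just the patch-to-text matching step, on an assumed 14x14 token grid:

```python
import torch
import torch.nn.functional as F

def patch_segmentation(patch_feats, text_feats, grid_hw):
    """Assign every ViT patch token to its nearest class embedding.
    patch_feats: [N, dim]; text_feats: [C, dim]; grid_hw: (H, W), H*W == N."""
    sim = F.normalize(patch_feats, dim=-1) @ F.normalize(text_feats, dim=-1).t()
    return sim.argmax(dim=-1).reshape(grid_hw)            # [H, W] class-index map

seg = patch_segmentation(torch.randn(196, 512), torch.randn(5, 512), (14, 14))
print(seg.shape)  # torch.Size([14, 14])
```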
