jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning

Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

Why Parameter Efficient?

Pre-training followed by full fine-tuning is a long-standing paradigm in deep learning. However, as pre-trained models scale up, e.g. GPT-3 (175B parameters), fully fine-tuning them on various downstream tasks carries a high risk of overfitting. Moreover, in practice it is costly to train and store a separate copy of a large model for each task. To overcome these issues, researchers began to explore Parameter-Efficient Transfer Learning, which aims to adapt large-scale pre-trained models to various downstream tasks by modifying as few parameters as possible. Inspired by the advances in the NLP domain and the continuing trend of scaling up models, researchers in computer vision and multimodal domains have joined this line of research.
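To make the storage argument concrete, here is a rough back-of-the-envelope sketch. It borrows the figures quoted for LLaMA-Adapter later in this list (a 7B-parameter backbone and 1.2M trainable parameters per task). The number of tasks, the 2 bytes per parameter (fp16), and all function names are assumptions chosen only for illustration.

```python
def storage_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Checkpoint size in GB (1 GB = 1e9 bytes), assuming fp16 weights by default."""
    return num_params * bytes_per_param / 1e9


def full_finetune_storage_gb(backbone_params: int, num_tasks: int) -> float:
    """Full fine-tuning keeps one complete copy of the model per task."""
    return num_tasks * storage_gb(backbone_params)


def peft_storage_gb(backbone_params: int, module_params: int, num_tasks: int) -> float:
    """Parameter-efficient tuning keeps one shared frozen backbone
    plus one small task-specific module per task."""
    return storage_gb(backbone_params) + num_tasks * storage_gb(module_params)


BACKBONE_PARAMS = 7_000_000_000  # LLaMA 7B, as quoted in the LLaMA-Adapter entry
MODULE_PARAMS = 1_200_000        # trainable parameters, as quoted in the same entry
NUM_TASKS = 10                   # hypothetical number of downstream tasks

full_gb = full_finetune_storage_gb(BACKBONE_PARAMS, NUM_TASKS)
peft_gb = peft_storage_gb(BACKBONE_PARAMS, MODULE_PARAMS, NUM_TASKS)
trainable_fraction = MODULE_PARAMS / BACKBONE_PARAMS

print(f"full fine-tuning: {full_gb:.1f} GB")               # 140.0 GB
print(f"parameter-efficient: {peft_gb:.3f} GB")            # 14.024 GB
print(f"trainable fraction: {trainable_fraction:.4%}")
```

Under these assumptions the per-task cost drops from a full model copy to a few megabytes, while the shared backbone is stored once.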

Keywords Convention

We follow the general idea of PromptPapers to label the papers. Each paper may carry three kinds of keyword tags:

  • The abbreviation of the work.
  • The main explored task of the work.
  • Other important information of the work.

Papers

Prompt

  • Learning to Prompt for Vision-Language Models, IJCV 2022 (arXiv:2109.01134).

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu. [Paper][Code]

  • Prompting Visual-Language Models for Efficient Video Understanding, ECCV 2022 (arXiv:2112.04478).

    Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie. [Paper][Code]

  • Domain Adaptation via Prompt Learning, arXiv:2202.06687.

    Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, Gao Huang. [Paper][Code]

  • Conditional Prompt Learning for Vision-Language Models, CVPR 2022 (arXiv:2203.05557).

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu. [Paper][Code]

  • Visual Prompt Tuning, ECCV 2022 (arXiv:2203.12119).

    Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim. [Paper][Code]

  • Exploring Visual Prompts for Adapting Large-Scale Models, arXiv:2203.17274.

    Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola. [Paper][Code]

  • Pro-tuning: Unified Prompt Tuning for Vision Tasks, arXiv:2207.14381.

    Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan. [Paper][Code]

  • P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting, arXiv:2208.02812.

    Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu. [Paper][Code]

  • Class-Aware Visual Prompt Tuning for Vision-Language Pre-Trained Model, arXiv:2208.08340.

    Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Yanning Zhang. [Paper][Code]

  • Prompt Tuning with Soft Context Sharing for Vision-Language Models, arXiv:2208.13474.

    Kun Ding, Ying Wang, Pengzhang Liu, Qiang Yu, Haojian Zhang, Shiming Xiang, Chunhong Pan. [Paper][Code]

  • Language-Aware Soft Prompting for Vision & Language Foundation Models, arXiv:2210.01115.

    Adrian Bulat, Georgios Tzimiropoulos. [Paper][Code]

  • Prompt Learning with Optimal Transport for Vision-Language Models, arXiv:2210.01253.

    Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang. [Paper][Code]

  • MaPLe: Multi-modal Prompt Learning, arXiv:2210.03117.

    Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan. [Paper][Code]

  • Unified Vision and Language Prompt Learning, arXiv:2210.07225.

    Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy. [Paper][Code]

  • CPL: Counterfactual Prompt Learning for Vision and Language Models, arXiv:2210.10362.

    Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang. [Paper][Code]

  • Understanding and Improving Visual Prompting: A Label-Mapping Perspective, arXiv:2211.11635.

    Aochuan Chen, Yuguang Yao, Pin-Yu Chen, Yihua Zhang, Sijia Liu. [Paper][Code]

  • Texts as Images in Prompt Tuning for Multi-Label Image Recognition, arXiv:2211.12739.

    Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo. [Paper][Code]

  • VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval, arXiv:2211.12764.

    Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang. [Paper][Code]

  • Unleashing the Power of Visual Prompting At the Pixel Level, arXiv:2212.10556.

    Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie. [Paper][Code]

  • Self-Supervised Convolutional Visual Prompts, arXiv:2303.00198.

    Yun-Yun Tsai, Chengzhi Mao, Yow-Kuan Lin, Junfeng Yang. [Paper][Code]

  • Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR 2023 (arXiv:2303.03369).

    Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee. [Paper][Code]

  • From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need, arXiv:2303.05266.

    Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang. [Paper][Code]

  • Diversity-Aware Meta Visual Prompting, CVPR 2023 (arXiv:2303.08138).

    Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu. [Paper][Code]

  • Patch-Token Aligned Bayesian Prompt Learning for Vision-Language Models, arXiv:2303.09100.

    Xinyang Liu, Dongsheng Wang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, Mingyuan Zhou. [Paper][Code]

  • LION: Implicit Vision Prompt Tuning, arXiv:2303.09992.

    Haixin Wang, Jianlong Chang, Xiao Luo, Jinan Sun, Zhouchen Lin, Qi Tian. [Paper][Code]

  • Fine-Grained Regional Prompt Tuning for Visual Abductive Reasoning, arXiv:2303.10428.

    Hao Zhang, Basura Fernando. [Paper][Code]

  • Visual Prompt Multi-Modal Tracking, CVPR 2023 (arXiv:2303.10826).

    Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu. [Paper][Code]

  • Explicit Visual Prompting for Low-Level Structure Segmentations, CVPR 2023 (arXiv:2303.10883).

    Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun. [Paper][Code]

  • CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition, arXiv:2303.11313.

    Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel. [Paper][Code]

    Comments: This work's idea is similar to our Text4Point.

  • Multi-modal Prompting for Low-Shot Temporal Action Localization, arXiv:2303.11732.

    Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie. [Paper][Code]

    Highlight: Enriches the meaning of an action class by querying a large-scale language model for a detailed action description.

  • Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning, arXiv:2303.15230.

    Siteng Huang, Biao Gong, Yutong Feng, Yiliang Lv, Donglin Wang. [Paper][Code]

  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, arXiv:2303.16199.

    Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao. [Paper][Code]

    Highlight: Tunes LLaMA (7B parameters) into an excellent chatbot with only 1.2M trainable parameters and 1 hour of fine-tuning.

  • Probabilistic Prompt Learning for Dense Prediction, arXiv:2304.00779.

    Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn. [Paper]

Adapter

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks, CVPR 2022 (arXiv:2112.06825).

    Yi-Lin Sung, Jaemin Cho, Mohit Bansal. [Paper][Code]

  • AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition, NeurIPS 2022 (arXiv:2205.13535).

    Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo. [Paper][Code]

  • Zero-Shot Video Question Answering via Frozen Bidirectional Language Models, NeurIPS 2022 (arXiv:2206.08155).

    Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid. [Paper][Code]

  • ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning, NeurIPS 2022 (arXiv:2206.13559).

    Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li. [Paper][Code]

  • Convolutional Bypasses Are Better Vision Transformer Adapters, arXiv:2207.07039.

    Shibo Jie, Zhi-Hong Deng. [Paper][Code]

  • Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets, arXiv:2208.07463.

    Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides. [Paper][Code]

  • Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving, NeurIPS 2022 (arXiv:2209.08953).

    Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang. [Paper][Code]

  • Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks, NeurIPS 2022 (arXiv:2210.03265).

    Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira. [Paper][Code]

  • SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models, arXiv:2210.03794.

    Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha. [Paper][Code]

  • Cross-Modal Adapter for Text-Video Retrieval, arXiv:2211.09623.

    Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Jiwen Lu, Jie Zhou, Shiji Song, Gao Huang. [Paper][Code]

  • Vision Transformers are Parameter-Efficient Audio-Visual Learners, arXiv:2212.07983.

    Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius. [Paper][Code]

    Takeaway message: A pre-trained vision transformer can handle audio data by representing the 1D raw audio signal as a 2D audio image.

  • Multimodal Video Adapter for Parameter Efficient Video Text Retrieval, arXiv:2301.07868.

    Bowen Zhang, Xiaojie Jin, Weibo Gong, Kai Xu, Zhao Zhang, Peng Wang, Xiaohui Shen, Jiashi Feng. [Paper][Code]

  • AIM: Adapting Image Models for Efficient Video Action Recognition, ICLR 2023 (arXiv:2302.03024).

    Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li. [Paper][Code]

  • Offsite-Tuning: Transfer Learning without Full Model, arXiv:2302.04870.

    Guangxuan Xiao, Ji Lin, Song Han. [Paper][Code]

  • UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling, arXiv:2302.06605.

    Haoyu Lu, Mingyu Ding, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Masayoshi Tomizuka, Wei Zhan. [Paper][Code]

  • Towards Efficient Visual Adaption via Structural Re-parameterization, arXiv:2302.08106.

    Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji. [Paper][Code]

  • T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, arXiv:2302.08453.

    Chong Mou, Xintao Wang, Liangbin Xie, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie. [Paper][Code]

  • kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models, arXiv:2302.10879.

    Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, Yin Tat Lee. [Paper][Code]

  • Side Adapter Network for Open-Vocabulary Semantic Segmentation, arXiv:2302.12242.

    Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai. [Paper][Code]

  • Dual-path Adaptation from Image to Video Transformers, arXiv:2303.09857.

    Jungin Park, Jiyoung Lee, Kwanghoon Sohn. [Paper][Code]

    Highlight: Models temporal information in a separate path.

  • Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning, ICLR 2023 (arXiv:2303.11866).

    Zaid Khan, Yun Fu. [Paper][Code]

    Highlight: Aligns already-trained vision and language models using adapters.

  • A Closer Look at Parameter-Efficient Tuning in Diffusion Models, arXiv:2303.18181.

    Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu. [Paper]

  • SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters, EMNLP 2022 (arXiv:2210.04284).

    Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao. [Paper][Code]

Unified

  • Towards a Unified View of Parameter-Efficient Transfer Learning, ICLR 2022 (arXiv:2110.04366).

    Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. [Paper][Code]

  • Neural Prompt Search, arXiv:2206.04673.

    Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu. [Paper][Code]

  • Rethinking Efficient Tuning Methods from a Unified Perspective, arXiv:2303.00690.

    Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Yiliang Lv, Deli Zhao, Jingren Zhou. [Paper][Code]

Others

  • Check out thunlp/DeltaPapers if you are interested in progress in the NLP domain.

  • LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning, NeurIPS 2022 (arXiv:2206.06522).

    Yi-Lin Sung, Jaemin Cho, Mohit Bansal. [Paper][Code]

  • Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning, NeurIPS 2022 (arXiv:2210.08823).

    Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang. [Paper][Code]

  • FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer, AAAI 2023 (arXiv:2212.03145).

    Shibo Jie, Zhi-Hong Deng. [Paper][Code]

  • Important Channel Tuning, OpenReview.

    Hengyuan Zhao, Pichao WANG, Yuyang Zhao, Fan Wang, Mike Zheng Shou. [Paper][Code]

  • MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering, arXiv:2303.01239.

    Jingjing Jiang, Nanning Zheng. [Paper][Code]

Contribution

Contributing to this paper list

  • Here is a tutorial on contributing to others' projects.
  • First, think about which category the work should belong to.
  • Second, use the same format as the other entries to describe the work. Note that there should be an empty line between the title and the author list, and take care with the indentation.
  • Then, add keyword tags and the PDF link of the paper. If it is an arXiv publication, we prefer the /abs/ format to the /pdf/ format.
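As a convenience for the last step, a small helper can rewrite an arXiv /pdf/ link into the preferred /abs/ form. This is only a sketch and not part of the repository: the function name `to_abs_url` and the regular expression are illustrative assumptions, and URLs that do not look like arXiv PDF links are returned unchanged.

```python
import re

# Matches arXiv PDF links such as https://arxiv.org/pdf/2203.12119.pdf
# (the trailing ".pdf" is optional; a version suffix like "v2" is kept).
_ARXIV_PDF = re.compile(r"^https?://arxiv\.org/pdf/(?P<id>[^?#]+?)(?:\.pdf)?$")


def to_abs_url(url: str) -> str:
    """Return the /abs/ form of an arXiv /pdf/ link; leave other URLs unchanged."""
    match = _ARXIV_PDF.match(url.strip())
    if match is None:
        return url
    return f"https://arxiv.org/abs/{match.group('id')}"


print(to_abs_url("https://arxiv.org/pdf/2203.12119.pdf"))
# https://arxiv.org/abs/2203.12119
```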

Acknowledgement

The structure of this repository follows thunlp/DeltaPapers, which collects parameter-efficient transfer learning papers in the natural language processing domain. Check out their repository if you are interested in progress in the NLP domain.
