A curated list of deep learning resources for video-text retrieval.
Please feel free to submit pull requests to add papers.
Markdown format:
- `[Conference/Trans Year]` Author(s). Title. Conference/Trans, Year. [[paper]](link) [[code]](link) [[homepage]](link)
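For example, a filled-in entry (with `link` placeholders standing in for the actual paper and code URLs) might look like:

```markdown
- `[CVPR2019]` Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang. Dual Encoding for Zero-Example Video Retrieval. CVPR, 2019. [[paper]](link) [[code]](link)
```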
## Papers

### 2019
- `[CVPR2019]` Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang. Dual Encoding for Zero-Example Video Retrieval. CVPR, 2019. [paper] [code]
- `[CVPR2019]` Yale Song and Mohammad Soleymani. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval. CVPR, 2019. [paper]
- `[ICCV2019]` Michael Wray, Diane Larlus, Gabriela Csurka, and Dima Damen. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings. ICCV, 2019. [paper]
- `[ICCV2019]` Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, and Dahua Lin. A Graph-Based Framework to Bridge Movies and Synopses. ICCV, 2019. [paper]
- `[ACMMM2019]` Xirong Li, Chaoxi Xu, Gang Yang, Zhineng Chen, and Jianfeng Dong. W2VV++: Fully Deep Learning for Ad-hoc Video Search. ACM Multimedia, 2019. [paper] [code]
- `[BMVC2019]` Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman. Use What You Have: Video Retrieval Using Representations From Collaborative Experts. BMVC, 2019. [paper] [code]
- `[BigMM2019]` Jaeyoung Choi, Martha Larson, Gerald Friedland, and Alan Hanjalic. From Intra-Modal to Inter-Modal Space: Multi-Task Learning of Shared Representations for Cross-Modal Retrieval. International Conference on Multimedia Big Data (BigMM), 2019. [paper]
### 2018
- `[TMM2018]` Jianfeng Dong, Xirong Li, Cees G. M. Snoek. Predicting Visual Features from Text for Image and Video Caption Retrieval. IEEE Transactions on Multimedia, 2018. [paper] [code] (Keras)
- `[ECCV2018]` Bowen Zhang, Hexiang Hu, Fei Sha. Cross-Modal and Hierarchical Modeling of Video and Text. ECCV, 2018. [paper] [code]
- `[ECCV2018]` Youngjae Yu, Jongseok Kim, Gunhee Kim. A Joint Sequence Fusion Model for Video Question Answering and Retrieval. ECCV, 2018. [paper]
- `[ECCV2018]` Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, and Dahua Lin. Find and Focus: Retrieve and Localize Video Events with Natural Language Queries. ECCV, 2018. [paper]
- `[ICMR2018]` Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury. Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. ICMR, 2018. [paper] [code]
- `[arXiv2018]` Antoine Miech, Ivan Laptev, Josef Sivic. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. arXiv preprint arXiv:1804.02516, 2018. [paper] [code]
### 2017 and before
- `[CVPR2017]` Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering. CVPR, 2017. [paper] [code]
- `[ECCVW2016]` Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya. Learning Joint Representations of Videos and Sentences with Web Image Search. ECCV Workshop, 2016. [paper]
- `[AAAI2015]` Ran Xu, Caiming Xiong, Wei Chen, Jason J. Corso. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework. AAAI, 2015. [paper]
### Pre-prints
- `[arXiv2020]` Tianhao Li and Limin Wang. Learning Spatiotemporal Features via Video and Text Pair Discrimination. arXiv preprint arXiv:2001.05691, 2020. [paper]
- `[arXiv2019]` Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, and Dima Damen. Action Modifiers: Learning from Adverbs in Instructional Videos. arXiv preprint arXiv:1912.06617, 2019. [paper]
- `[arXiv2019]` Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic, and Andrew Zisserman. End-to-End Learning of Visual Representations from Uncurated Instructional Videos. arXiv preprint arXiv:1912.06430, 2019. [paper]
## Datasets
- `[MSVD]` David L. Chen and William B. Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. ACL, 2011. [paper] [dataset]
- `[MSRVTT]` Jun Xu, Tao Mei, Ting Yao, Yong Rui. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. CVPR, 2016. [paper] [dataset]
- `[TGIF]` Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, and Jiebo Luo. TGIF: A New Dataset and Benchmark on Animated GIF Description. CVPR, 2016. [paper] [homepage]
- `[AVS]` George Awad, et al. TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking. TRECVID Workshop, 2016. [paper] [dataset]
- `[LSMDC]` Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, and Bernt Schiele. Movie Description. IJCV, 2017. [paper] [dataset]
- `[ActivityNet Captions]` Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. Dense-Captioning Events in Videos. ICCV, 2017. [paper] [dataset]
- `[DiDeMo]` Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell. Localizing Moments in Video with Natural Language. ICCV, 2017. [paper] [code]
- `[HowTo100M]` Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. ICCV, 2019. [paper] [homepage]
- `[VATEX]` Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang. VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. ICCV, 2019. [paper] [homepage]
## License
To the extent possible under law, danieljf24 has waived all copyright and related or neighboring rights to this repository.