A list of video object segmentation (VOS) papers.
Suggestions and requests are always welcome :)
-
[STMA] Spatial-Temporal Multi-level Association for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[OneVOS] OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework, ECCV [Paper] [arXiv] [Code]
-
[RMem] RMem: Restricted Memory Banks Improve Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
-
[Point-VOS] Point-VOS: Pointing Up Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
-
[Cutie] Putting the Object Back into Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[DeVOS] DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation, WACV [Paper]
-
[TTT] Test-time Training for Matching-based Video Object Segmentation, NeurIPS [Paper] [Code]
-
[READMem] READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation, BMVC [Paper] [arXiv] [Code]
-
[XMem++] XMem++: Production-level Video Segmentation From Few Annotated Frames, ICCV [Paper] [arXiv] [Code]
-
[SimVOS] Scalable Video Object Segmentation with Simplified Framework, ICCV [Paper] [arXiv] [Code]
-
[TMRN] Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation, ICCV [Paper]
-
[ISVOS] Look Before You Match: Instance Understanding Matters in Video Object Segmentation, CVPR [Paper] [arXiv]
-
[CorrLearn] Boosting Video Object Segmentation via Space-time Correspondence Learning, CVPR [Paper] [arXiv] [Code]
-
[MobileVOS] MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation, CVPR [Paper] [arXiv]
-
[TSVOS] Two-shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[LLB] Learning to Learn Better for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
-
[DeAOT] Decoupling Features in Hierarchical Propagation for Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
-
[AOC] Towards Robust Video Object Segmentation with Adaptive Object Calibration, ACMMM [Paper] [arXiv] [Code]
-
[BATMAN] BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation, ECCV [Paper] [arXiv]
-
[XMem] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model, ECCV [Paper] [arXiv] [Code]
-
[QDMN] Learning Quality-aware Dynamic Memory for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[TBD] Tackling Background Distraction in Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[GSFM] Global Spectral Filter Memory Network for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[RDE-VOS] Recurrent Dynamic Embedding for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[PCVOS] Per-Clip Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[CoVOS] Accelerating Video Object Segmentation with Compressed Video, CVPR [Paper] [arXiv] [Code]
-
[SWEM] SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization, CVPR [Paper] [arXiv] [Code]
-
[RPCMVOS] Reliable Propagation-Correction Modulation for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
-
[SITVOS] Siamese Network with Interactive Transformer for Video Object Segmentation, AAAI [Paper] [arXiv]
-
[BMVOS] Pixel-Level Bijective Matching for Video Object Segmentation, WACV [Paper] [arXiv] [Code]
-
[AOT] Associating Objects with Transformers for Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
-
[STCN] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
-
[JOINT] Joint Inductive and Transductive Learning for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[HMMN] Hierarchical Memory Matching Network for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[DMN-AOA] Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment, ICCV [Paper] [Code]
-
[RMNet] Efficient Regional Memory Network for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[LCM] Learning Position and Target Consistency for Memory-Based Video Object Segmentation, CVPR [Paper] [arXiv]
-
[GIEL] Video Object Segmentation Using Global and Instance Embedding Learning, CVPR [Paper]
-
[SwiftNet] SwiftNet: Real-time Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[SSTVOS] SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[Reuse-VOS] Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[STG-Net] Spatiotemporal Graph Neural Network Based Mask Reconstruction for Video Object Segmentation, AAAI [Paper] [arXiv]
-
[QMRA] Query-Memory Re-Aggregation for Weakly-Supervised Video Object Segmentation, AAAI [Paper]
-
[STM-cycle] Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation, NeurIPS [Paper] [arXiv] [Code]
-
[AFB-URR] Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement, NeurIPS [Paper] [arXiv] [Code]
-
[e-OSVOS] Make One-Shot Video Object Segmentation Efficient Again, NeurIPS [Paper] [arXiv] [Code]
-
[LWL] Learning What to Learn for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[EGMN] Video Object Segmentation with Episodic Graph Memory Networks, ECCV [Paper] [arXiv] [Code]
-
[CFBI] Collaborative Video Object Segmentation by Foreground-Background Integration, ECCV [Paper] [arXiv] [Code]
-
[GC] Fast Video Object Segmentation using the Global Context Module, ECCV [Paper] [arXiv]
-
[KMN] Kernelized Memory Network for Video Object Segmentation, ECCV [Paper] [arXiv]
-
[SAT] State-Aware Tracker for Real-Time Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[FRTM] Learning Fast and Robust Target Models for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[TVOS] A Transductive Approach for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[TAN-DTTM] Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching, CVPR [Paper] [arXiv]
-
[FTMU] Fast Template Matching and Update for Video Object Tracking and Segmentation, CVPR [Paper] [arXiv] [Code]
-
[DIPNet] DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation, WACV [Paper]
-
[DMM-Net] DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[AGSS-VOS] AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation, ICCV [Paper] [Code]
-
[RANet] RANet: Ranking Attention Network for Fast Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[DTN] Fast Video Object Segmentation via Dynamic Targeting Network, ICCV [Paper]
-
[CapsuleVOS] CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing, ICCV [Paper] [arXiv] [Code]
-
[STM] Video Object Segmentation Using Space-Time Memory Networks, ICCV [Paper] [arXiv] [Code]
-
[MHP-VOS] MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[STCNN] Spatiotemporal CNN for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[RVOS] RVOS: End-To-End Recurrent Network for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[A-GAME] A Generative Appearance Model for End-To-End Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[FEELVOS] FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[SiamMask] Fast Online Object Tracking and Segmentation: A Unifying Approach, CVPR [Paper] [arXiv] [Code]
-
[TIS] Tukey-Inspired Video Object Segmentation, WACV [Paper] [arXiv] [Code]
-
[S2S] YouTube-VOS: Sequence-to-Sequence Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[PReMVOS] PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation, ACCV [arXiv] [Code]
-
[OSMN] Efficient Video Object Segmentation via Network Modulation, CVPR [Paper] [arXiv] [Code]
-
[RGMP] Fast Video Object Segmentation by Reference-Guided Mask Propagation, CVPR [Paper] [Code]
-
[FAVOS] Fast and Accurate Online Video Object Segmentation via Tracking Parts, CVPR [Paper] [arXiv] [Code]
-
[SegFlow] SegFlow: Joint Learning for Video Object Segmentation and Optical Flow, ICCV [Paper] [arXiv] [Code]
-
[OSVOS] One-Shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[MaskTrack] Learning Video Object Segmentation from Static Images, CVPR [Paper] [arXiv] [Code]
-
[DPA] Dual Prototype Attention for Unsupervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[GSA-Net] Guided Slot Attention for Unsupervised Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[DATTT] Depth-aware Test-Time Training for Zero-shot Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[GFA] Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation, AAAI [Paper]
-
[SimulFlow] SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation, ACMMM [Paper] [arXiv]
-
[TGFormer] Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation, ACMMM [Paper]
-
[Isomer] Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[OAST] Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning, ICCV [Paper]
-
[PMN] Unsupervised Video Object Segmentation via Prototype Memory Network, WACV [Paper] [arXiv] [Code]
-
[TMO] Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation, WACV [Paper] [arXiv] [Code]
-
[HFAN] Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[IMP] Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier, AAAI [Paper] [arXiv]
-
[D2Conv3D] D2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos, WACV [Paper] [arXiv] [Code]
-
[CFAM] Video Salient Object Detection via Contrastive Features and Attention Modules, WACV [Paper] [arXiv]
-
[FSNet] Full-Duplex Strategy for Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[TransportNet] Deep Transport Network for Unsupervised Video Object Segmentation, ICCV [Paper]
-
[AMC-Net] Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation, ICCV [Paper] [Code]
-
[RTNet] Reciprocal Transformations for Unsupervised Video Object Segmentation, CVPR [Paper] [Code]
-
[F2Net] F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
-
[FrameSelect] Mask Selection and Propagation for Unsupervised Video Object Segmentation, WACV [Paper] [Code]
-
[3DC-Seg] Making a Case for 3D Convolutions for Object Segmentation in Videos, BMVC [Paper] [arXiv] [Code]
-
[WCS-Net] Unsupervised Video Object Segmentation with Joint Hotspot Tracking, ECCV [Paper] [Code]
-
[DFNet] Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation, ECCV [Paper] [arXiv]
-
[MATNet] Motion-Attentive Transition for Zero-Shot Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
-
[UnOVOST] UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking, WACV [Paper] [arXiv] [Code]
-
[EpO-Net] EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency, WACV [Paper] [arXiv] [Code]
-
[AD-Net] Anchor Diffusion for Unsupervised Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[AGNN] Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks, ICCV [Paper] [arXiv] [Code]
-
[AGS] Learning Unsupervised Video Object Segmentation Through Visual Attention, CVPR [Paper] [Code]
-
[COSNet] See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks, CVPR [Paper] [arXiv] [Code]
-
[SSAV] Shifting More Attention to Video Salient Object Detection, CVPR [Paper] [Code]
-
[MOTAdapt] Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting, ICRA [Paper] [arXiv] [Code]
-
[VISA] VISA: Reasoning Video Object Segmentation via Large Language Models, ECCV [Paper] [arXiv] [Code]
-
[VD-IT] Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[ActionVOS] ActionVOS: Actions as Prompts for Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[LoSh] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[MUTR] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation, AAAI [Paper] [arXiv] [Code]
-
[TCE-RVOS] Temporal Context Enhanced Referring Video Object Segmentation, WACV [Paper] [Code]
-
[SOC] SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation, NeurIPS [Paper] [arXiv] [Page]
-
[HTML] HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation, ICCV [Paper] [Page]
-
[OnlineRefer] OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[CMA] Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples, ICCV [Paper] [arXiv] [Code]
-
[R2VOS] Robust Referring Video Object Segmentation with Cyclic Structural Consensus, ICCV [Paper] [arXiv] [Code]
-
[SgMg] Spectrum-guided Multi-granularity Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[TempCD] Temporal Collection and Distribution for Referring Video Object Segmentation, ICCV [Paper] [arXiv] [Code]
-
[MANet] Multi-Attention Network for Compressed Video Referring Object Segmentation, ACMMM [Paper] [arXiv] [Code]
-
[MTTR] End-to-End Referring Video Object Segmentation with Multimodal Transformers, CVPR [Paper] [arXiv] [Code]
-
[ReferFormer] Language as Queries for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[LBDT] Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[MLRL] Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation, CVPR [Paper]
-
[YOFO] You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation, AAAI [Paper]
-
[URVOS] URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark, ECCV [Paper] [Code]
-
[BA] Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[LLE-VOS] Event-assisted Low-Light Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[EVA-VOS] Learning the What and How of Annotation in Video Object Segmentation, WACV [Paper] [arXiv] [Code]
-
[Training-Free-VOS] From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models, NeurIPS [Paper] [Code]
-
[DVSOD] DVSOD: RGB-D Video Salient Object Detection, NeurIPS [Paper] [arXiv] [Page]
-
[VOSPGD] Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks, ACMMM [Paper]
-
[DEVA] Tracking Anything with Decoupled Video Segmentation, ICCV [Paper] [arXiv] [Code]
-
[Timetuning] Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations, ICCV [Paper] [arXiv] [Code]
-
[VOS-VFI] Video Object Segmentation-aware Video Frame Interpolation, ICCV [Paper] [Code]
-
[LVOS] LVOS: A Benchmark for Long-term Video Object Segmentation, ICCV [Paper] [arXiv] [Page]
-
[MOSE] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes, ICCV [Paper] [arXiv] [Page]
-
[RCF] Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping, CVPR [Paper] [arXiv] [Code]
-
[VOST] Breaking the “Object” in Video Object Segmentation, CVPR [Paper] [arXiv] [Page]
-
[InstMove] InstMove: Instance Motion for Object-centric Video Segmentation, CVPR [Paper] [arXiv] [Code]
-
[SSL-VOS] A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation, WACV [Paper] [arXiv] [Code]
-
[BURST] BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video, WACV [Paper] [arXiv] [Code]
-
[EPIC-KITCHENS] EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations, NeurIPS [Paper] [arXiv] [Page]
-
[SaVos] Self-supervised Amodal Video Object Segmentation, NeurIPS [Paper] [arXiv]
-
[YouMVOS] YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset, CVPR [Paper] [Page]
-
[Wnet] Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks, CVPR [Paper] [Code]
-
[DUL] Dense Unsupervised Learning for Video Segmentation, NeurIPS [Paper] [arXiv] [Code]
-
[AMD] The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos, NeurIPS [Paper] [arXiv] [Code]
-
[MotionGroup] Self-supervised Video Object Segmentation by Motion Grouping, ICCV [Paper] [arXiv] [Code]
-
[GMB] Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos, ICCV [Paper] [arXiv] [Code]
-
[DANet] Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation, CVPR [Paper] [Code]
-
[IVOS-W] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild, CVPR [Paper] [arXiv] [Code]
-
[GIS] Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps, CVPR [Paper] [arXiv] [Code]
-
[MiVOS] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion, CVPR [Paper] [arXiv] [Code]
-
[ContrastCorr] Contrastive Transformation for Self-supervised Correspondence Learning, AAAI [Paper] [arXiv] [Code]
-
[TAO-VOS] Reducing the Annotation Effort for Video Object Segmentation Datasets, WACV [Paper] [arXiv] [Page]
-
[CRW] Space-Time Correspondence as a Contrastive Random Walk, NeurIPS [Paper] [arXiv] [Code]
-
[ODMS] Learning Object Depth from Camera Motion and Video Object Segmentation, ECCV [Paper] [arXiv] [Code]
-
[ScribbleBox] ScribbleBox: Interactive Annotation Framework for Video Object Segmentation, ECCV [Paper] [arXiv] [Page]
-
[ATNet] Interactive Video Object Segmentation Using Global and Local Transfer Modules, ECCV [Paper] [arXiv] [Code]
-
[MAST] MAST: A Memory-Augmented Self-Supervised Tracker, CVPR [Paper] [arXiv] [Code]
-
[MuG] Learning Video Object Segmentation From Unlabeled Videos, CVPR [Paper] [arXiv] [Code]
-
[MA-Net] Memory Aggregation Networks for Efficient Interactive Video Object Segmentation, CVPR [Paper] [arXiv] [Code]
-
[TimeCycle] Learning Correspondence from the Cycle-Consistency of Time, CVPR [Paper] [arXiv] [Code]
-
[BubbleNets] BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames, CVPR [Paper] [arXiv] [Code]
-
[IPNet] Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks, CVPR [Paper] [arXiv] [Code]
-
[YouTube-VOS] YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark, preprint [arXiv] [Page]