This is the official PyTorch implementation of our paper:
Pixel-Level Bijective Matching for Video Object Segmentation, WACV 2022
Suhwan Cho, Heansung Lee, Minjung Kim, Sungjun Jang, Sangyoun Lee
Link: [WACV] [arXiv]

You can also explore other related works at awesome-video-object-segmentation.
In conventional semi-supervised VOS methods, each query frame pixel selects its best-matching pixel in the reference frame and transfers information from it, without any consideration of how the reference frame pixels are used. Since there is no limit on the number of times each reference frame pixel can be referenced, background distractors in the query frame may obtain high foreground scores and disrupt the prediction. To mitigate this issue, we introduce a bijective matching mechanism that finds the best matches not only from the query frame to the reference frame but also vice versa. In addition, to exploit the property of video that an object usually occupies similar positions in consecutive frames, we propose a mask embedding module.
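The bijective matching idea can be illustrated with a minimal sketch: keep a (query, reference) pixel pair only if the query pixel is among the top-k matches of that reference pixel, then let each query pixel match within the surviving pairs. NumPy is used here for brevity (the repo itself is PyTorch), and the function name `bijective_matching` and the parameter `k` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def bijective_matching(sim, k=1):
    # sim: (Nq, Nr) similarity between query pixels and reference pixels.
    # Reference -> query direction: indices of the top-k query pixels
    # for each reference pixel (illustrative choice of selection rule).
    topk = np.argsort(-sim, axis=0)[:k]            # (k, Nr)
    mask = np.zeros_like(sim, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=0)    # allowed (query, ref) pairs
    masked = np.where(mask, sim, -np.inf)
    # Query -> reference direction: best surviving reference match per
    # query pixel. Query pixels selected by no reference pixel stay at -inf,
    # so background distractors cannot receive high foreground scores.
    idx = masked.argmax(axis=1)
    scores = masked.max(axis=1)
    return scores, idx, mask
```

With `k=1`, a query pixel only receives information if it is the single best match of some reference pixel, which is the strictest form of the bidirectional constraint; larger `k` relaxes it.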
1. Download the datasets: DAVIS, YouTube-VOS.
2. Download our custom split for the YouTube-VOS training set.
Please follow the instructions in TBD.
Run BMVOS with:
python run.py
Verify the following before running:
✅ Testing dataset selection
✅ GPU availability and configuration
✅ Pre-trained model path
Pre-trained model (DAVIS)
Pre-trained model (YouTube-VOS)
Pre-computed results
Code and models are available only for non-commercial research purposes.
For questions or inquiries, feel free to contact:
E-mail: suhwanx@gmail.com