This repo serves as the official code release of the MemCNP-VIS model in the paper "A memory-based conditional neural process for video instance segmentation" (Neurocomputing, 2025).
Some qualitative results can be seen below:

*(qualitative result figures)*
For more details about the dataset, please refer to our paper.
This repo is built on MaskTrackRCNN, SipMask, and a customized FCOS.

You can use the following commands to create a conda env with all dependencies. Modify the CUDA version and the corresponding PyTorch version as needed.
```bash
conda env create -f env_conda.yml
conda activate memcnp
pip install -r env_pip.txt
```
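As a quick sanity check (purely illustrative, not part of the official setup), you can verify that the installed PyTorch build actually sees your GPU:

```bash
# Print the PyTorch version and whether CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```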
Then install the customized cocoapi and remaining dependencies:

```bash
git clone https://github.com/qjy981010/cocoapi.git /tmp/cocoapi
cd /tmp/cocoapi/PythonAPI
python setup.py install
cd -
conda install six matplotlib
```
Change the CUDA path in compile.sh to your CUDA installation directory, then compile the correlation ops:

```bash
bash compile.sh
```
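As a concrete example (the variable name CUDA_HOME and the path below are assumptions; check compile.sh for the actual variable it reads), pointing the build at a local CUDA 11.1 toolkit could look like:

```bash
# Point the build at the local CUDA toolkit before compiling
# (variable name and path are assumptions; check compile.sh)
export CUDA_HOME=/usr/local/cuda-11.1
bash compile.sh
```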
- Download YVIS and OVIS.
- Symlink the train/validation images into the data/ folder (e.g. data/OVIS/) and put the COCO-style JSON annotations under the corresponding annotations folders, as in the layout below (a symlink sketch follows the tree).
```
mmdetection
├── ...
├── mmdet
├── tools
├── configs
├── data
│   ├── OVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── annotations_train.json
│   │   │   ├── annotations_valid.json
│   │   │   ├── annotations_test.json
│   ├── YVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── instances_train.json
│   │   │   ├── instances_valid.json
│   │   │   ├── instances_test.json
```
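A minimal sketch of the symlinking step, assuming the raw datasets were extracted to ~/datasets (the source paths and annotation file locations are assumptions; adapt them to where you downloaded the data):

```bash
mkdir -p data/OVIS/annotations data/YVIS/annotations
# Symlink the image folders (source paths are assumptions)
ln -s ~/datasets/OVIS/train_images data/OVIS/train_images
ln -s ~/datasets/OVIS/valid_images data/OVIS/valid_images
ln -s ~/datasets/YVIS/train_images data/YVIS/train_images
ln -s ~/datasets/YVIS/valid_images data/YVIS/valid_images
# Copy the COCO-style annotation files into place
cp ~/datasets/OVIS/annotations_{train,valid,test}.json data/OVIS/annotations/
cp ~/datasets/YVIS/instances_{train,valid,test}.json data/YVIS/annotations/
```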
Our default model is based on ViT-b_SipMask. The model is pretrained on ImageNet using MAE (model link).
Run the command below to train the model:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py configs/vitmae_cnp_contrast_ms_1x.py \
    --work_dir ./workdir/vit-b_memcnp_yvis --gpus 4
```
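If a run is interrupted, mmdetection-style training scripts typically accept a --resume_from flag; assuming this repo's train.py inherits it (an assumption, not confirmed by the paper or this README), resuming might look like:

```bash
# Resume from the latest checkpoint (--resume_from is the standard mmdetection
# flag and is assumed to be supported by this repo's train.py)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py configs/vitmae_cnp_contrast_ms_1x.py \
    --work_dir ./workdir/vit-b_memcnp_yvis --gpus 4 \
    --resume_from ./workdir/vit-b_memcnp_yvis/latest.pth
```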
To change training configurations such as the learning rate, model parameters, and dataset, please refer to configs/vitmae_cnp_contrast_ms_1x.py.
Our pretrained model is available for download at Google Drive. Run the following command to evaluate the model on YVIS:

```bash
CUDA_VISIBLE_DEVICES=0 python test_video.py configs/vitmae_cnp_contrast_ms_1x.py [MODEL_PATH] --out [OUTPUT_PATH.pkl] --eval segm
```
A JSON file containing the predicted results will be generated as OUTPUT_PATH.pkl.json. YVIS currently only allows evaluation on the CodaLab server, so please upload the generated result there to see the actual performance.
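A minimal sketch of preparing the upload, assuming the server expects a zip archive containing a file named results.json (both names are assumptions; check the competition page for the required format):

```bash
# Rename and zip the prediction file for CodaLab upload
# (results.json / results.zip naming is an assumption)
cp OUTPUT_PATH.pkl.json results.json
zip results.zip results.json
```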
For visualisation purposes, please use:

```bash
CUDA_VISIBLE_DEVICES=0 python test_video.py configs/vitmae_cnp_contrast_ms_1x.py [MODEL_PATH] --eval segm --show True --save_vis_path [VISUALISATION_PATH] --thresh 0.2
```
Note that in visualisation mode, the result file OUTPUT_PATH.pkl.json will not be properly generated even if you specify --out [OUTPUT_PATH.pkl].
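If you want to stitch the saved visualisations into a clip for inspection, a sketch with ffmpeg (the per-frame image naming pattern under [VISUALISATION_PATH] is an assumption; match it to what the script actually writes):

```bash
# Stitch visualised frames into an mp4 (frame pattern is an assumption)
ffmpeg -framerate 10 -i [VISUALISATION_PATH]/%05d.jpg -c:v libx264 -pix_fmt yuv420p demo.mp4
```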
This project is released under the Apache 2.0 license, while the correlation ops are under the MIT license.
This project is based on MaskTrack-RCNN, SipMask, and FCOS. Thanks for their wonderful work.
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝:
```bibtex
@article{YUANMemCNP-VIS,
  title   = {A memory-based conditional neural process for video instance segmentation},
  author  = {Kunhao Yuan and Gerald Schaefer and Yu-Kun Lai and Xiyao Liu and Lin Guan and Hui Fang},
  journal = {Neurocomputing},
  volume  = {655},
  pages   = {131439},
  year    = {2025},
  issn    = {0925-2312},
  doi     = {10.1016/j.neucom.2025.131439},
  url     = {https://www.sciencedirect.com/science/article/pii/S0925231225021113}
}
```







