This repo serves as the official code release of the MemCNP-VIS model in the paper "A memory-based conditional neural process for video instance segmentation" (Neurocomputing, 2025).
Some qualitative results can be seen below:

*(qualitative result figures)*
For more details about the dataset, please refer to our paper.
This repo is built on MaskTrackRCNN, SipMask, and a customized FCOS.

You can use the following commands to create a conda env with all dependencies. Modify the CUDA version and the corresponding PyTorch version as needed.
```bash
conda env create -f env_conda.yml
conda activate memcnp
pip install -r env_pip.txt
```
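As a quick sanity check (purely illustrative, not part of the official setup), you can verify that the installed PyTorch build actually sees your GPU:

```bash
# Print the PyTorch version and whether CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```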
Then install the customized cocoapi and remaining dependencies:

```bash
git clone https://github.com/qjy981010/cocoapi.git /tmp/cocoapi
cd /tmp/cocoapi/PythonAPI
python setup.py install
cd -
conda install six matplotlib
```
Change the CUDA path in compile.sh to your CUDA installation directory, then compile the correlation ops:

```bash
bash compile.sh
```
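As a concrete example (the variable name CUDA_HOME and the path below are assumptions; check compile.sh for the actual variable it reads), pointing the build at a local CUDA 11.1 toolkit could look like:

```bash
# Point the build at the local CUDA toolkit before compiling
# (variable name and path are assumptions; check compile.sh)
export CUDA_HOME=/usr/local/cuda-11.1
bash compile.sh
```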
- Download YVIS and OVIS.
- Symlink the train/validation images into the data/ folder (e.g. data/OVIS/) and put the COCO-style JSON annotations under the corresponding annotations folders, as in the layout below (a symlink sketch follows the tree).
```
mmdetection
├── ...
├── mmdet
├── tools
├── configs
├── data
│   ├── OVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── annotations_train.json
│   │   │   ├── annotations_valid.json
│   │   │   ├── annotations_test.json
│   ├── YVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── instances_train.json
│   │   │   ├── instances_valid.json
│   │   │   ├── instances_test.json
```
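A minimal sketch of the symlinking step, assuming the raw datasets were extracted to ~/datasets (the source paths and annotation file locations are assumptions; adapt them to where you downloaded the data):

```bash
mkdir -p data/OVIS/annotations data/YVIS/annotations
# Symlink the image folders (source paths are assumptions)
ln -s ~/datasets/OVIS/train_images data/OVIS/train_images
ln -s ~/datasets/OVIS/valid_images data/OVIS/valid_images
ln -s ~/datasets/YVIS/train_images data/YVIS/train_images
ln -s ~/datasets/YVIS/valid_images data/YVIS/valid_images
# Copy the COCO-style annotation files into place
cp ~/datasets/OVIS/annotations_{train,valid,test}.json data/OVIS/annotations/
cp ~/datasets/YVIS/instances_{train,valid,test}.json data/YVIS/annotations/
```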
Our default model is based on ViT-b_SipMask. The model is pretrained on ImageNet using MAE (model link).
Run the command below to train the model:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py configs/vitmae_cnp_contrast_ms_1x.py \
    --work_dir ./workdir/vit-b_memcnp_yvis --gpus 4
```
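If a run is interrupted, mmdetection-style training scripts typically accept a --resume_from flag; assuming this repo's train.py inherits it (an assumption, not confirmed by the paper or this README), resuming might look like:

```bash
# Resume from the latest checkpoint (--resume_from is the standard mmdetection
# flag and is assumed to be supported by this repo's train.py)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py configs/vitmae_cnp_contrast_ms_1x.py \
    --work_dir ./workdir/vit-b_memcnp_yvis --gpus 4 \
    --resume_from ./workdir/vit-b_memcnp_yvis/latest.pth
```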
To change training configurations such as the learning rate, model parameters, and dataset, please refer to configs/vitmae_cnp_contrast_ms_1x.py.
Our pretrained model is available for download at Google Drive. Run the following command to evaluate the model on YVIS:

```bash
CUDA_VISIBLE_DEVICES=0 python test_video.py configs/vitmae_cnp_contrast_ms_1x.py [MODEL_PATH] --out [OUTPUT_PATH.pkl] --eval segm
```
A JSON file containing the predicted results will be generated as OUTPUT_PATH.pkl.json. YVIS currently only allows evaluation on the CodaLab server, so please upload the generated result there to see the actual performance.
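A minimal sketch of preparing the upload, assuming the server expects a zip archive containing a file named results.json (both names are assumptions; check the competition page for the required format):

```bash
# Rename and zip the prediction file for CodaLab upload
# (results.json / results.zip naming is an assumption)
cp OUTPUT_PATH.pkl.json results.json
zip results.zip results.json
```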
For visualisation purposes, please use:

```bash
CUDA_VISIBLE_DEVICES=0 python test_video.py configs/vitmae_cnp_contrast_ms_1x.py [MODEL_PATH] --eval segm --show True --save_vis_path [VISUALISATION_PATH] --thresh 0.2
```
Note that in visualisation mode, the result file OUTPUT_PATH.pkl.json will not be properly generated even if you specify --out [OUTPUT_PATH.pkl].
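If you want to stitch the saved visualisations into a clip for inspection, a sketch with ffmpeg (the per-frame image naming pattern under [VISUALISATION_PATH] is an assumption; match it to what the script actually writes):

```bash
# Stitch visualised frames into an mp4 (frame pattern is an assumption)
ffmpeg -framerate 10 -i [VISUALISATION_PATH]/%05d.jpg -c:v libx264 -pix_fmt yuv420p demo.mp4
```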
This project is released under the Apache 2.0 license, while the correlation ops are under the MIT license.
This project is based on MaskTrack-RCNN, SipMask, and FCOS. Thanks for their wonderful work.
If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝:
```bibtex
@article{YUANMemCNP-VIS,
  title   = {A memory-based conditional neural process for video instance segmentation},
  author  = {Kunhao Yuan and Gerald Schaefer and Yu-Kun Lai and Xiyao Liu and Lin Guan and Hui Fang},
  journal = {Neurocomputing},
  volume  = {655},
  pages   = {131439},
  year    = {2025},
  issn    = {0925-2312},
  doi     = {10.1016/j.neucom.2025.131439},
  url     = {https://www.sciencedirect.com/science/article/pii/S0925231225021113}
}
```







