jinbae-s/ACVIS

Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation

Jinbae Seo, Hyeongjun Kwon, Kwonyoung Kim, Jiyoung Lee and Kwanghoon Sohn

The official PyTorch implementation of ACVIS.

Installation

conda create --name acvis python=3.8 -y
conda activate acvis

conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -U opencv-python
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
pip install timm

cd mask2former/modeling/pixel_decoder/ops
sh make.sh
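
After the steps above, it can help to confirm that the packages are visible inside the acvis environment before training. The following is a minimal sketch, not part of the repository: the helper name check_packages is hypothetical, and the module names are assumed to be the usual import names for the packages installed above (cv2 for opencv-python). It only checks that each module can be found and does not import it or verify CUDA.

```python
import importlib.util
from typing import Dict, Iterable

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "cv2", "detectron2", "timm"]


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Return a mapping from module name to whether it can be located."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec may raise for broken or partially initialised modules.
            found[name] = False
    return found


if __name__ == "__main__":
    for module, ok in check_packages(REQUIRED_MODULES).items():
        print(f"{module}: {'found' if ok else 'MISSING'}")
```

Any module reported as MISSING points to the corresponding install command above.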

Setup

Datasets

Download and unzip the AVISeg dataset and place it in ./datasets.

Pretrained Backbones

Download and unzip the pre-trained backbones (hosted on OneDrive) and place them in ./pre_models.

Checkpoints

Download the following checkpoints and put them in ./checkpoints.

| Backbone | Pre-trained Datasets | mAP | HOTA | FSLA | Model Weight |
|---|---|---|---|---|---|
| ResNet-50 | ImageNet | 42.01 | 62.04 | 42.43 | ACVIS_R50_IN.pth |
| ResNet-50 | ImageNet & COCO | 46.64 | 65.02 | 46.72 | ACVIS_R50_COCO.pth |
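
The setup steps above place files in three directories relative to the repository root. As a rough sketch (the helper missing_paths is hypothetical and not part of the repository), the expected layout can be checked before training or evaluation. The paths listed are taken from this section; the internal structure of ./datasets and ./pre_models is not specified here, so only the top-level directories are checked.

```python
from pathlib import Path
from typing import Iterable, List

# Paths named in the Setup section, relative to the repository root.
EXPECTED_PATHS = [
    "datasets",
    "pre_models",
    "checkpoints",
    "checkpoints/ACVIS_R50_IN.pth",
    "checkpoints/ACVIS_R50_COCO.pth",
]


def missing_paths(root: str, expected: Iterable[str]) -> List[str]:
    """Return the entries of `expected` that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".", EXPECTED_PATHS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths are present.")
```

Only one of the two checkpoint files is needed for a given run, so a missing checkpoint is not necessarily an error.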

Getting Started

Train

python train_net.py --num-gpus 2 --config-file configs/acvis/acvis_saoc.yaml

Evaluation

python train_net.py --config-file configs/acvis/acvis_saoc.yaml --eval-only MODEL.WEIGHTS checkpoints/ACVIS_R50_COCO.pth
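
To evaluate more than one checkpoint, the command above can be assembled programmatically. This is an illustrative sketch only: build_eval_command is a hypothetical helper, and it assumes train_net.py accepts trailing KEY VALUE pairs as config overrides, as the command above suggests. Whether acvis_saoc.yaml is also the right config for ACVIS_R50_IN.pth is not stated in this README.

```python
import shlex
import sys
from typing import List


def build_eval_command(config_file: str, weights: str,
                       python: str = sys.executable) -> List[str]:
    """Build the evaluation command shown above as an argument list."""
    return [
        python, "train_net.py",
        "--config-file", config_file,
        "--eval-only",
        "MODEL.WEIGHTS", weights,  # trailing KEY VALUE config override
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        "configs/acvis/acvis_saoc.yaml",
        "checkpoints/ACVIS_R50_COCO.pth",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```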

Demo

python demo_video/demo.py --config-file configs/acvis/acvis_saoc.yaml --opts MODEL.WEIGHTS checkpoints/ACVIS_R50_COCO.pth

Citation

@misc{seo2025acvis,
      title={Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation}, 
      author={Jinbae Seo and Hyeongjun Kwon and Kwonyoung Kim and Jiyoung Lee and Kwanghoon Sohn},
      year={2025},
      eprint={2509.22740},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.22740}, 
}

Acknowledgement

Our implementation is based on Detectron2, Mask2Former, VITA and AVIS. We thank the authors for their great work.
