Qiang Hu1, *, Zhenyu Yi2, *, Ying Zhou1, Fan Huang3, Mei Liu4, Qiang Li1, Zhiwei Wang1, †
1 WNLO, HUST, 2 SES, HUST, 3 UIH, 4 HUST Tongji Medical College
(*: equal contribution, †: corresponding author)
Colonoscopy videos provide richer information for polyp segmentation in rectal cancer diagnosis. However, the endoscope's fast movement and close-up observation cause current methods to suffer from large spatial incoherence and runs of consecutive low-quality frames, which limits segmentation accuracy. In this context, we focus on robust video polyp segmentation by enhancing the consistency of adjacent features and rebuilding a reliable polyp representation. To this end, we propose the SALI network, a hybrid of a Short-term Alignment Module (SAM) and a Long-term Interaction Module (LIM). The SAM learns spatially aligned features of adjacent frames via deformable convolution and further harmonizes them to capture a more stable short-term polyp representation. To handle low-quality frames, the LIM stores historical polyp representations in a long-term memory bank and explores retrospective relations to interactively rebuild more reliable polyp features for the current segmentation. Combining SAM and LIM, the SALI network shows great robustness to spatial variations and low-visual cues.
SALI showcases formidable Learning Ability (92.7/89.1 max Dice score on SUN-SEG-Seen-Easy/-Hard) and Generalization Capabilities (82.5/82.2 max Dice score on SUN-SEG-Unseen-Easy/-Hard) in the VPS task, surpassing previous models by a large margin.
- The figure below illustrates some of the Consecutive Low-quality Sequences in the specific sub-test set.
- Python 3.8+
- PyTorch 1.9+
- TorchVision corresponding to the PyTorch version
- NVIDIA GPU + CUDA
# Install other dependent packages
pip install -r requirements.txt
# Install cuda extensions for FA
cd lib/ops_align
python setup.py build develop
cd ../..
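The prerequisites listed above can be sanity-checked before training. The snippet below is a minimal sketch and not part of the repository: the helper names `version_tuple` and `check_environment` are hypothetical, and it only reads installed package metadata, so it does not verify CUDA availability or that the `lib/ops_align` extension was built.

```python
import re
import sys
from importlib import metadata  # standard library since Python 3.8


def version_tuple(version):
    """Return (major, minor) from strings such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Collect a small report on the Python / PyTorch / TorchVision versions."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    for package in ("torch", "torchvision"):
        try:
            report[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            report[package] = None
    torch_version = report["torch"]
    report["torch_ok"] = (
        torch_version is not None and version_tuple(torch_version) >= (1, 9)
    )
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `None` entry means the package was not found; whether the TorchVision version matches the installed PyTorch build still has to be checked against the official compatibility table.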
Please refer to PNS+ to get access to the SUN-SEG dataset, and download it to the path ./datasets. The path structure should be as follows:
SALI
├── datasets
│ ├── SUN-SEG
│ │ ├── TestEasyDataset
│ │ │ ├── Seen
│ │ │ ├── Unseen
│ │ ├── TestHardDataset
│ │ │ ├── Seen
│ │ │ ├── Unseen
│ │ ├── TrainDataset
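Before training or testing, it may help to confirm that the downloaded dataset matches the tree above. The following is a small sketch under that assumption; `EXPECTED_DIRS` and `missing_dirs` are hypothetical names introduced here and are not part of the repository. It only checks that the directories exist, not their contents.

```python
from pathlib import Path

# Sub-directories taken from the tree above, relative to the SUN-SEG root.
EXPECTED_DIRS = (
    "TestEasyDataset/Seen",
    "TestEasyDataset/Unseen",
    "TestHardDataset/Seen",
    "TestHardDataset/Unseen",
    "TrainDataset",
)


def missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("./datasets/SUN-SEG")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout matches the expected structure.")
```

Run it from the repository root (the `SALI` directory in the tree) so that the relative path `./datasets/SUN-SEG` resolves correctly.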
The pre-trained weights are available here.
mkdir pretrained
cd pretrained
# download the weights with the links above.
python train_video.py
python test_video.py
You can download our checkpoint and put it in the directory ./snapshot for a quick test.
For a fair comparison, we evaluate all methods with the toolbox ./eval provided by PNS+.
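For readers unfamiliar with the reported metric, the sketch below illustrates the general idea of a Dice score and of taking its maximum over binarisation thresholds. It is not the toolbox's implementation: the function names, the 256 evenly spaced thresholds, and the per-image maximisation are assumptions made for illustration, and the exact thresholding and averaging in ./eval may differ. Plain Python lists are used so the example runs without dependencies.

```python
def dice_score(pred, target, eps=1e-8):
    """Dice = 2|P ∩ G| / (|P| + |G|) for two flat binary masks (0/1 values)."""
    intersection = sum(p * g for p, g in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def max_dice(prob, target, num_thresholds=256):
    """Largest Dice over evenly spaced thresholds in [0, 1] (illustrative)."""
    best = 0.0
    for i in range(num_thresholds):
        threshold = i / (num_thresholds - 1)
        # Binarise the predicted probabilities at the current threshold.
        binary = [1 if p >= threshold else 0 for p in prob]
        best = max(best, dice_score(binary, target))
    return best


if __name__ == "__main__":
    probabilities = [0.9, 0.8, 0.2, 0.1]  # flattened prediction map
    ground_truth = [1, 1, 0, 0]           # flattened binary mask
    print(max_dice(probabilities, ground_truth))
```

Numbers intended for comparison with the results above should still be produced with the ./eval toolbox, so that all methods are scored identically.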
The prediction maps of SALI can be downloaded via this link.
If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing SALI by the following BibTeX entry.
@inproceedings{hu2024sali,
title={SALI: Short-Term Alignment and Long-Term Interaction Network for Colonoscopy Video Polyp Segmentation},
author={Hu, Qiang and Yi, Zhenyu and Zhou, Ying and Peng, Fang and Liu, Mei and Li, Qiang and Wang, Zhiwei},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={531--541},
year={2024},
organization={Springer}
}