SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks
Xinyu Xiong, Zihuang Wu, Lei Zhang, Lei Lu, Ming Li, Guanbin Li
Recent studies have highlighted the potential of adapting the Segment Anything Model (SAM) for various downstream tasks. However, constructing a more powerful and generalizable encoder to further enhance performance remains an open challenge. In this work, we propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet while extending the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. By incorporating a dual-resolution strategy and a dense glue layer, our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs. Extensive experiments conducted on four benchmarks, including dichotomous image segmentation, camouflaged object detection, marine animal segmentation, and remote sensing saliency detection, demonstrate the superior performance of our proposed method.
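As a rough illustration of the idea described above (and not the actual implementation), the sketch below fuses a high-resolution SAM2 feature map with lower-resolution auxiliary DINOv2 features through a simple convolutional "glue" layer. All module names, feature shapes, and design details here are assumptions for illustration only; please refer to the repository code for the real SAM2-UNeXT architecture.

```python
# Conceptual sketch of the dual-encoder fusion idea (illustrative only).
# Feature shapes, the glue-layer design, and all module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGlueLayer(nn.Module):
    """Hypothetical fusion of SAM2 and DINOv2 features at a shared resolution."""
    def __init__(self, sam_dim, dino_dim, out_dim):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(sam_dim + dino_dim, out_dim, kernel_size=1),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, sam_feat, dino_feat):
        # Dual-resolution idea: the two encoders may run at different input sizes,
        # so the auxiliary DINOv2 features are resized to the SAM2 feature grid.
        dino_feat = F.interpolate(
            dino_feat, size=sam_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        return self.fuse(torch.cat([sam_feat, dino_feat], dim=1))

if __name__ == "__main__":
    # Toy tensors standing in for encoder outputs (shapes are made up).
    sam_feat = torch.randn(1, 256, 64, 64)    # high-resolution SAM2 features
    dino_feat = torch.randn(1, 1024, 37, 37)  # lower-resolution DINOv2 features
    glue = DenseGlueLayer(sam_dim=256, dino_dim=1024, out_dim=256)
    fused = glue(sam_feat, dino_feat)
    print(fused.shape)  # torch.Size([1, 256, 64, 64]) -> fed to a simple decoder
```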
SAM2-UNeXT is an improved version of SAM2-UNet, so new users are encouraged to familiarize themselves with the previous version first.
git clone https://github.com/WZH0120/SAM2-UNeXT.git
cd SAM2-UNeXT/
You can refer to the following repositories and their papers for the detailed configurations of the corresponding datasets.
- Dichotomous Image Segmentation. Please refer to BiRefNet.
- Camouflaged Object Detection. Please refer to FEDER (see issues #13 and #44).
- Marine Animal Segmentation. Please refer to MASNet.
- Remote Sensing Saliency Detection. Please refer to ORSI-SOD.
Our project does not require installing SAM2 as a package. If you have already configured an environment for SAM2, using that environment directly should also work. Alternatively, you can create a new conda environment:
conda create -n sam2-unext python=3.10
conda activate sam2-unext
pip install -r requirements.txt
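Optionally, before training you may want to confirm that the environment provides a CUDA-enabled PyTorch build. A quick check (not part of the official setup) could look like this:

```python
# Optional sanity check: confirm that PyTorch is installed and can see the GPU.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```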
If you want to train your own model, please download:
- the pre-trained Segment Anything 2 weights (SAM2, not SAM 2.1; see issues #18 and #30) from here
- the pre-trained DINOv2 weights from here
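Once both checkpoints are downloaded, you can optionally confirm that the files are intact by loading them with torch.load. The file names below are placeholders, so adjust them to wherever you saved the weights:

```python
# Optional sanity check: make sure the downloaded checkpoints load correctly.
# The file names below are placeholders; adjust them to your actual paths.
import torch

for path in ["sam2_hiera_large.pt", "dinov2_vitl14_pretrain.pth"]:
    state = torch.load(path, map_location="cpu")
    # The number of top-level entries differs depending on how each checkpoint is packaged.
    print(f"{path}: loaded, {len(state)} top-level entries")
```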
After the above preparations, you can run train.sh to start training.
Our prediction maps can be found on Google Drive. Alternatively, you can run test.sh to obtain your own predictions.
After obtaining the prediction maps, you can run eval.sh to get the quantitative results.
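For reference, the mean absolute error (MAE) is one of the standard metrics on these benchmarks. The generic sketch below shows how it is computed for a single prediction/ground-truth pair; it uses placeholder file names and is not the repository's evaluation code:

```python
# Generic MAE illustration (not the repository's eval code).
# "pred.png" and "gt.png" are placeholder file names; both maps are assumed
# to share the same spatial resolution.
import numpy as np
from PIL import Image

pred = np.asarray(Image.open("pred.png").convert("L"), dtype=np.float64) / 255.0
gt = np.asarray(Image.open("gt.png").convert("L"), dtype=np.float64) / 255.0
mae = np.abs(pred - gt).mean()
print(f"MAE: {mae:.4f}")
```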
Please cite the following paper and star this project if you use this repository in your research. Thank you!
@article{xiong2025sam2,
title={SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks},
author={Xiong, Xinyu and Wu, Zihuang and Zhang, Lei and Lu, Lei and Li, Ming and Li, Guanbin},
journal={arXiv preprint arXiv:2508.03566},
year={2025}
}