Shiyu Miao (缪师宇)* Delong Chen (陈德龙)* Fan Liu (刘凡)✉
Chuanyi Zhang (张传一) Yanhui Gu (顾彦慧) Shengjie Guo (郭晟杰) Jun Zhou (周峻)
* Equal Contribution
DirectSAM-RS is a vision-language foundation model designed for semantic contour extraction in optical remote sensing imagery. It builds on the DirectSAM model, which is pretrained on the SA-1B dataset and offers robust contour extraction. However, DirectSAM is non-interactive and class-agnostic, which limits its use in domain-specific applications like remote sensing.
To address these limitations, DirectSAM-RS introduces:
- Text-guided contour extraction: Unlike previous visual-only models, DirectSAM-RS accepts free-form textual prompts to specify semantic targets, enabling zero-shot contour extraction without requiring downstream training samples.
- Cross-domain generalization: DirectSAM-RS transfers contour extraction knowledge from natural images to remote sensing by leveraging a large-scale dataset curated from existing segmentation datasets (LoveDA, iSAID, DeepGlobe, RefSegRS), resulting in 34k image-text-contour triplets (RemoteContour-34k).
- Flexible prompting architecture: A novel prompter design that fuses semantic information from textual prompts with image features via cross-attention, allowing the model to conditionally extract contours based on the input prompt.
DirectSAM-RS is implemented on top of the Hugging Face framework; implementation details are available in `model.py`.
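For intuition, below is a minimal sketch of how a prompter might fuse textual prompt embeddings into image features via cross-attention. The module name, feature dimensions, and encoder interfaces are illustrative assumptions, not the repository's exact code; see `model.py` for the actual design.

```python
import torch
import torch.nn as nn

class PrompterSketch(nn.Module):
    """Illustrative cross-attention fusion of text prompts into image features."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, N_patches, dim) from the vision encoder
        # text_feats:  (B, N_tokens, dim) from the text encoder
        # Image features attend to the prompt, injecting semantic conditioning
        attended, _ = self.cross_attn(query=image_feats, key=text_feats, value=text_feats)
        # Residual connection preserves the original visual information
        return self.norm(image_feats + attended)
```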
We constructed a semantic contour extraction dataset by repurposing existing semantic segmentation datasets with our proposed Mask2Contour (M2C) transformation. The M2C process produces a total of 34k image-text-contour triplets from LoveDA, iSAID, DeepGlobe, and RefSegRS datasets. We name this resulting dataset RemoteContour-34k.
The RemoteContour-34k dataset is available for download via BaiduNetdisk.
- BaiduNetdisk Link: Click here to download
- Extraction Code: `mmsy`
We validate DirectSAM-RS on three downstream contour extraction datasets: SLSD for coastline extraction, Beijing Urban Building Extraction (BUBE) for building extraction, and LRSNY for road extraction. All three downstream datasets can also be downloaded via BaiduNetdisk.
- BaiduNetdisk Link: Click here to download
- Extraction Code: `mmsy`
The Mask2Contour (M2C) transformation is a simple and effective method for extracting semantic contours from segmentation masks. It leverages OpenCV's `cv2.findContours` function to efficiently convert segmented regions into their corresponding contours.
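The scripts in `utils/` implement the full pipeline; the following is a minimal sketch of the M2C idea for a single class mask. The function and file names here are illustrative, and the class index is a placeholder to adapt to your dataset.

```python
import cv2
import numpy as np

def mask_to_contour(mask: np.ndarray, thickness: int = 3) -> np.ndarray:
    """Convert a binary mask (H, W) into a contour map of the same shape."""
    contour_map = np.zeros_like(mask, dtype=np.uint8)
    # cv2.findContours expects a single-channel 8-bit image
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE
    )
    # Rasterize the polygon boundaries back into a pixel map
    cv2.drawContours(contour_map, contours, -1, color=1, thickness=thickness)
    return contour_map

# Example: contours of class index 2 in a single-channel label map
label = cv2.imread("label.png", cv2.IMREAD_GRAYSCALE)
contour = mask_to_contour((label == 2).astype(np.uint8))
```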
If you want to convert other semantic segmentation labels into contours, refer to the code in the `utils` folder. If your labels are single-channel like LoveDA's, use `utils/M2C_1channel.py`; if they are three-channel like iSAID's, use `utils/M2C_3channel.py`. In either case, modify the file paths and `category_dict` to match your dataset.
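For reference, a `category_dict` for a single-channel dataset might look like the following. The entries below follow the LoveDA label convention as we understand it; verify the mapping against the documentation of the dataset you use.

```python
# Pixel value -> class name mapping for single-channel labels (LoveDA-style)
category_dict = {
    1: "background",
    2: "building",
    3: "road",
    4: "water",
    5: "barren",
    6: "forest",
    7: "agriculture",
}
```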