
Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images

Shiyu Miao (缪师宇)*   Delong Chen (陈德龙)*   Fan Liu (刘凡)
Chuanyi Zhang (张传一)   Yanhui Gu (顾彦慧)   Shengjie Guo (郭晟杰)   Jun Zhou (周峻)

* Equal Contribution

Introduction

DirectSAM-RS is a vision-language foundation model designed for semantic contour extraction in optical remote sensing imagery. It builds on the DirectSAM model, which is pretrained on the SA-1B dataset and offers robust contour extraction. However, DirectSAM is non-interactive and class-agnostic, which limits its use in domain-specific applications like remote sensing.

To address these limitations, DirectSAM-RS introduces:

  • Text-guided contour extraction: Unlike previous visual-only models, DirectSAM-RS accepts free-form textual prompts to specify semantic targets, enabling zero-shot contour extraction without requiring downstream training samples.
  • Cross-domain generalization: DirectSAM-RS transfers contour extraction knowledge from natural images to remote sensing by leveraging a large-scale dataset curated from existing segmentation datasets (LoveDA, iSAID, DeepGlobe, RefSegRS), resulting in 34k image-text-contour triplets (RemoteContour-34k).
  • Flexible prompting architecture: A novel prompter design that fuses semantic information from textual prompts with image features via cross-attention, allowing the model to conditionally extract contours based on the input prompt (a minimal sketch of this fusion follows the list).
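
The sketch below illustrates the kind of text-to-image cross-attention fusion the prompter performs. It is a minimal PyTorch illustration under assumed names and dimensions (`TextPromptedFusion`, `img_dim`, `txt_dim`), not the code from model.py.

```python
import torch
import torch.nn as nn

class TextPromptedFusion(nn.Module):
    """Hypothetical sketch: inject prompt semantics into image features
    via cross-attention (image tokens attend to prompt tokens)."""

    def __init__(self, img_dim=256, txt_dim=512, num_heads=8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, img_dim)  # align text width to image width
        self.cross_attn = nn.MultiheadAttention(img_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, H*W, C) flattened image patch features
        # txt_feats: (B, T, D) token embeddings of the free-form textual prompt
        txt = self.txt_proj(txt_feats)
        attn_out, _ = self.cross_attn(query=img_feats, key=txt, value=txt)
        return self.norm(img_feats + attn_out)  # residual fusion of prompt-conditioned features
```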

DirectSAM-RS is implemented with the Hugging Face framework; implementation details are available in model.py.
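
A hypothetical usage sketch is shown below; the class name, loading call, and call signature are assumptions rather than the repository's actual interface, which is defined in model.py.

```python
# Hypothetical usage sketch: class name, loading call, and signature are
# assumptions -- consult model.py for the actual interface.
import torch
from PIL import Image

from model import DirectSAMRS  # assumed class exported by model.py

model = DirectSAMRS.from_pretrained("path/to/checkpoint")  # Hugging Face-style loading (assumed)
model.eval()

image = Image.open("remote_sensing_scene.png").convert("RGB")
prompt = "building"  # free-form text specifying the semantic target

with torch.no_grad():
    contour_map = model(image, prompt)  # assumed output: per-pixel contour probability map
```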

RemoteContour-34k

We constructed a semantic contour extraction dataset by repurposing existing semantic segmentation datasets with our proposed Mask2Contour (M2C) transformation. The M2C process produces a total of 34k image-text-contour triplets from the LoveDA, iSAID, DeepGlobe, and RefSegRS datasets; we name the resulting dataset RemoteContour-34k.

The RemoteContour-34k dataset is available for download via BaiduNetdisk.

Downstream task datasets

We validate DirectSAM-RS on three downstream contour extraction datasets: SLSD for coastline extraction, BUBE (Beijing Urban Building Extraction) for building extraction, and LRSNY for road extraction. All three downstream datasets can also be downloaded via BaiduNetdisk.

Mask2Contour (M2C) transformation

The Mask2Contour (M2C) transformation is a simple and effective method for extracting semantic contours from segmentation masks. This approach leverages the cv2.findContours function from OpenCV to efficiently convert segmented regions into their corresponding contours.

If you want to convert other semantic segmentation labels into contours, you can refer to the code in the utils folder. If your semantic segmentation labels are single-channel like LoveDA, use utils/M2C_1channel.py. If your labels are three-channel like iSAID, use utils/M2C_3channel.py. You will need to modify the file paths and category_dict accordingly when using the scripts.
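
The minimal sketch below shows the core M2C idea for a single-channel label map; it is not the repository script, and the file paths, class id, and line thickness are assumptions.

```python
# Minimal M2C sketch (not utils/M2C_1channel.py): extract the contour of one
# semantic class from a single-channel mask and save it as a binary contour label.
import cv2
import numpy as np

mask = cv2.imread("loveda_mask.png", cv2.IMREAD_GRAYSCALE)  # single-channel label map (assumed path)
class_id = 1                                                # assumed class index from category_dict

binary = (mask == class_id).astype(np.uint8)                # isolate one semantic class
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

contour_label = np.zeros_like(binary)
cv2.drawContours(contour_label, contours, -1, color=1, thickness=1)  # 1-pixel contour band
cv2.imwrite("loveda_contour.png", contour_label * 255)
```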

