Code for "AWM-SAM: A SAM-Enhanced Dual-Stream Network with Text-Guided Asymmetric Wavelet Modulation for Referring Remote Sensing Image Segmentation"
Environment: PyTorch 2.0.0, Python 3.8 (Ubuntu 20.04), CUDA 11.8; GPU: 1x vGPU-32GB (32 GB); CPU: 10 vCPU Intel(R) Xeon(R) Gold 6459C.
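A minimal setup sketch, assuming conda and pip are used (this README does not list the full dependencies, so adjust as needed):

conda create -n awm-sam python=3.8 -y
conda activate awm-sam
# Install PyTorch 2.0.0 built for CUDA 11.8, matching the environment above
pip install torch==2.0.0 torchvision==0.15.1 --index-url https://download.pytorch.org/whl/cu118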
1. Create a ./pretrained_weights folder and place sam_vit_b_01ec64.pth (https://huggingface.co/datasets/Gourieff/ReActor/blob/main/models/sams/sam_vit_b_01ec64.pth) in it.
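For example, assuming wget is available (the same ViT-B checkpoint is also hosted in the official Segment Anything release):

mkdir -p ./pretrained_weights
# Fetch the SAM ViT-B checkpoint into the pretrained_weights folder
wget -P ./pretrained_weights https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth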
2. Download the pre-trained classification weights of the Swin Transformer and put the .pth file in ./pretrained_weights. These weights are needed during training to initialize the visual encoder.
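The exact Swin variant is not stated here; judging from the --swin_type base and --window12 flags used at test time, the base, window-12, ImageNet-22K checkpoint from the official Swin Transformer release is a reasonable assumption:

# Assumed variant: Swin-Base, window 12, pre-trained on ImageNet-22K
wget -P ./pretrained_weights https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth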
3. Download the BERT weights from HuggingFace's Transformers library and put them in the root directory.
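One way to fetch the weights, assuming the standard bert-base-uncased checkpoint is the one expected (this README does not name the variant) and that git-lfs is installed:

git lfs install
# Clone the BERT checkpoint repository into the project root
git clone https://huggingface.co/bert-base-uncased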
4. The RefSegRS and RRSIS-D datasets can be downloaded from https://github.com/Shaosifan/FIANet.
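The expected on-disk layout is not documented in this README; a hedged sketch, assuming the archives are unpacked into a top-level data folder (archive names and paths are hypothetical; see the FIANet repository for the authoritative structure):

# Hypothetical archive names and target folder
mkdir -p ./data
unzip RefSegRS.zip -d ./data
unzip RRSIS-D.zip -d ./data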
We use one GPU to train our model. For training on the RefSegRS dataset:
python train.py --dataset refsegrs --model_id FIANet --epochs 60 --lr 5e-5 --num_tmem 1
For training on the RRSIS-D dataset:
python train.py --dataset rrsisd --model_id FIANet --epochs 40 --lr 3e-5 --num_tmem 3
Test on the RefSegRS dataset:
python test.py --swin_type base --dataset refsegrs --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 1
Test on the RRSIS-D dataset:
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 3
The code in this repository is built on FIANet (https://github.com/Shaosifan/FIANet).