This repository provides the official PyTorch implementation of the paper:
Wang Lecheng, Wang Maozhi, Xia Xueqin. Integrating Spectral, Topographic, and NDSI with Swin-UperNet for Glacier Extraction Under Complex Terrain Conditions, 2025, ×(×): ×-×.

Schematic diagram of the Swin-UperNet architecture
Swin-UperNet is a state-of-the-art semantic segmentation framework that combines the powerful Swin Transformer backbone with the UperNet head architecture. This repository provides an implementation of the Swin-UperNet model for high-performance semantic segmentation tasks.
Key features:
- 🚀 Swin Transformer backbone for hierarchical feature extraction
- 🔄 UperNet decoder for multi-scale feature fusion
- ⚡️ High efficiency with linear computational complexity
- 🏆 State-of-the-art performance on segmentation benchmarks
- ❄️ Specialized support for glacier segmentation tasks
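The backbone/decoder pairing described above can be sketched in a few lines of PyTorch. Everything here is illustrative and assumed, not the repository's actual code: the module names, the single-level lateral-conv fusion (real UperNet also uses a pyramid pooling module and a top-down path), and the channel widths 96/192/384/768, which are Swin-T's standard stage widths.

```python
# Minimal sketch of hierarchical features feeding an UperNet-style head.
# Module names, fusion scheme, and channel widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUPerHead(nn.Module):
    """FPN-style fusion of multi-scale backbone features (simplified)."""
    def __init__(self, in_channels, mid_channels=64, num_classes=3):
        super().__init__()
        # 1x1 lateral convs project every stage to a common width
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, mid_channels, 1) for c in in_channels
        )
        self.classifier = nn.Conv2d(mid_channels * len(in_channels), num_classes, 1)

    def forward(self, feats):
        # Project each stage, then upsample everything to the finest scale
        target = feats[0].shape[-2:]
        fused = [
            F.interpolate(lat(f), size=target, mode="bilinear", align_corners=False)
            for lat, f in zip(self.laterals, feats)
        ]
        return self.classifier(torch.cat(fused, dim=1))

# Feature maps shaped like a Swin-T backbone would emit for a 224x224 tile
# (strides 4/8/16/32, widths 96/192/384/768)
feats = [torch.randn(1, c, 224 // s, 224 // s)
         for c, s in zip((96, 192, 384, 768), (4, 8, 16, 32))]
logits = TinyUPerHead((96, 192, 384, 768), num_classes=3)(feats)
print(logits.shape)  # per-pixel class scores at 1/4 input resolution
```

The final logits are produced at 1/4 resolution and would be upsampled to the input size before computing the loss, as is conventional for UperNet-style heads.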
Requirements:
- Python 3.12.6
- PyTorch 2.4.1
- TorchSummary 1.5.1
- TorchInfo 1.8.0
- Thop 0.1.1
- CUDA 11.8
- Timm 1.0.15
- numpy 1.26.4
- GDAL 3.8.4
- Linux environment recommended
To train a model on the glacier segmentation task, use the following command structure:
```shell
python train.py \
    --MODEL_TYPE upernet \
    --BACKBONE_TYPE swin_t \
    --BANDS 10 \
    --NUM_CLASS 3 \
    --DATASET_PATH ./datasets/glacier \
    --BATCH_SIZE 16 \
    --EPOCHS 100 \
    --OPTIMIZER_TYPE sgd \
    --LOSS_TYPE ce \
    --LR_SCHEDULER poly \
    --INIT_LR 0.0005 \
    --GPU_ID 0
```

Key Parameters:
- `MODEL_TYPE`: Model architecture (e.g., `upernet`, `deeplab`, `segnext`)
- `BACKBONE_TYPE`: Backbone network, for models that support one (e.g., `swin_t`, `resnet50`)
- `BANDS`: Number of input channels (10 for the glacier data)
- `NUM_CLASS`: Number of classes, including background
- `DATASET_PATH`: Path to the dataset directory
- `BATCH_SIZE`: Batch size (adjust based on GPU memory)
- `EPOCHS`: Total number of training epochs
- `OPTIMIZER_TYPE`: Optimizer (`sgd` or `adam`)
- `LOSS_TYPE`: Loss function (`ce` for cross-entropy, `focal` for focal loss)
- `LR_SCHEDULER`: Learning-rate scheduler (`poly`, `step`, `cos`, or `exp`)
- `INIT_LR`: Initial learning rate
- `GPU_ID`: ID of the GPU to use

Note: The dataset should be organized in the following structure:
```
DATASET_PATH/
├── annotations/
│   ├── train.txt
│   └── val.txt
├── images/
│   ├── 1.tif
│   ├── 2.tif
│   └── ...
└── labels/
    ├── 1.tif
    ├── 2.tif
    └── ...
```
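The annotation files above can be generated by splitting the tile IDs found in `images/`. The helper below is a hypothetical sketch: the file names follow the tree above, but the 9:1 train/val ratio and the fixed shuffle seed are assumptions, not the paper's protocol.

```python
# Hypothetical helper to build annotations/train.txt and annotations/val.txt
# from the tile IDs in images/. Split ratio and seed are assumptions.
import random
from pathlib import Path

def make_splits(dataset_path, val_fraction=0.1, seed=0):
    root = Path(dataset_path)
    # Tile IDs are the .tif file stems (e.g. "1", "2", ...)
    ids = sorted(p.stem for p in (root / "images").glob("*.tif"))
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    splits = {"val.txt": ids[:n_val], "train.txt": ids[n_val:]}
    ann = root / "annotations"
    ann.mkdir(exist_ok=True)
    for name, subset in splits.items():
        (ann / name).write_text("\n".join(subset) + "\n")
    return splits
```

Fixing the shuffle seed keeps the train/val split reproducible across runs, which matters when comparing models trained on the same data.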
All models were trained from scratch under identical conditions for fair comparison:
- DeepLabv3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (Liang-Chieh Chen et al., 2018, ECCV). Paper: arXiv:1802.02611.
- U-Net: Convolutional Networks for Biomedical Image Segmentation (Olaf Ronneberger et al., 2015, MICCAI). Paper: arXiv:1505.04597.
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers (Enze Xie et al., 2021, NeurIPS). Paper: arXiv:2105.15203.
- SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (Sixiao Zheng et al., 2021, CVPR). Paper: arXiv:2012.15840.
- Swin-UperNet (ours):
  - Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (Ze Liu et al., 2021, ICCV). Paper: arXiv:2103.14030.
  - Unified Perceptual Parsing for Scene Understanding (UperNet) (Tete Xiao et al., 2018, ECCV). Paper: arXiv:1807.10221.