SARMAE: Masked Autoencoder for SAR Representation Learning

Danxu Liu1,4 *, Di Wang2,4 *, Hebaixu Wang2,4 *, Haoyang Chen2,4 *, Wentao Jiang2, Yilin Cheng3,4, Haonan Guo2,4, Wei Cui1 †, Jing Zhang2,4 †.

1 Beijing Institute of Technology, 2 Wuhan University, 3 Fudan University, 4 Zhongguancun Academy.

* Equal contribution. † Corresponding authors.

Update | Abstract | Datasets | Models | Usage | Statement

🔥 Update

2025.12.19

🌞 Abstract

Synthetic Aperture Radar (SAR) imagery plays a critical role in all-weather, day-and-night remote sensing applications. However, existing SAR-oriented deep learning is constrained by data scarcity, while the physically grounded speckle noise in SAR imagery further hampers fine-grained semantic representation learning. To address these challenges, we propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. Specifically, we construct SAR-1M, the first million-scale SAR dataset, with additional paired optical images, to enable large-scale pre-training. Building upon this, we design Speckle-Aware Representation Enhancement (SARE), which injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Furthermore, we introduce Semantic Anchor Representation Constraint (SARC), which leverages paired optical priors to align SAR features and ensure semantic consistency. Extensive experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks.
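The README does not spell out how SARE's speckle injection is implemented; as a rough illustration of the idea it describes (corrupting SAR inputs with physically motivated noise before masked reconstruction), the sketch below applies multiplicative, gamma-distributed speckle, a standard multi-look SAR noise model. The function name `inject_speckle` and the `looks` parameter are illustrative assumptions, not this repository's API.

```python
import torch

def inject_speckle(images: torch.Tensor, looks: float = 4.0) -> torch.Tensor:
    """Multiplicative gamma speckle, a common SAR intensity noise model (illustrative only).

    For an L-look intensity image, speckle is often modeled as
    I_noisy = I * n with n ~ Gamma(shape=L, rate=L), so E[n] = 1.
    """
    gamma = torch.distributions.Gamma(concentration=looks, rate=looks)
    noise = gamma.sample(images.shape).to(images.device)
    return images * noise

# Example: corrupt a batch of single-channel SAR patches before the MAE encoder sees them.
sar_batch = torch.rand(8, 1, 224, 224)
noisy_batch = inject_speckle(sar_batch, looks=4.0)
```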

Figure 1. Overview of the SARMAE pretraining framework. The framework consists of two branches: (i) a SAR branch following the MAE architecture with Speckle-Aware Representation Enhancement (SARE) to handle inherent speckle noise, and (ii) an optical branch using a frozen DINOv3 encoder. For paired SAR-optical data, Semantic Anchor Representation Constraint (SARC) aligns SAR features with semantic-rich optical representations. Unpaired SAR images are processed solely through the SAR branch.
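The alignment step in the optical branch can be summarized in a few lines. The sketch below is a hypothetical rendering of SARC under simple assumptions: SAR encoder tokens are linearly projected to the optical feature dimension and pulled toward frozen optical tokens (e.g. from DINOv3) with a negative cosine-similarity loss. The module name `SARCHead`, the projection, and the loss choice are illustrative; the actual SARC objective is defined in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SARCHead(nn.Module):
    """Illustrative projection head aligning SAR tokens with optical anchor tokens."""

    def __init__(self, sar_dim: int = 768, opt_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(sar_dim, opt_dim)

    def forward(self, sar_tokens: torch.Tensor, opt_tokens: torch.Tensor) -> torch.Tensor:
        # sar_tokens: (B, N, sar_dim) from the SAR-branch encoder
        # opt_tokens: (B, N, opt_dim) from the frozen optical encoder
        projected = self.proj(sar_tokens)
        # Minimizing (1 - cosine similarity) pulls SAR features toward the
        # semantic anchor provided by the optical representation.
        return 1.0 - F.cosine_similarity(projected, opt_tokens.detach(), dim=-1).mean()

# Paired SAR-optical batch: add this alignment term to the MAE reconstruction loss.
sarc = SARCHead()
sar_feats = torch.randn(2, 196, 768)
opt_feats = torch.randn(2, 196, 1024)   # would come from a frozen DINOv3 in practice
loss_align = sarc(sar_feats, opt_feats)
```

Unpaired SAR images would skip this head and contribute only the reconstruction loss, matching the unpaired path in the caption above.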

📖 Datasets

Figure 2. The organization of data sources in SAR-1M.

🚀 Models

Coming Soon.

🔨 Usage

Coming Soon.

🍭 Results

Figure 3. SARMAE outperforms SOTA methods on multiple datasets. 1: 40-SHOT; 2: 30% labeled. a: Multi-classes; b: Water.

| Method | FUSAR-SHIP (40-shot) | FUSAR-SHIP (30%) | MSTAR (40-shot) | MSTAR (30%) | SAR-ACD (30%) |
| --- | --- | --- | --- | --- | --- |
| ResNet-50 | - | 58.41 | - | 89.94 | 59.70 |
| Swin Transformer | - | 60.79 | - | 82.97 | 67.50 |
| BEiT | 59.70 | 71.13 | 40.70 | 69.75 | 79.77 |
| LoMaR | 82.70 | - | 77.00 | - | - |
| SAR-JEPA | 85.80 | - | 91.60 | - | - |
| SUMMIT | - | 71.91 | - | 98.39 | 84.25 |
| SARMAE(ViT-B) | 89.30 | 92.92 | 96.70 | 99.61 | 95.06 |
| SARMAE(ViT-L) | 90.86 | 92.80 | 97.24 | 98.92 | 95.63 |

Table 1. Performance comparison (Top-1 accuracy, %) of different methods on the target classification task.

Horizontal detection:

| Method | SARDet-100k | SSDD |
| --- | --- | --- |
| ImageNet | 52.30 | 66.40 |
| Deformable DETR | 50.00 | 52.60 |
| Swin Transformer | 53.80 | 40.70 |
| ConvNeXt | 55.10 | - |
| CATNet | - | 64.66 |
| MSFA | 56.40 | - |
| SARAFE | 57.30 | 67.50 |
| SARMAE(ViT-B) | 57.90 | 68.10 |
| SARMAE(ViT-L) | 63.10 | 69.30 |

Oriented detection:

| Method | RSAR |
| --- | --- |
| RoI Transformer | 35.02 |
| Def. DETR | 46.62 |
| RetinaNet | 57.67 |
| ARS-DETR | 61.14 |
| R3Det | 63.94 |
| ReDet | 64.71 |
| O-RCNN | 64.82 |
| SARMAE(ViT-B) | 66.80 |
| SARMAE(ViT-L) | 72.20 |

Table 2. Performance comparison (mAP, %) of different methods on horizontal and oriented object detection tasks.

| Method | Industrial Area | Natural Area | Land Use | Water | Housing | Other | mIoU | Water (IoU) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | 37.78 | 71.58 | 1.24 | 72.76 | 67.69 | 39.05 | 48.35 | 85.95 |
| ANN | 41.23 | 72.92 | 0.97 | 75.95 | 68.40 | 56.01 | 52.58 | 87.32 |
| PSPNet | 33.99 | 72.31 | 0.93 | 76.51 | 68.07 | 57.07 | 51.48 | 87.13 |
| DeepLab V3+ | 40.62 | 70.67 | 0.55 | 72.93 | 69.96 | 34.53 | 48.21 | 87.53 |
| PSANet | 40.70 | 69.46 | 1.33 | 69.46 | 68.75 | 32.68 | 47.14 | 86.18 |
| DANet | 39.56 | 72.00 | 1.00 | 74.95 | 67.79 | 56.28 | 39.56 | 89.29 |
| SARMAE(ViT-B) | 65.87 | 75.65 | 29.20 | 84.01 | 73.23 | 71.21 | 66.53 | 92.31 |
| SARMAE(ViT-L) | 65.84 | 78.04 | 29.47 | 87.12 | 75.22 | 69.34 | 67.51 | 93.06 |

Table 3. Performance comparison of semantic segmentation methods on multiple classes and water classes.

⭐ Citation

If you find SARMAE helpful, please give a ⭐ and cite it as follows:

@misc{liu2025sarmaemaskedautoencodersar,
      title={SARMAE: Masked Autoencoder for SAR Representation Learning}, 
      author={Danxu Liu and Di Wang and Hebaixu Wang and Haoyang Chen and Wentao Jiang and Yilin Cheng and Haonan Guo and Wei Cui and Jing Zhang},
      year={2025},
      eprint={2512.16635},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.16635}, 
}

🎺 Statement

For any other questions, please contact Danxu Liu at bit.edu.cn or gmail.com.
