MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
Chenfei Liao, Xu Zheng, Yuanhuiyi Lyu, Haiwei Xue, Yihong Cao, Jiawen Wang, Kailun Yang, Xuming Hu (Corresponding author)
Research has focused on Multi-Modal Semantic Segmentation (MMSS), where pixel-wise predictions are derived from multiple visual modalities captured by diverse sensors. Recently, the large vision model, Segment Anything Model 2 (SAM2), has shown strong zero-shot segmentation performance on both images and videos. When extending SAM2 to MMSS, two issues arise:
🔥1. How can SAM2 be adapted to multi-modal data?
🔥2. How can SAM2 better understand semantics?
Inspired by cross-frame correlation in videos, we propose to treat multi-modal data as a sequence of frames depicting the same scene. Our key idea is to "memorize" the modality-agnostic information and "memorize" the semantics related to the targeted scene. To achieve this, we apply SAM2's memory mechanism across multi-modal data to capture modality-agnostic features. Meanwhile, to memorize semantic knowledge, we propose a training-only Semantic Prototype Memory Module (SPMM) that stores category-level prototypes across training, facilitating SAM2's transition from instance to semantic segmentation. A prototypical adaptation loss is iteratively imposed between global and local prototypes to align and refine SAM2's semantic understanding. Extensive experimental results demonstrate that our proposed MemorySAM outperforms SoTA methods by large margins on both synthetic and real-world benchmarks (65.38% mIoU on DELIVER, 52.88% mIoU on MCubeS).
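To make the prototype mechanism concrete, below is a minimal PyTorch sketch of a training-only class-prototype memory. The shapes, the EMA update rule, and the MSE form of the alignment loss are our assumptions for illustration; the actual SPMM formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

class SemanticPrototypeMemory:
    """Training-only memory holding one feature prototype per class (illustrative sketch)."""

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.99):
        self.global_protos = torch.zeros(num_classes, feat_dim)  # persists across iterations
        self.momentum = momentum

    def prototype_loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) pixel embeddings; labels: (B, H, W) class indices
        B, C, H, W = feats.shape
        flat_feats = feats.permute(0, 2, 3, 1).reshape(-1, C)
        flat_labels = labels.reshape(-1)
        loss = feats.new_zeros(())
        for k in range(self.global_protos.shape[0]):
            mask = flat_labels == k
            if not mask.any():
                continue  # class k absent from this batch
            local = flat_feats[mask].mean(dim=0)               # masked average pooling
            g = self.global_protos[k].clone().to(feats.device)
            # EMA update of the global prototype; gradients never flow into the memory
            with torch.no_grad():
                self.global_protos[k] = (self.momentum * g.cpu()
                                         + (1 - self.momentum) * local.detach().cpu())
            # prototypical adaptation loss: pull the local prototype toward the memory
            loss = loss + F.mse_loss(local, g)
        return loss
```

Because the memory is detached from the computation graph, it accumulates category-level statistics across iterations without adding any parameters at inference time, which is why the module can be training-only.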
⭐ If you find any problems in our code, please contact us! We will fix them as soon as possible!
📧 lcfgreat624@gmail.com, cliao127@connect.hkust-gz.edu.cn
🚩 2025/3/10 Our paper is now available on arXiv: https://arxiv.org/pdf/2503.06700
🚩 2025/3/13 We release the first version of our source code! The weights will be released soon!
🚩 2025/4/23 We release the weights of MemorySAM on the DELIVER dataset! Click this: Link
🚩 2025/8/20 We release the weights of MemorySAM on the MCubeS dataset! Click this: Link
For the model itself, we use the same code as standard SAM2, located in `MemorySAM/semseg/models/sam2`; these files were cloned from SAM2's official repository at the start of our project. The MemorySAM model code is mainly in `MemorySAM/semseg/models/sam2/sam2/sam_lora_image_encoder_seg.py`, where the model class is named `LoRA_Sam`. Finally, `train_sam2_lora.py` imports this model and runs training. A generic sketch of the LoRA idea behind `LoRA_Sam` is given below.
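For readers unfamiliar with LoRA, the sketch below shows the general idea of adapting a frozen linear layer with a trainable low-rank update. It is a generic illustration under our own assumptions (layer choice, rank, and scaling), not the actual code in `sam_lora_image_encoder_seg.py`.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update W + scale * (B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pretrained SAM2 weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection A
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection B
        nn.init.zeros_(self.lora_b.weight)               # zero init: no change at the start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

In a wrapper like `LoRA_Sam`, such layers typically replace the query/value projections inside the image encoder's attention blocks, so only the small LoRA matrices (and the segmentation head) are trained while the pretrained SAM2 backbone stays frozen.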
- Create a new Conda environment and activate it:

```bash
conda create -n MMSS_SAM python=3.10
conda activate MMSS_SAM
```

- Download SAM2's weights from the Facebook Research SAM2 repository and place them in the `semseg/models/sam2/checkpoints` directory.
- Install PyTorch and related libraries:

```bash
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
```

- Install additional dependencies:

```bash
pip install -r requirements.txt
```

- Navigate to the model directory and install it as a package:

```bash
cd semseg/models/sam2
pip install -e .
```

- Download the DELIVER/MCubeS dataset and place it in the `data/` directory.
- Execute the following command to start training:

```bash
sh run_sam.sh
```
- 🚨 ATTENTION! 🚨 Line 233 in `MemorySAM/semseg/models/sam2/sam2/sam_lora_image_encoder_seg.py` must be kept consistent with the number of modalities in your dataset (see the illustrative snippet below).
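For example (hypothetical variable name; the actual identifier at line 233 may differ, so check the file):

```python
# Hypothetical illustration only -- match this to your dataset before training.
num_modalities = 4  # e.g., DELIVER provides RGB, Depth, Event, and LiDAR frames
```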
🤝 Our work is built upon the DELIVER and SAM2 projects. Thanks for their contributions to this community!
🤝 Also, thanks to the DELIVER and MCubeS teams for their efforts in building such valuable datasets!
🤝 Moreover, thanks to Xu Zheng (zhengxu128@gmail.com), the lead of this project, for his great guidance and help!
If you find this project helpful, please consider citing the following paper:
```bibtex
@misc{liao2025memorysammemorizemodalitiessemantics,
      title={MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation},
      author={Chenfei Liao and Xu Zheng and Yuanhuiyi Lyu and Haiwei Xue and Yihong Cao and Jiawen Wang and Kailun Yang and Xuming Hu},
      year={2025},
      eprint={2503.06700},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06700},
}
```
Thank you for your interest and support!

