*Equal contribution. †Corresponding author.
2 Suzhou Institute for Advanced Research, University of Science and Technology of China
3 Anhui IFLYTEK CO., Ltd.
News 🥰:
- LLM4Seg is accepted by MICCAI 2025! 🎉
With the advancement of Large Language Models (LLMs) in natural language processing, this paper presents an intriguing finding: a frozen, pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure (LLM4Seg) that integrates a pre-trained, frozen LLM layer into a CNN encoder-decoder framework. Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the LLM's semantic awareness to enhance segmentation, offering both improved global understanding and better local modeling. The improvement proves robust across different LLMs, as validated with LLaMA and DeepSeek.
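As a minimal sketch of the idea above (the names and the frozen-layer stand-in are illustrative, not the repo's API): encoder features are flattened into visual tokens, passed through a frozen token-mixing layer, and reshaped back to a feature map.

```python
import numpy as np

# Illustrative sketch only (not the repo's implementation): CNN features
# (B, C, H, W) become H*W tokens of dimension C, a frozen (never-updated)
# layer mixes the tokens, and the result is reshaped back to (B, C, H, W).
# A fixed random linear map stands in for the frozen pre-trained LLM layer.

rng = np.random.default_rng(0)

def features_to_tokens(x):
    """(B, C, H, W) -> (B, H*W, C): each spatial location becomes one token."""
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).transpose(0, 2, 1)

def tokens_to_features(t, h, w):
    """(B, H*W, C) -> (B, C, H, W): invert the flattening."""
    b, n, c = t.shape
    return t.transpose(0, 2, 1).reshape(b, c, h, w)

class FrozenTokenLayer:
    """Stand-in for a frozen pre-trained transformer layer (weights fixed)."""
    def __init__(self, dim):
        self.weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # frozen

    def __call__(self, tokens):
        return tokens + tokens @ self.weight  # residual token mixing

x = rng.standard_normal((2, 512, 16, 16))   # e.g. encoder output, 16x16 map
layer = FrozenTokenLayer(dim=512)
tokens = features_to_tokens(x)              # (2, 256, 512)
boosted = tokens_to_features(layer(tokens), 16, 16)
print(boosted.shape)                        # (2, 512, 16, 16)
```

The key point the sketch captures is that the feature-map shape is preserved, so the frozen layer drops into an existing encoder-decoder without architectural changes.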
- PyTorch: 2.5 (CUDA 12.4)
- Python: 3.9
- transformers: 4.46.3 (LLM environment)
- albumentations: 1.2.0 (medical image augmentation)
- Request access to LLaMA 3.2-1B or DeepSeek-R1 on Hugging Face 🤗🤗🤗
- Log in first: `huggingface-cli login`
Please organize the dataset (e.g., BUSI) or your own dataset according to the following directory structure.
└── LLM4Seg
    ├── data
    │   ├── busi
    │   │   ├── images
    │   │   │   ├── benign (10).png
    │   │   │   ├── malignant (17).png
    │   │   │   └── ...
    │   │   └── masks
    │   │       └── 0
    │   │           ├── benign (10).png
    │   │           ├── malignant (17).png
    │   │           └── ...
    │   └── your dataset
    │       ├── images
    │       │   ├── 0a7e06.png
    │       │   └── ...
    │       └── masks
    │           └── 0
    │               ├── 0a7e06.png
    │               └── ...
    ├── dataloader
    ├── network
    ├── utils
    ├── main.py
    └── split.py
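Given this layout, an image's mask shares its filename under `masks/0/`. A hypothetical helper (not part of the repo) illustrating the path convention:

```python
from pathlib import Path

# Hypothetical helper (not part of the repo): map an image path to its mask
# path, following the layout above, where masks/0/ mirrors images/ filenames.
def mask_path_for(image_path):
    p = Path(image_path)
    return p.parent.parent / "masks" / "0" / p.name

print(mask_path_for("data/busi/images/benign (10).png").as_posix())
# data/busi/masks/0/benign (10).png
```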
from network.llm4seg import LLM4Seg
# unfreeze (bool): Whether to unfreeze the LLM layer for fine-tuning.
# need_init (bool): Whether to randomly initialize the LLM layer (instead of loading pre-trained weights).
# mode (str): Which LLM to use: "llama" or "deepseek".
# channel (int): Number of input feature channels, typically from the encoder output (e.g., dims[4]).
# layer (int): Index of the LLM layer to use.
# h (int): Height of the input feature map.
# w (int): Width of the input feature map.
llm4seg = LLM4Seg(unfreeze=False, need_init=False, mode="llama", channel=dims[4], layer=14, h=16, w=16)
# forward
fs_boosted = llm4seg(fs)
You can first split your dataset:
python split.py --dataset_name busi --dataset_root ./data
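For reference, a split step of this kind typically just writes shuffled filename lists; a hypothetical sketch (the actual `split.py` may differ):

```python
import random

# Hypothetical sketch of a train/val split (the repo's split.py may differ):
# shuffle the image filenames reproducibly and hold out a validation fraction,
# which would then be written to e.g. busi_train.txt / busi_val.txt.
def split_dataset(image_names, val_ratio=0.2, seed=42):
    names = sorted(image_names)
    random.Random(seed).shuffle(names)   # seeded for reproducibility
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]  # (train list, val list)

train, val = split_dataset([f"img_{i}.png" for i in range(10)])
print(len(train), len(val))  # 8 2
```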
Train and validate your dataset:
# + DeepSeek, 28th layer:
python main.py --mode deepseek --layer 27 --base_dir ./data/busi --train_file_dir busi_train.txt --val_file_dir busi_val.txt
# + DeepSeek (T), 18th layer:
python main.py --mode deepseek --layer 17 --unfreeze --base_dir ./data/busi --train_file_dir busi_train.txt --val_file_dir busi_val.txt
# + DS Transformer, 18th layer:
python main.py --mode deepseek --layer 17 --need_init --base_dir ./data/busi --train_file_dir busi_train.txt --val_file_dir busi_val.txt
# + LLaMA, 18th layer:
python main.py --mode llama --layer 17 --base_dir ./data/busi --train_file_dir busi_train.txt --val_file_dir busi_val.txt
# + LLaMA (T), 8th layer:
python main.py --mode llama --layer 7 --unfreeze --base_dir ./data/busi --train_file_dir busi_train.txt --val_file_dir busi_val.txt
This code uses helper functions from CMUNeXt.
If the code, paper, or weights help your research, please cite:
@article{llm4seg,
title={Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster},
author={Tang, Fenghe and Ma, Wenxin and He, Zhiyang and Tao, Xiaodong and Jiang, Zihang and Zhou, S Kevin},
journal={arXiv preprint arXiv:2506.18034},
year={2025}
}
This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.