This is the implementation of MG-3D: Multi-Grained Knowledge-Enhanced Vision-Language Pre-training for 3D Medical Image Analysis.
Run the following command to install the required packages:
pip install -r requirements.txt
You can download the CT-RATE and CTRG-Chest datasets used in this work via the Hugging Face repository (https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) and GitHub (https://github.com/tangyuhao2016/CTRG).
The project structure should be:
root:[.]
+--mg3d
| +--datasets
| +--datamodules
| +--metrics
| +--models
| +--config.py
| +--__init__.py
+--prepro
| +--glossary.py
| +--make_arrow.py
| +--prepro_finetuning_language_data.py
| +--prepro_finetuning_data.py
| +--prepro_finetuning_vision_data.py
| +--prepro_pretraining_data.py
+--data
| +--pretrain_arrows
| +--finetune_arrows
| +--finetune_vision_arrows
| +--finetune_language_arrows
+--run_scripts
| +--pretrain.sh
| +--finetune.sh
+--tools
| +--visualize_datasets.py
| +--convert_meter_weights.py
+--downstream
| +--ACDC
| +--cc-ccii
| +--Covid19_20
| +--CT-RATE
| +--CTRG
| +--Luna16
| +--MSD
| +--stoic2021
+--requirements.txt
+--README.md
+--main.py
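The `data` subdirectories in the tree above hold the generated arrow files. As a minimal sketch, assuming they are not created automatically (the preprocessing scripts may already handle this), the following creates them using only the standard library. The helper name `make_data_dirs` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Subdirectories of data/ listed in the project structure above.
DATA_SUBDIRS = (
    "pretrain_arrows",
    "finetune_arrows",
    "finetune_vision_arrows",
    "finetune_language_arrows",
)


def make_data_dirs(root="."):
    """Ensure data/<subdir> exists under root and return the directory paths.

    Hypothetical helper for illustration, not part of the MG-3D code base.
    """
    paths = []
    for name in DATA_SUBDIRS:
        path = Path(root) / "data" / name
        # parents=True also creates data/; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `make_data_dirs(".")` from the repository root leaves existing directories and their contents untouched.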
Run the following command to pre-process the data:
python prepro/prepro_pretraining_data.py
to get the following arrow files:
root:[data]
+--pretrain_arrows
| +--clm_chest_ctrg_train.arrow
| +--clm_chest_ctrg_val.arrow
| +--clm_chest_ctrg_test.arrow
| +--clm_ct_rate_train.arrow
| +--clm_ct_rate_val.arrow
| +--clm_ct_rate_test.arrow
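Before launching pre-training, it can help to confirm that all six files were produced. The sketch below uses only the standard library and assumes the `clm_<dataset>_<split>.arrow` naming shown above; `expected_arrow_files` and `missing_arrow_files` are hypothetical helper names, not part of this repo.

```python
from pathlib import Path

# Dataset tags and splits taken from the file listing above.
DATASETS = ("chest_ctrg", "ct_rate")
SPLITS = ("train", "val", "test")


def expected_arrow_files(data_root="data"):
    """Return the six expected pre-training arrow paths under data_root."""
    arrow_dir = Path(data_root) / "pretrain_arrows"
    return [
        arrow_dir / f"clm_{dataset}_{split}.arrow"
        for dataset in DATASETS
        for split in SPLITS
    ]


def missing_arrow_files(data_root="data"):
    """Return the expected arrow paths that do not exist on disk."""
    return [p for p in expected_arrow_files(data_root) if not p.is_file()]
```

An empty list from `missing_arrow_files("data")` means every file listed above is present. The check covers file names only, not file contents.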
Now we can start to pre-train the MG-3D model:
Single GPU:
bash run_scripts/pretrain.sh
Multiple GPUs:
bash run_scripts/pretrain_multi_gpus.sh
We provide several models for downstream tasks: 3D Swin-B-47K, 3D Swin-L-47K, 3D UNet-1.4K, and 3D nn-UNet-1.4K.
The code is based on PTUnifier, MONAI, CT-CLIP, and M2KT.
We thank the authors for their open-sourced code and encourage users to cite their works when applicable.
If you find this repo useful for your research, please consider citing the paper as follows:
@article{NI2026104027,
title = {MG-3D: Multi-Grained Knowledge-Enhanced Vision-Language Pre-training for 3D Medical Image Analysis},
journal = {Medical Image Analysis},
pages = {104027},
year = {2026},
issn = {1361-8415},
doi = {https://doi.org/10.1016/j.media.2026.104027},
url = {https://www.sciencedirect.com/science/article/pii/S1361841526000964},
author = {Xuefeng Ni and Linshan Wu and Jiaxin Zhuang and Qiong Wang and Mingxiang Wu and Varut Vardhanabhuti and Lihai Zhang and Hanyu Gao and Hao Chen}
}