- Download the CT-RATE dataset into the `data` folder.
- Download the ImageNet pre-trained ViT weights from link, and the BiomedVLP-CXR-BERT-specialized text encoder from link, as used by CT-CLIP.
- Download the decomposed anatomy-wise descriptions from our provided supplementary materials link, and process the CT volumes with the following commands:

```bash
cd data
python fix_data.py --split [train/valid]
python generate_mask.py --split [train/valid]
python resize.py --split [train/valid]
python preprocess.py --split [train/valid]
```
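Each preprocessing script must be run for both splits, in order. As a convenience, the eight invocations above can be sketched as a small driver; the script names and `--split` flag come from the commands above, while the driver itself is an illustrative helper, not part of the repository.

```python
import subprocess

# Preprocessing scripts from the README, in the order they must run.
STEPS = ["fix_data.py", "generate_mask.py", "resize.py", "preprocess.py"]

def preprocess_commands(splits=("train", "valid")):
    """Build the full command list: every step for every split, order preserved."""
    return [
        ["python", step, "--split", split]
        for split in splits
        for step in STEPS
    ]

if __name__ == "__main__":
    for cmd in preprocess_commands():
        print(" ".join(cmd))
        # Uncomment to actually execute each step (requires the CT-RATE data):
        # subprocess.run(cmd, check=True, cwd="data")
```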
After these steps, the processed results should be organized as follows:
```
|-- BiomedVLP-CXR-BERT
|-- data
|   |-- train
|   |-- valid
|   |-- train_fixed
|   |-- valid_fixed
|   |-- train_mask
|   |-- valid_mask
|   |-- resized_train_images
|   |-- resized_train_masks
|   |-- resized_valid_images
|   |-- resized_valid_masks
|   |-- processed_train_images
|   |-- processed_train_masks
|   |-- processed_valid_images
|   |-- processed_valid_masks
|   |-- multi_abnormality_labels
|   |-- desc_info.json
|   |-- conc_info.json
|-- mae_pretrain_vit_base.pth
```
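Before launching training, it can help to confirm the layout matches the tree above. The check below is a minimal sketch: it only samples a few representative paths from the tree, and it assumes you run it from the repository root.

```python
from pathlib import Path

# A representative subset of the expected paths from the directory tree above.
EXPECTED = [
    "BiomedVLP-CXR-BERT",
    "data/processed_train_images",
    "data/processed_valid_images",
    "data/desc_info.json",
    "data/conc_info.json",
    "mae_pretrain_vit_base.pth",
]

def missing_paths(root="."):
    """Return the expected paths that do not yet exist under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    print("All expected paths present." if not missing else f"Missing: {missing}")
```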
Launch pre-training on 4 GPUs:

```bash
torchrun --nproc_per_node=4 train.py
```
Run evaluation, which saves the predictions to a CSV file:

```bash
torchrun --nproc_per_node=4 eval.py
```
Then, you can calculate the metrics from the generated CSV file:

```bash
python calc_metrics.py --csv_file res/xxx.csv
```
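For reference, a common way to score multi-abnormality predictions like these is per-abnormality AUROC over the CSV rows. The sketch below is illustrative only: the column names (`abnormality`, `label`, `score`) are assumptions, not the actual schema produced by `eval.py`, and `calc_metrics.py` may compute different metrics.

```python
import csv
from collections import defaultdict

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation: the fraction of
    positive/negative pairs in which the positive gets the higher score
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_abnormality_auroc(csv_file):
    """Group rows by abnormality and score each group.
    Columns `abnormality`, `label`, `score` are hypothetical."""
    groups = defaultdict(lambda: ([], []))
    with open(csv_file, newline="") as f:
        for row in csv.DictReader(f):
            labels, scores = groups[row["abnormality"]]
            labels.append(int(row["label"]))
            scores.append(float(row["score"]))
    return {name: auroc(l, s) for name, (l, s) in groups.items()}
```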
If you find this repository useful, please cite:
```bibtex
@inproceedings{fvlm_iclr25,
  title={Large-scale and fine-grained vision-language pre-training for enhanced CT image understanding},
  author={Shui, Zhongyi and Zhang, Jianpeng and Cao, Weiwei and Wang, Sinuo and Guo, Ruizhe and Lu, Le and Yang, Lin and Ye, Xianghua and Liang, Tingbo and Zhang, Qi and Zhang, Ling},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```