LIMITR is a multi-modal representation learning model for chest X-ray images and reports.
The model is based on a novel alignment scheme between the visual data and the text, which takes into account both local and global information.
Furthermore, the model integrates domain-specific information of two types -- lateral images and the consistent visual structure of chest images.
Our representation is shown to benefit three types of retrieval tasks: text-image retrieval, class-based retrieval, and phrase-grounding.
LIMITR manuscript
Gefen Dawidowicz, Elad Hirsch, Ayellet Tal
Technion – Israel Institute of Technology
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
We used Python 3.8 with PyTorch 1.11.
To clone this repository:
git clone https://github.com/gefend/LIMITR.git
To install Python requirements:
pip install -r requirements.txt
- Download the MIMIC-CXR dataset.
- Update the path to the MIMIC directory (`DATA_BASE_DIR`) in `./LIMITR/constants.py`.
- Extract the file `mimic_csv.tar.gz` into a `mimic_csv` directory.
- The splits we used for training and evaluation are available in the `./mimic_csv` directory.
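As a concrete illustration of the second step, the relevant lines in `./LIMITR/constants.py` might look like the sketch below. Only the `DATA_BASE_DIR` name comes from this README; the exact file contents, the derived paths, and the CSV file name are assumptions for illustration.

```python
from pathlib import Path

# Root of the downloaded MIMIC-CXR dataset -- edit this to your local path.
# (Illustrative value; the real constants.py may define it differently.)
DATA_BASE_DIR = Path("/data/mimic-cxr")

# Hypothetical derived paths: the CSV splits extracted from
# mimic_csv.tar.gz are expected in a local ./mimic_csv directory.
MIMIC_CSV_DIR = Path("./mimic_csv")
TRAIN_SPLIT = MIMIC_CSV_DIR / "train.csv"  # hypothetical file name
```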
Update the desired training configuration in ./configs/mimic_config.yaml
Train the model with the following command:
python run.py -c ./configs/mimic_config.yaml --train
Test the model with the following command:
python run.py -c ./configs/mimic_config.yaml --test --ckpt_path=ckpt_path
Replace ckpt_path with the path to the checkpoint you wish to evaluate.
@InProceedings{Dawidowicz_2023_ICCV,
author = {Dawidowicz, Gefen and Hirsch, Elad and Tal, Ayellet},
title = {LIMITR: Leveraging Local Information for Medical Image-Text Representation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {21165-21173}
}