
🌟MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis

📖 Introduction

MedTVT-R1 is a multimodal large language model (LLM) designed to enhance medical reasoning and diagnosis by integrating Electrocardiogram (Time Series), Chest X-ray (Visual Image), and Blood Test (Tabular Data). The model combines advanced modality-aware layers and reinforcement fine-tuning techniques to deliver improved diagnostic interpretability and accuracy.

*Figure: overview of MedTVT-R1 (top_fig).*

📝 Abstract

Accurate and interpretable multi-disease diagnosis remains a critical challenge in medical research, particularly when leveraging heterogeneous multimodal medical data. Current approaches often rely on single-modal data, limiting their ability to comprehensively understand complex diseases. To address this, we propose MedTVT-R1, a novel Multimodal Large Language Model (MLLM) framework designed to integrate clinical multimodal data for reasoning and diagnosing multiple diseases. We construct MedTVT-QA, a curated instruction dataset that provides question-answer pairs for physiological-level interpretations and disease-level diagnoses with a Chain of Evidence approach. MedTVT-R1 incorporates a modality perception layer to capture inter-modal dependencies and adaptively weight modality contributions. Additionally, we employ Group Relative Policy Optimization (GRPO)-based Reinforcement Fine-Tuning with a Jaccard Reward function to enhance diagnostic reasoning. Experimental results demonstrate MedTVT-R1’s superiority in multimodal feature utilization and multi-disease diagnosis, offering significant potential for clinical applications such as diagnostic report generation and comorbidity reasoning.
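To make the modality perception idea above concrete, here is a minimal sketch (not the paper's implementation) of softmax-gated fusion over three modality embeddings. The function name `modality_gate` and the parameters `w` and `b` are hypothetical stand-ins for a learned linear scorer; NumPy is used for illustration:

```python
import numpy as np

def modality_gate(ecg, cxr, lab, w, b):
    """Fuse three modality embeddings (each of shape (dim,)) with softmax gates.

    w: (dim,) scoring vector and b: scalar bias are stand-ins for a learned
    linear scorer that assigns each modality an adaptive contribution weight.
    """
    feats = np.stack([ecg, cxr, lab])      # (3, dim)
    scores = feats @ w + b                 # one scalar score per modality
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the 3 modalities
    return weights @ feats                 # (dim,) fused embedding
```

With a zero scoring vector the gates are uniform, so the fused embedding reduces to the mean of the three modality embeddings; a trained scorer would instead up-weight the most informative modality per sample.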

*Figure: the MedTVT-R1 framework (framework).*

📂 Data Preparation

  • Download the MIMIC-IV-ECG and MIMIC-IV-CXR datasets from the PhysioNet website and store them in ./Dataset.
  • Download our preprocessed MedTVT-QA dataset and store it in ./QA.
  • Download the pretrained weights for the ECG encoder and the lab encoder from Huggingface and store them in ./CKPTS.
  • Download the original LLaMA3.2-1B-Instruct pretrained weights and store them in ./CKPTS/LLaMA3.2-1B-Instruct.

🚀 Environment Setup

Create and activate a new Anaconda environment using the following commands:

conda create --name MedTVT-R1 python=3.9.17
conda activate MedTVT-R1
cd MedTVT-R1
pip install -r requirements.txt

💪 Pre-training Phase (PT)

Run the following command to start the pre-training phase:

bash PT.sh

🔧 Supervised Fine-tuning Phase (SFT)

Run the following command to start the supervised fine-tuning phase:

bash SFT.sh

🎯 Reinforcement Fine-tuning Phase (RFT)

Run the following command to start the reinforcement learning fine-tuning phase:

bash RFT.sh
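The RFT phase uses a Jaccard Reward over diagnosed disease labels. A minimal sketch of such a reward, assuming predictions and references are plain label sets (the exact reward shaping in the released scripts may differ):

```python
def jaccard_reward(predicted, reference):
    """Jaccard similarity |P ∩ R| / |P ∪ R| between predicted and
    reference disease-label sets; 1.0 when both sets are empty."""
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0
    return len(p & r) / len(p | r)

# Partial credit for overlapping diagnoses:
# jaccard_reward({"CHF", "Arrhythmia"}, {"CHF", "Pneumonia"}) -> 1/3
```

Unlike exact-match accuracy, this reward gives graded credit for partially correct multi-disease predictions, which suits comorbidity diagnosis where several labels can be true at once.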

📚 Citation

If you use MedTVT-R1 in your research, please cite our paper:

@article{zhang2025medtvt,
  title={MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis},
  author={Zhang, Yuting},
  journal={Preprint, under review},
  year={2025}
}

About

CVPR2026
