
🌟MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis

📖 Introduction

MedTVT-R1 is a multimodal large language model (LLM) designed to enhance medical reasoning and diagnosis by integrating Electrocardiogram (Time Series), Chest X-ray (Visual Image), and Blood Test (Tabular Data). The model combines advanced modality-aware layers and reinforcement fine-tuning techniques to deliver improved diagnostic interpretability and accuracy.

*Figure: overview of MedTVT-R1 (top_fig).*

📝 Abstract

Accurate and interpretable multi-disease diagnosis remains a critical challenge in medical research, particularly when leveraging heterogeneous multimodal medical data. Current approaches often rely on single-modal data, limiting their ability to comprehensively understand complex diseases. To address this, we propose MedTVT-R1, a novel Multimodal Large Language Model (MLLM) framework designed to integrate clinical multimodal data for reasoning and diagnosing multiple diseases. We construct MedTVT-QA, a curated instruction dataset that provides question-answer pairs for physiological-level interpretations and disease-level diagnoses with a Chain of Evidence approach. MedTVT-R1 incorporates a modality perception layer to capture inter-modal dependencies and adaptively weight modality contributions. Additionally, we employ Group Relative Policy Optimization (GRPO)-based Reinforcement Fine-Tuning with a Jaccard Reward function to enhance diagnostic reasoning. Experimental results demonstrate MedTVT-R1’s superiority in multimodal feature utilization and multi-disease diagnosis, offering significant potential for clinical applications such as diagnostic report generation and comorbidity reasoning.
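To make the modality perception idea above concrete, here is a minimal sketch (not the paper's implementation) of softmax-gated fusion over three modality embeddings. The function name `modality_gate` and the parameters `w` and `b` are hypothetical stand-ins for a learned linear scorer; NumPy is used for illustration:

```python
import numpy as np

def modality_gate(ecg, cxr, lab, w, b):
    """Fuse three modality embeddings (each of shape (dim,)) with softmax gates.

    w: (dim,) scoring vector and b: scalar bias are stand-ins for a learned
    linear scorer that assigns each modality an adaptive contribution weight.
    """
    feats = np.stack([ecg, cxr, lab])      # (3, dim)
    scores = feats @ w + b                 # one scalar score per modality
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the 3 modalities
    return weights @ feats                 # (dim,) fused embedding
```

With a zero scoring vector the gates are uniform, so the fused embedding reduces to the mean of the three modality embeddings; a trained scorer would instead up-weight the most informative modality per sample.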

*Figure: the MedTVT-R1 framework (framework).*

📂 Data Preparation

  • Download the MIMIC-IV-ECG and MIMIC-IV-CXR datasets from the PhysioNet website and store them in ./Dataset.
  • Download our preprocessed MedTVT-QA dataset and store it in ./QA.
  • Download the pretrained weights for the ECG encoder and the lab encoder from Huggingface and store them in ./CKPTS.
  • Download the original LLaMA3.2-1B-Instruct pretrained weights and store them in ./CKPTS/LLaMA3.2-1B-Instruct.

🚀 Environment Setup

Create and activate a new Anaconda environment using the following commands:

conda create --name MedTVT-R1 python=3.9.17
conda activate MedTVT-R1
cd MedTVT-R1
pip install -r requirements.txt

💪 Pre-training Phase (PT)

Run the following command to start the pre-training phase:

bash PT.sh

🔧 Supervised Fine-tuning Phase (SFT)

Run the following command to start the supervised fine-tuning phase:

bash SFT.sh

🎯 Reinforcement Fine-tuning Phase (RFT)

Run the following command to start the reinforcement learning fine-tuning phase:

bash RFT.sh
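The RFT phase uses a Jaccard Reward over diagnosed disease labels. A minimal sketch of such a reward, assuming predictions and references are plain label sets (the exact reward shaping in the released scripts may differ):

```python
def jaccard_reward(predicted, reference):
    """Jaccard similarity |P ∩ R| / |P ∪ R| between predicted and
    reference disease-label sets; 1.0 when both sets are empty."""
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0
    return len(p & r) / len(p | r)

# Partial credit for overlapping diagnoses:
# jaccard_reward({"CHF", "Arrhythmia"}, {"CHF", "Pneumonia"}) -> 1/3
```

Unlike exact-match accuracy, this reward gives graded credit for partially correct multi-disease predictions, which suits comorbidity diagnosis where several labels can be true at once.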

📚 Citation

If you use MedTVT-R1 in your research, please cite our paper:

@article{zhang2025medtvt,
  title={MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis},
  author={Zhang, Yuting},
  journal={Preprint, under review},
  year={2025}
}

About

CVPR2026
