SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting

Shuhao Mei¹,²,⁷, Yongchao Long², Shan Cao³, Xiaobo Han⁴, Shijia Geng⁵, Jinbo Sun¹,*, Yuxi Zhou²,⁶,*, Shenda Hong⁷,*

¹Xidian University ²Tianjin University of Technology ³The Second Hospital of Tianjin Medical University ⁴Chinese PLA General Hospital ⁵HeartVoice Medical Technology ⁶Tsinghua University ⁷Peking University

*Corresponding author

Introduction

SpiroLLM is the first multimodal large language model specifically designed to interpret spirogram time-series data, providing diagnostic support for Chronic Obstructive Pulmonary Disease (COPD). By integrating raw spirometry signals with demographic information, SpiroLLM generates comprehensive and clinically relevant diagnostic reports.

If you find SpiroLLM useful in your work, please consider citing our paper:

@misc{mei2025spirollmfinetuningpretrainedllms,
      title={SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting}, 
      author={Shuhao Mei and Yongchao Long and Shan Cao and Xiaobo Han and Shijia Geng and Jinbo Sun and Yuxi Zhou and Shenda Hong},
      year={2025},
      eprint={2507.16145},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.16145}, 
}

Quickstart

1. Setup Environment

First, create and activate a Conda virtual environment, then install the required dependencies.

# Create and activate the environment
conda create -n SpiroLLM python=3.11 -y
conda activate SpiroLLM

# Install all dependencies
pip install -r requirements.txt

2. Prepare Demo Data

Run the provided script to automatically download the example spirometry data from the UK Biobank website. The data will be saved to the data/ directory.

python generate_ukbb_demo_data.py

3. Run Inference

Once the environment is set up and the data is downloaded, run the main inference script with the patient's information.

python main.py \
    --csv_path ./data/example.csv \
    --age 69 \
    --sex Male \
    --height_cm 176.0 \
    --is_smoker

The generated report will be printed to the console and saved to the output file specified in your config.yaml.
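For running several patients in a row, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_inference_command` is a hypothetical helper that only maps the documented flags onto an argument list, and it assumes `main.py` is in the current working directory.

```python
import sys
from typing import List, Optional


def build_inference_command(
    csv_path: str,
    age: int,
    sex: str,
    height_cm: float,
    is_smoker: bool = False,
    ethnicity: Optional[str] = None,
    config: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for main.py from the documented flags.

    Hypothetical helper for illustration; it does not run the model.
    """
    if sex not in ("Male", "Female"):
        raise ValueError("sex must be 'Male' or 'Female'")

    cmd = [
        sys.executable, "main.py",
        "--csv_path", csv_path,
        "--age", str(age),
        "--sex", sex,
        "--height_cm", str(height_cm),
    ]
    # --is_smoker is a bare flag: present for smokers, omitted otherwise.
    if is_smoker:
        cmd.append("--is_smoker")
    # Optional arguments are only passed when set, so main.py's own
    # defaults apply otherwise.
    if ethnicity is not None:
        cmd += ["--ethnicity", ethnicity]
    if config is not None:
        cmd += ["--config", config]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would then launch one inference run per patient.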

System Requirements

Python: 3.11
PyTorch: >= 2.0
GPU: A CUDA-enabled GPU with at least 16 GB of VRAM is required.
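The requirements above can be checked before loading the model. The sketch below is illustrative and not shipped with the repository: `check_requirements` and `probe_environment` are hypothetical names, and reading "16 GB" as 16 × 10⁹ bytes is an assumption. Cards marketed as 16 GB often report slightly less usable memory, so a VRAM finding is better treated as a warning than as a hard failure.

```python
import sys
from typing import List, Optional, Tuple

MIN_VRAM_BYTES = 16 * 10**9  # assumption: "16 GB" read as decimal gigabytes


def check_requirements(
    python_version: Tuple[int, int],
    torch_version: Optional[str],
    vram_bytes: Optional[int],
) -> List[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if python_version != (3, 11):
        problems.append(f"Python 3.11 expected, found {python_version[0]}.{python_version[1]}")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif int(torch_version.split(".")[0]) < 2:
        problems.append(f"PyTorch >= 2.0 expected, found {torch_version}")
    if vram_bytes is None:
        problems.append("no CUDA-enabled GPU detected")
    elif vram_bytes < MIN_VRAM_BYTES:
        problems.append(f"GPU reports {vram_bytes / 10**9:.1f} GB of VRAM, below 16 GB")
    return problems


def probe_environment() -> Tuple[Tuple[int, int], Optional[str], Optional[int]]:
    """Collect the Python version, PyTorch version, and VRAM of GPU 0."""
    py = (sys.version_info[0], sys.version_info[1])
    try:
        import torch
    except ImportError:
        return py, None, None
    vram = None
    if torch.cuda.is_available():
        vram = torch.cuda.get_device_properties(0).total_memory
    return py, str(torch.__version__), vram
```

Calling `check_requirements(*probe_environment())` inside the activated environment lists anything that deviates from the stated requirements.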

Usage

The main.py script is the primary entry point for running inference. It accepts the following command-line arguments; those marked Yes under Required must be supplied:

| Argument | Type | Description | Required |
| --- | --- | --- | --- |
| `--csv_path` | `str` | Path to the patient's raw spirometry data file. | Yes |
| `--age` | `int` | The age of the patient in years. | Yes |
| `--sex` | `str` | The sex of the patient (`Male` or `Female`). | Yes |
| `--height_cm` | `float` | The height of the patient in centimeters. | Yes |
| `--is_smoker` | `flag` | Include this flag if the patient is a smoker. | No |
| `--ethnicity` | `str` | Patient's ethnicity. Defaults to `Caucasian`. | No |
| `--config` | `str` | Path to the configuration YAML file. | No |
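To make the types and defaults in the table concrete, here is a minimal `argparse` sketch that mirrors the documented interface. It is a reconstruction from the table only; the actual parser in main.py may differ. The `None` default for `--config` is a placeholder assumption, since the real default path is not stated here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented main.py arguments."""
    parser = argparse.ArgumentParser(description="SpiroLLM inference (illustrative parser)")
    # Required patient inputs.
    parser.add_argument("--csv_path", type=str, required=True,
                        help="Path to the patient's raw spirometry data file.")
    parser.add_argument("--age", type=int, required=True,
                        help="Age of the patient in years.")
    parser.add_argument("--sex", type=str, required=True, choices=["Male", "Female"],
                        help="Sex of the patient.")
    parser.add_argument("--height_cm", type=float, required=True,
                        help="Height of the patient in centimeters.")
    # Optional inputs.
    parser.add_argument("--is_smoker", action="store_true",
                        help="Set if the patient is a smoker.")
    parser.add_argument("--ethnicity", type=str, default="Caucasian",
                        help="Patient's ethnicity.")
    # Assumption: the real default config path is unknown, so None is used here.
    parser.add_argument("--config", type=str, default=None,
                        help="Path to the configuration YAML file.")
    return parser
```

With this layout, omitting `--is_smoker` yields `False`, and omitting `--ethnicity` falls back to `Caucasian`, as the table describes.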

Data Source

The data used in this project is sourced from the UK Biobank, a large-scale biomedical database and research resource. Access to the data is available to approved researchers upon application. For more information, please visit the UK Biobank website.

Relation to Prior Work

The DeepSpiro feature extractor, a key component of this project, is based on our prior work published in npj Systems Biology and Applications:

Mei S, Li X, Zhou Y, et al. Deep learning for detecting and early predicting chronic obstructive pulmonary disease from spirogram time series[J]. npj Systems Biology and Applications, 2025, 11(1): 18.

The original implementation is available at the COPD-Early-Prediction GitHub repository.

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or see the LICENSE file.
