This repository contains the code and analysis pipeline for the paper: Hanyang Li, Xiao He, Adarsh Subbaswamy, Patrick Vossler, Alexej Gossmann, Karandeep Singh & Jean Feng. Scaling medical device regulatory science using large language models. npj Digital Medicine (2026). https://doi.org/10.1038/s41746-026-02353-7
This work develops and validates an LLM-based pipeline for scaling data analyses in medical device regulatory science. We demonstrate how LLMs can accurately extract structured information from complex, unstructured FDA regulatory documents in three case studies:
- Device Validation Practices: What validation practices are reported for FDA-cleared/approved AI/ML medical devices?
- Medical Device Report (MDR) Coding: Can LLMs assist with, or improve the accuracy of, the codes assigned to MDRs?
- Pre-Market Risk Factors: Can we identify device characteristics during FDA clearance that are associated with post-market MDRs?
- data/: FDA reference datasets
  - FDA-CDRH_NCIt_Subsets.csv - FDA medical device terminology/classification data
  - Other relevant datasets can be downloaded using the provided scripts
- scripts/: Core analysis pipeline
  - common.py - Shared utilities for data processing and analysis
  - download_device_pdfs.py - FDA API integration for downloading device summaries
  - Each case study corresponds to a folder:
    - Device Validation Practices: scripts/analysis_validation/
    - MDR Coding: scripts/analysis_ae_recall/
    - Pre-Market Risk Factors: scripts/analysis_pre_post_associations/
- scripts/utils/: Utility modules
  - gpt_utils.py - OpenAI API integration with cost tracking for multiple GPT models
  - pdf_utils.py - PDF text extraction with OCR fallback for poor quality documents
  - extract_primary_predicate.py - Regex-based extraction of device predicate information
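
To give a feel for what regex-based predicate extraction can look like, here is a minimal sketch. It is not the code in extract_primary_predicate.py. The function name `find_primary_predicate`, the 200-character search window, and the "first K-number after the word predicate" heuristic are all assumptions made for illustration. The only fact relied on is that 510(k) numbers are written as `K` followed by six digits.

```python
import re
from typing import Optional

# Hypothetical heuristic: capture the first 510(k) number ("K" + six digits)
# that appears within 200 characters after the word "predicate", without
# crossing a sentence-ending period.
PREDICATE_CONTEXT = re.compile(
    r"predicate[^.]{0,200}?\b(K\d{6})\b",
    re.IGNORECASE,
)


def find_primary_predicate(text: str, own_number: Optional[str] = None) -> Optional[str]:
    """Return the first K-number mentioned near 'predicate', or None.

    own_number lets the caller skip the subject device's own submission
    number if it happens to appear in the same sentence.
    """
    for match in PREDICATE_CONTEXT.finditer(text):
        candidate = match.group(1).upper()
        if candidate != own_number:
            return candidate
    return None


if __name__ == "__main__":
    summary = (
        "The subject device K201234 is substantially equivalent to the "
        "predicate device, Acme Scanner (K181234)."
    )
    print(find_primary_predicate(summary, own_number="K201234"))
```

A heuristic like this will miss predicates cited by De Novo or PMA numbers and may pick a reference device instead of the primary predicate, so any real use would need validation against manually labeled summaries.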
pip install -r requirements.txt

Each analysis workflow can be executed by running run_pipeline.sh within its respective folder.
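
As a convenience, the three workflows could be launched from one place. The sketch below assumes that each run_pipeline.sh takes no arguments and can be invoked with `bash` from inside its own folder; neither assumption is confirmed by this README. The helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Case-study folders as listed in the repository layout above.
CASE_STUDY_DIRS = [
    "scripts/analysis_validation",
    "scripts/analysis_ae_recall",
    "scripts/analysis_pre_post_associations",
]


def build_commands(repo_root="."):
    """Return (working_directory, command) pairs, one per case study."""
    root = Path(repo_root)
    # Assumption: the script needs no arguments and runs under bash.
    return [(root / d, ["bash", "run_pipeline.sh"]) for d in CASE_STUDY_DIRS]


def run_all(repo_root=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for workdir, cmd in build_commands(repo_root):
        print(f"[{workdir}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing pipeline.
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

The dry-run default only prints what would be executed, which is a cautious choice given that the pipelines call paid LLM APIs.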
Create a .env file with your API credentials:
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
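
For reference, a .env file in this format can be read with the standard library alone. The sketch below is one possible way to do it and is not necessarily how the repository's scripts load credentials (they may rely on a dedicated package instead). The function name `load_env_file` is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=value lines and export them to os.environ.

    Returns a dict of the values found in the file. Variables already set
    in the environment are left unchanged, so the shell takes precedence.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


if __name__ == "__main__":
    found = load_env_file()
    # Report which keys were found without printing the secrets themselves.
    print(sorted(found))
```

Keep the .env file out of version control so the keys are not committed by accident.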
If you use this code or methodology in your research, please cite:
@article{li2026scaling,
title={Scaling medical device regulatory science using large language models},
author={Li, Hanyang and He, Xiao and Subbaswamy, Adarsh and Vossler, Patrick and Gossmann, Alexej and Singh, Karandeep and Feng, Jean},
journal={npj Digital Medicine},
year={2026},
doi={10.1038/s41746-026-02353-7},
url={https://doi.org/10.1038/s41746-026-02353-7}
}