The official repository for code of paper "EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records". EHRAgent is an LLM agent empowered with a code interface, to autonomously generate and execute code for complex clinical tasks within electronic health records (EHRs).
- EHRAgent is an LLM agent augmented with tools and medical knowledge, to solve complex tabular reasoning derived from EHRs;
- Planning with a code interface, EHRAgent enables the LLM agent to formulate a clinical problem-solving process as an executable code plan of action sequences, along with a code executor;
- We introduce interactive coding between the LLM agent and code executor, iteratively refining plan generation and optimizing code execution by examining environment feedback in depth.
We use the EHRSQL benchmark for evaluation. The original dataset is for text-to-SQL task, and we have made adaptations on our evaluation. We release our clean and pre-processed version of EHRSQL-EHRAgent data. Please download the data and record the path of the data.
Our experiments are based on OpenAI API services. Please record your API keys and other credentials in the ./ehragent/config.py.
See requirements.txt. Packages with versions specified in requirements.txt are used to test the code. Other versions are not fully tested by may also work. We also kindly suggest the users to run this code with Python version: python>=3.9. Install required libraries with the following command:
pip3 install -r requirements.txtThe outputting results will be saved under directory ./logs/. Use the following command to run our code:
python main.py --llm YOUR_LLM_NAME --dataset mimic_iii --data_path YOUR_DATA_PATH --logs_path YOUR_LOGS_PATH --num_questions -1 --seed 0We also support debugging mode to focus on single question:
python main.py --llm YOUR_LLM_NAME --dataset mimic_iii --data_path YOUR_DATA_PATH --logs_path YOUR_LOGS_PATH --debug --debug_id QUESTION_ID_TO_DEBUGFor eICU dataset, just change the option of dataset to --dataset eicu.
If you find this repository useful, please consider citing:
@misc{shi2024ehragent,
title={EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records},
author={Wenqi Shi and Ran Xu and Yuchen Zhuang and Yue Yu and Jieyu Zhang and Hang Wu and Yuanda Zhu and Joyce Ho and Carl Yang and May D. Wang},
year={2024},
eprint={2401.07128},
archivePrefix={arXiv},
primaryClass={cs.CL}
}