CORE-BEHRT is an advanced framework for analyzing Electronic Health Records (EHR) using a BERT-like model optimized for healthcare data. This project aims to provide a robust, reproducible, and state-of-the-art solution for EHR data analysis.
- Efficient data preprocessing pipeline
- Customizable BEHRT model architecture
- Comprehensive training and evaluation scripts
- Cross-validation support for robust model assessment
- Visualization tools for result interpretation
Ensure you have the following dependencies installed:
- Python 3.7+
- PyTorch 1.7+
- transformers 4.0+
- NumPy
- Pandas
- scikit-learn
- tqdm
- matplotlib
- pyarrow (for Parquet file support)
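To confirm that the packages above can be imported in the active environment, a small check along these lines may help. It is only a sketch: `REQUIRED_MODULES`, `check_dependencies`, and `report` are illustrative names and not part of CORE-BEHRT. It reports the installed versions but does not compare them against the minimums listed above.

```python
import importlib

# Import names for the packages listed above. Note that PyTorch is imported
# as "torch" and scikit-learn as "sklearn".
REQUIRED_MODULES = [
    "torch", "transformers", "numpy", "pandas",
    "sklearn", "tqdm", "matplotlib", "pyarrow",
]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {module name: version string, or None if the import fails}."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        # Not every module exposes __version__, so fall back to "unknown".
        found[name] = str(getattr(module, "__version__", "unknown"))
    return found


def report(found):
    """Print one line per module, flagging the ones that could not be imported."""
    for name, version in found.items():
        status = version if version is not None else "MISSING"
        print(f"{name:<14}{status}")
```

Calling `report(check_dependencies())` prints one line per package; any line marked `MISSING` points to a package that still needs to be installed.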
- Clone the repository:
    git clone https://github.com/your-username/core-behrt.git
    cd core-behrt
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the package:
    pip install -e .
Follow these steps to preprocess your data, train the model, and evaluate the results:
- Data Preparation:
    python -m ehr2vec.scripts.main_create_data
- Model Pre-training:
    python -m ehr2vec.scripts.main_pretrain
- Prepare Fine-tuning Data:
    python -m ehr2vec.scripts.main_create_outcomes
- Model Fine-tuning:
    python -m ehr2vec.scripts.main_finetune_cv
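The four steps above must run in order, and each one depends on the output of the previous step. As a rough sketch, they could be chained from a single Python script that stops at the first failure. The module paths are the ones listed above; `PIPELINE_STEPS`, `build_command`, `run_pipeline`, and the `runner` parameter are illustrative names, not part of CORE-BEHRT, and the sketch assumes each script runs with its default configuration.

```python
import subprocess
import sys

# Script modules in the order given in the steps above.
PIPELINE_STEPS = [
    ("Data preparation", "ehr2vec.scripts.main_create_data"),
    ("Model pre-training", "ehr2vec.scripts.main_pretrain"),
    ("Prepare fine-tuning data", "ehr2vec.scripts.main_create_outcomes"),
    ("Model fine-tuning", "ehr2vec.scripts.main_finetune_cv"),
]


def build_command(module):
    """Build the equivalent of `python -m <module>` for the current interpreter."""
    return [sys.executable, "-m", module]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step in order; `runner` is injectable so the flow can be dry-run."""
    for label, module in steps:
        print(f"==> {label}: {module}")
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so later steps never run after a failed one.
        runner(build_command(module), check=True)
```

Calling `run_pipeline()` from the repository root, inside the activated virtual environment, executes all four steps. Using `sys.executable` rather than a bare `python` keeps the child processes on the same interpreter as the calling script.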
To use a custom configuration, pass the path to your config file: