kirilklein/corebehrt_phair

CORE-BEHRT: Carefully Optimized and Rigorously Evaluated BEHRT

Overview

CORE-BEHRT is a framework for modeling Electronic Health Records (EHR) with a BERT-like model adapted to healthcare data. The project aims to provide a robust and reproducible solution for EHR data analysis.

Features

  • Efficient data preprocessing pipeline
  • Customizable BEHRT model architecture
  • Comprehensive training and evaluation scripts
  • Cross-validation support for robust model assessment
  • Visualization tools for result interpretation

Prerequisites

Ensure you have the following dependencies installed:

  • Python 3.7+
  • PyTorch 1.7+
  • transformers 4.0+
  • NumPy
  • Pandas
  • scikit-learn
  • tqdm
  • matplotlib
  • pyarrow (for Parquet file support)
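A quick way to confirm that the packages listed above are importable in the active environment is sketched below. It is a minimal check, not part of the repository: the helper names `missing_modules` and `report` are hypothetical, the import names (`torch` for PyTorch, `sklearn` for scikit-learn) are the usual ones for these packages, and the script only tests whether each module can be found, not whether its version meets the stated minimum.

```python
import importlib.util

# Import names for the packages listed above. The import name can differ
# from the package name (PyTorch -> torch, scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "torch", "transformers", "numpy", "pandas",
    "sklearn", "tqdm", "matplotlib", "pyarrow",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED_MODULES):
    """Summarize which of the given modules are missing, if any."""
    missing = missing_modules(names)
    if missing:
        return "Missing: " + ", ".join(missing)
    return "All listed dependencies are importable."
```

Calling `print(report())` after activating the virtual environment shows which modules, if any, still need to be installed.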

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/core-behrt.git
    cd core-behrt

  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

  3. Install the package:

    pip install -e .

Usage

Follow these steps to preprocess your data, train the model, and evaluate the results:

  1. Data Preparation:

    python -m ehr2vec.scripts.main_create_data

  2. Model Pre-training:

    python -m ehr2vec.scripts.main_pretrain

  3. Prepare Fine-tuning Data:

    python -m ehr2vec.scripts.main_create_outcomes

  4. Model Fine-tuning:

    python -m ehr2vec.scripts.main_finetune_cv
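The four steps above depend on each other's outputs, so it can be convenient to run them in order and stop at the first failure. The following is a minimal sketch of such a wrapper, not a script shipped with the project: `build_commands` and `run_pipeline` are hypothetical helper names, and it assumes each step is run with its default configuration and no extra arguments.

```python
import subprocess
import sys

# Module names are taken from the usage steps above, in order.
PIPELINE_MODULES = [
    "ehr2vec.scripts.main_create_data",
    "ehr2vec.scripts.main_pretrain",
    "ehr2vec.scripts.main_create_outcomes",
    "ehr2vec.scripts.main_finetune_cv",
]


def build_commands(python=sys.executable):
    """Build one `python -m <module>` command per pipeline step."""
    return [[python, "-m", module] for module in PIPELINE_MODULES]


def run_pipeline(runner=subprocess.run):
    """Run each step in order.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so the remaining steps are not executed.
    """
    for command in build_commands():
        print("Running:", " ".join(command))
        runner(command, check=True)
```

Calling `run_pipeline()` from the repository root, inside the activated virtual environment, executes the steps sequentially. The `runner` parameter exists only so the order of commands can be inspected without launching the actual jobs.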

To use a custom configuration, pass the path to your config file:
