AIPAL Validator is a tool designed to streamline the validation process for AIPAL. Below you'll find instructions on how to set up and run this validator both locally and with Docker.
-
R Installation: Ensure R is installed on your system. If not, install it (on Debian or Ubuntu) using:
sudo apt-get install r-base
Also install the following packages within R: 'dplyr', 'tidyr', 'yaml', 'caret', 'xgboost'.
-
Install the necessary dependencies:
poetry install
-
Run the validation process. You can specify the step to run (all, data, sampling, test):
poetry run aipal_validation --task aipal --step [all,data,sampling,test]
-
Run the Docker container:
docker compose run aipal bash
-
Inside the Docker container, execute the validation script:
python -m aipal_validation --task aipal --step [all,data,sampling,test]
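The bracketed list in the commands above stands for a single choice: exactly one step name is passed to --step. As a minimal sketch, the helper below builds the argument list for one run and rejects combinations that this README does not show. The STEPS_BY_TASK table and the build_command name are illustrative assumptions, not part of the package, and the package itself may accept further values.

```python
import sys

# Task and step names as they appear in this README's commands.
# Assumption: the package may accept other combinations not listed here.
STEPS_BY_TASK = {
    "aipal": ("all", "data", "sampling", "test"),
    "outlier": ("detect",),
    "retrain": ("all",),
}


def build_command(task: str, step: str) -> list:
    """Return the argv list for one validator run (hypothetical helper)."""
    if task not in STEPS_BY_TASK:
        raise ValueError(f"unknown task: {task!r}")
    if step not in STEPS_BY_TASK[task]:
        raise ValueError(f"step {step!r} is not listed for task {task!r}")
    # Same invocation as `python -m aipal_validation --task ... --step ...`,
    # using the interpreter that is currently running.
    return [sys.executable, "-m", "aipal_validation",
            "--task", task, "--step", step]
```

Passing the result to subprocess.run(build_command("aipal", "test"), check=True) would run only the test step, provided the package is installed in the current environment.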
The project has the following structure:
- aipal_validation/: Main package containing all functionality
  - r/: Contains all R scripts for prediction and model training (moved from root directory)
  - config/: Configuration files
  - data_preprocessing/: Data preprocessing modules
  - eval/: Evaluation modules
  - fhir/: FHIR-related modules
  - ml/: Machine learning modules
  - outlier/: Outlier detection modules
  - helper/: Utility functions
If you don't have a Firemetrics server running and want to import data from an Excel sheet, follow these steps:
-
Set the run_id:
- Update the run_id to match your cohort name.
-
Prepare Your Directory:
- In your root_dir, create a folder named after your cohort.
- Inside this folder, create another folder named aipal.
- Place your Excel sheet in the aipal folder.
-
Generate Custom Samples:
- Run the following command:
  python -m aipal_validation --task aipal --step sampling
- This command invokes the generate_custom_samples.py script.
- Ensure the column names in your Excel file exactly match the names the script expects.
- Alternatively, perform the necessary data transformations within the script.
-
Run the Validation Pipeline:
- Once the samples.csv file has been created, run the validation pipeline:
  python -m aipal_validation --task aipal --step test
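The directory preparation and the column-name check from the steps above can be sketched in a few lines of Python. This is only an illustration: prepare_cohort_dir and missing_columns are hypothetical helpers, not functions of the package, and the list of expected column names has to be taken from generate_custom_samples.py itself.

```python
import shutil
from pathlib import Path


def prepare_cohort_dir(root_dir, cohort, excel_file):
    """Create <root_dir>/<cohort>/aipal and copy the Excel sheet into it.

    Hypothetical helper; `cohort` should equal the run_id you configured.
    Returns the path of the copied file.
    """
    target = Path(root_dir) / cohort / "aipal"
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(excel_file, target))


def missing_columns(actual, expected):
    """Return the expected column names that are absent from the sheet.

    The comparison is exact (case-sensitive), matching the requirement
    that names agree exactly with those used in the script.
    """
    present = set(actual)
    return [name for name in expected if name not in present]
```

An empty list from missing_columns means every expected name is present. Any names it returns would need to be renamed in the Excel file or handled by a transformation in the script before the sampling step is run.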
To run outlier detection on your dataset and identify potential anomalies:
-
Local Setup:
poetry run aipal_validation --task outlier --step detect
-
Docker Setup:
docker compose run aipal bash
python -m aipal_validation --task outlier --step detect
The outlier detection uses isolation forest and local outlier factor (LOF) algorithms to identify samples that deviate significantly from the expected patterns in each class.
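To make the idea concrete, here is a minimal sketch of per-class outlier flagging with scikit-learn's IsolationForest and LocalOutlierFactor. It is not the package's implementation: the function name, the n_neighbors default, the minimum class size, and the choice to report the two detectors separately are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def flag_outliers_per_class(X, y, n_neighbors=10, random_state=0):
    """Return (iforest_flags, lof_flags); True marks a suspected outlier.

    Both detectors are fitted separately on the samples of each class,
    so a sample is judged against its own class only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    iforest_flags = np.zeros(len(y), dtype=bool)
    lof_flags = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        if len(idx) < 3:
            continue  # assumption: too few samples to judge, leave unflagged
        # LOF needs fewer neighbours than there are samples in the class.
        k = min(n_neighbors, len(idx) - 1)
        forest = IsolationForest(random_state=random_state)
        lof = LocalOutlierFactor(n_neighbors=k)
        # fit_predict returns -1 for outliers and 1 for inliers.
        iforest_flags[idx] = forest.fit_predict(X[idx]) == -1
        lof_flags[idx] = lof.fit_predict(X[idx]) == -1
    return iforest_flags, lof_flags
```

Keeping the two flag arrays separate leaves the decision of how to combine them (either detector, or both) to the caller; how the package combines them is not stated here.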
To retrain the AIPAL model with your dataset:
-
Local Setup:
poetry run aipal_validation --task retrain --step all
-
Docker Setup:
docker compose run aipal bash
python -m aipal_validation --task retrain --step all
The retraining process will:
- Split your data into training and testing sets
- Train an XGBoost model on the pediatric subset (age < 18)
- Save the retrained model and prediction outputs to the aipal_validation/r/ directory
- Perform evaluation on the test set
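The first two steps of this list can be illustrated with plain Python. The sketch below is not the retraining code itself, which lives in the R scripts. The 80/20 split, the fixed seed, and the "age" key are assumptions chosen for the example.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists.

    Assumption: an 80/20 split; the actual ratio used may differ.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def pediatric_subset(rows, age_key="age", cutoff=18):
    """Keep rows whose age is strictly below the cutoff (age < 18)."""
    return [row for row in rows if row[age_key] < cutoff]
```

With these helpers, pediatric_subset(train) yields the rows a model would be trained on, while the held-out test rows stay untouched until evaluation.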