Description

The World Health Organization (WHO) characterized the COVID-19, caused by the SARS-CoV-2, as a pandemic on March 11, while the exponential increase in the number of cases was risking to overwhelm health systems around the world with a demand for ICU beds far above the existing capacity, with regions of Italy being prominent examples.

Brazil recorded the first case of SARS-CoV-2 on February 26, and the virus transmission evolved from imported cases only, to local and finally community transmission very rapidly, with the federal government declaring nationwide community transmission on March 20.

Until March 27, the state of São Paulo had recorded 1,223 confirmed cases of COVID-19, with 68 related deaths, while the county of São Paulo, with a population of approximately 12 million people and where Hospital Israelita Albert Einstein is located, had 477 confirmed cases and 30 associated death, as of March 23. Both the state and the county of São Paulo decided to establish quarantine and social distancing measures, that will be enforced at least until early April, in an effort to slow the virus spread.

One of the motivations for this challenge is the fact that in the context of an overwhelmed health system with the possible limitation to perform tests for the detection of SARS-CoV-2, testing every case would be impractical and tests results could be delayed even if only a target subpopulation would be tested.

Dataset

This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein, at São Paulo, Brazil, and who had samples collected to perform the SARS-CoV-2 RT-PCR and additional laboratory tests during a visit to the hospital.

All data were anonymized following the best international practices and recommendations. All clinical data were standardized to have a mean of zero and a unit standard deviation.

Data fields

Hematocrit,Leukocytes,Potassium etc.
SARS-Cov-2 exam result - target; 1 for positive and 0 for negative test

Data Augmentation

SMOTE

this technique, minority class data are synthetically over-sampled, presenting for the training subset the same proportion of instances for the positive and the negative class.

Training and Testing Models

Let us try the following models:

Random Forest
XGBoost
Logistic Regression

Scoring function will be F1, since it is more costly to have false negatives than false positives

Nested K-Fold Cross Validation

Model Explanation using SHAP

⚠️ Prerequisites

📦 How To Install

You can modify or contribute to this project by following the steps below:

1. Clone the repository

Open terminal ( Ctrl + Alt + T )
Clone to a location on your machine.

# Clone the repository 
$> git clone https://github.com/serfati/covid_bloodtest.git  

# Navigate to the directory 
$> cd covid_bloodtest

2. Install Dependencies

# install with pip/conda 
$> pip install -r requirments.txt

3. launch of the project

# Run nootebook 
$> jupyter notebook AML.ipynb

Or open with Colab

author Serfati

⚖️ License

This program is free software: you can redistribute it and/or modify it under the terms of the MIT LICENSE as published by the Free Software Foundation.

⬆ back to top

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
catboost_info		catboost_info
.gitignore		.gitignore
AML.ipynb		AML.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Description

Dataset

Data fields

Data Augmentation

Training and Testing Models

Nested K-Fold Cross Validation

Model Explanation using SHAP

⚠️ Prerequisites

📦 How To Install

⚖️ License

About

Releases

Packages

Languages

Serfati/covid_bloodtest

Folders and files

Latest commit

History

Repository files navigation

Description

Dataset

Data fields

Data Augmentation

Training and Testing Models

Nested K-Fold Cross Validation

Model Explanation using SHAP

⚠️ Prerequisites

📦 How To Install

⚖️ License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages