fmani/stroke-prediction-xgboost

Analysis of the stroke prediction dataset employing the XGBoost Classifier

If the notebook does not render on GitHub, it can be viewed on nbviewer or, alternatively, on Kaggle.

Analysis of the Stroke Prediction Dataset provided on Kaggle.

Context: According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about the patient.

In the Jupyter notebook provided here, I first perform feature engineering and data visualization. The dataset is strongly imbalanced: ~95% of the samples belong to one class of the target variable stroke. For this reason, I tested the XGBoost classifier together with several resampling techniques (oversampling, undersampling, and oversampling + undersampling). Because accuracy is misleading on such an imbalanced dataset, I used the F1 score to evaluate the models. I selected the SMOTE + ENN algorithm, which gave the best results on the test set, and then reduced the dimensionality with the help of univariate and multivariate analysis techniques, comparing the results obtained. The best model found (based on the F1 score) is the XGBoost classifier with SMOTE + ENN, trained on only four features (age, avg_glucose_level, bmi, smoking_status_never smoked).
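To show why the F1 score is preferred over accuracy for a ~95% / 5% split, here is a small stdlib-only sketch. The labels are made up for illustration and are not taken from the dataset. A model that never predicts a stroke still reaches 95% accuracy, while its F1 score for the stroke class is 0.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn  # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels with a ~95% / 5% split (1 = stroke, 0 = no stroke).
y_true = [0] * 95 + [1] * 5

# A trivial model that never predicts a stroke.
always_negative = [0] * 100

# A model that flags 4 healthy patients (false positives) and finds 3 of the 5 strokes.
some_model = [0] * 91 + [1] * 4 + [1] * 3 + [0] * 2

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))  # 0.95 0.0
print(accuracy(y_true, some_model), f1_score(y_true, some_model))            # 0.94 0.5
```

The second model has slightly lower accuracy but is far more useful for detecting strokes, and the F1 score reflects that.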
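The SMOTE + ENN combination can be outlined with the minimal from-scratch sketch below. It is not the notebook's code, and in practice a library implementation such as imbalanced-learn's SMOTEENN would typically be used. The sketch assumes a binary target and numeric feature tuples. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. ENN is shown in its simple majority-vote form, which removes samples whose label disagrees with most of their k nearest neighbours. The library's defaults for the neighbour-selection rule may differ. The function names and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (needs >= 2 minority samples)."""
    rng = rng if rng is not None else random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean distance), excluding x itself
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # random position on the segment from x towards nn
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant): drop every sample whose
    label disagrees with the majority of its k nearest neighbours."""
    keep = []
    for i, x in enumerate(X):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )[:k]
        agree = sum(y[j] == y[i] for j in neighbours)
        if 2 * agree > len(neighbours):  # a strict majority shares the sample's label
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class with SMOTE up to parity, then clean with ENN.
    Assumes a binary target."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    synthetic = smote(minority, max(n_majority - len(minority), 0), k=k_smote, rng=rng)
    X_res = list(X) + synthetic
    y_res = list(y) + [minority_label] * len(synthetic)
    return enn(X_res, y_res, k=k_enn)


# Toy example: six majority points near the origin, two minority points far away.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (10, 10), (10, 12)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_enn(X, y)
print(len(X_res), y_res.count(1))  # 12 6
```

In this toy case the two clusters are well separated, so ENN removes nothing. On real, overlapping data, the ENN step is what cleans up the noisy samples near the class boundary, including some of those created by SMOTE.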
