🧠 Stroke Prediction Using Data Mining

📖 Project Overview

Stroke Prediction Using Data Mining is a machine learning project that builds a model to predict whether an individual is likely to suffer a stroke, based on their healthcare and demographic data.

This project involved:

- Comprehensive data preprocessing and exploratory data analysis (EDA)
- Implementation of 5 ML models (Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, SVM)
- Use of SMOTE (for class imbalance) and Recursive Feature Elimination (RFE)
- Creation of a Stacked Ensemble Model for improved predictive accuracy

📊 Dataset

Source: Kaggle - Stroke Prediction Dataset
Records: 5,110 instances
Features:
- Categorical: Gender, Ever Married, Work Type, Residence Type, Smoking Status
- Numerical: Age, Hypertension (0/1), Heart Disease (0/1), Avg Glucose Level, BMI

Note: BMI missing values were handled via mean imputation.
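Mean imputation can be sketched as follows. This is a minimal illustration, assuming missing BMI values are represented as `None`; the helper name `impute_mean` and the sample values are hypothetical and not taken from the project script.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up sample values; None marks a missing BMI measurement.
bmi = [36.6, None, 32.5, 34.4, None, 24.0]
bmi_filled = impute_mean(bmi)
```

When the data is split into training and test sets, the fill value is normally computed from the training portion only, so that no information from the test set leaks into preprocessing.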

🛠️ Techniques Used

| Technique | Purpose |
| --- | --- |
| Label Encoding | Convert categorical variables |
| SMOTE | Handle class imbalance |
| Standard Scaler | Normalize numerical features |
| Recursive Feature Elimination (RFE) | Feature selection |
| Ensemble Learning | Boost accuracy with multiple models |
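To make the first and third rows concrete, here is a small pure-Python sketch of label encoding and standard scaling. The function names `label_encode` and `standardize` and the sample values are illustrative assumptions; the project itself presumably relies on library implementations such as scikit-learn's `LabelEncoder` and `StandardScaler`.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up sample values for one categorical and one numerical feature.
smoking = ["never smoked", "smokes", "formerly smoked", "never smoked"]
codes, mapping = label_encode(smoking)

ages = [67.0, 61.0, 80.0, 49.0]
ages_scaled = standardize(ages)
```

Label encoding assigns arbitrary integers, so the resulting order carries no meaning; this matters more for distance-based models such as KNN and SVM than for tree-based ones.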

📈 Models Evaluated

| Model | Accuracy | F1 Score | ROC AUC |
| --- | --- | --- | --- |
| Logistic Regression | 0.79 | 0.80 | 0.85 |
| Decision Tree | 0.91 | 0.91 | 0.91 |
| Random Forest | 0.96 | 0.96 | 0.99 |
| K-Nearest Neighbors (KNN) | 0.90 | 0.90 | 0.95 |
| Support Vector Machines (SVM) | 0.84 | 0.85 | 0.91 |

🚀 Best Base Model: Random Forest
🚀 Even Better: the Stacked Ensemble reached 96.66% accuracy!
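For readers unfamiliar with the three metrics reported above, the following sketch computes them from scratch on a toy example. The labels, scores, and function names are made up for illustration; the F1 here is the binary F1 for the positive (stroke) class, and whether the project's reported figures use this or an averaged variant is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and model scores (hypothetical values).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the 0.5 decision threshold, while ROC AUC is computed from the raw scores and so reflects ranking quality across all thresholds.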

📊 Additional Observations

- Hyperparameter tuning improved the SVM's accuracy and ROC AUC.
- The Stacked Ensemble combined the base models and gave the best generalization.
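A stacked ensemble of this kind can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch, not the project's actual configuration: the choice of base models, the logistic regression meta-learner, the hyperparameters, and the synthetic data from `make_classification` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); the project uses the Kaggle stroke dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; the distance-based ones are scaled inside their own pipelines.
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# The meta-learner is trained on cross-validated predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
```

Because the meta-learner only sees out-of-fold predictions during training, it learns how much to trust each base model without being fit on predictions the base models made for data they had already seen.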

🔥 Key Takeaways

- Data preprocessing and feature engineering significantly affect performance.
- SMOTE improved detection of the minority (stroke) class.
- Ensemble models (Random Forest, Stacked) outperform individual models.
- Feature selection (RFE) simplifies models with minor performance trade-offs.
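The core idea behind SMOTE, creating synthetic minority samples by interpolating between existing ones, can be sketched in a few lines. This is a simplified illustration rather than the library implementation: the function name `smote_sketch`, its parameters, and the sample points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up (age, avg glucose level) pairs standing in for stroke cases.
stroke_cases = [(67.0, 228.7), (80.0, 105.9), (49.0, 171.2), (79.0, 174.1)]
new_cases = smote_sketch(stroke_cases, n_new=6)
```

In practice the `SMOTE` class from imbalanced-learn (`fit_resample`) is the usual route. Oversampling is generally applied to the training split only, since synthetic points in the test set would inflate the reported metrics.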

📚 References

📂 Dataset: Stroke Prediction Dataset - Kaggle
📑 Project Notebook: Google Colab Notebook

📬 Contact

👤 Hetu Virajkumar Patel | GitHub | LinkedIn
👤 Nilay Thakorbhai Patel

🎯 Developed as part of CPS 844 - Data Mining, under Prof. Cherie Ding at Toronto Metropolitan University.
