In this repo, we analyze a dataset of heart patient metrics to build a model identifying heart disease risks. We prioritize high recall for comprehensive detection through EDA, preprocessing, and model building. Explore our approach and findings!

FarzadNekouee/Heart_Disease_Prediction

Heart Disease Prediction and Analysis

Overview

This repository contains a heart disease prediction project. The data, collected from heart patients, includes health metrics such as age, blood pressure, and heart rate. The primary objective is to build a predictive model that identifies individuals at risk of heart disease. The emphasis is on high recall, so that as few heart disease cases as possible are missed.

Problem

In this project, we work with a dataset of health metrics from heart patients, including age, blood pressure, and heart rate. Our goal is to develop a predictive model that accurately identifies individuals with heart disease. Because missing a positive diagnosis has grave consequences, our priority is for the model to catch as many true patients as possible, which makes recall for the positive class the key metric.
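To make the metric concrete, here is a minimal sketch of how recall for the positive class is computed. The `recall` helper and the label lists are hypothetical illustrations, not code or data from the notebook.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    actual_pos = true_pos + false_neg
    if actual_pos == 0:
        raise ValueError("no positive samples in y_true; recall is undefined")
    return true_pos / actual_pos


# Hypothetical labels: 1 = heart disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

# 5 patients actually have the disease; the predictions catch 4 of them.
print(recall(y_true, y_pred))  # 0.8
```

Note that the false positive in the example (a healthy patient flagged as diseased) does not lower recall; it lowers precision instead, which is why the two metrics are reported together.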

Objectives

The objectives of the project are as follows:

  1. Data Understanding: Familiarize ourselves with the dataset and its features.
  2. Exploratory Data Analysis (EDA): Unveil patterns, trends, and relationships between different variables.
    • Univariate Analysis
    • Bivariate Analysis
  3. Data Preprocessing: Prepare the data for future machine learning tasks.
    • Remove irrelevant features
    • Address missing values
    • Treat outliers
    • Encode categorical variables
    • Transform skewed features to achieve normal-like distributions
  4. Model Building: Develop and refine the prediction models.
    • Establish pipelines for models that require scaling
    • Implement and tune classification models including KNN, SVM, Decision Tree, and Random Forest
    • Emphasize achieving high recall for class 1, ensuring comprehensive identification of heart patients
  5. Evaluate and Compare Model Performance: Use precision, recall, and F1-score to gauge each model's effectiveness.
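The model-building and evaluation steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: it uses synthetic data from `make_classification` in place of heart.csv, shows only KNN, and the hyperparameter grid is an arbitrary example. The scaler sits inside the pipeline so that it is refit on each cross-validation training fold, and `scoring="recall"` makes the search optimize recall for class 1.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data (assumption: 13 features, binary target).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# KNN is distance-based, so scaling is part of the pipeline.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Example grid only; the notebook's actual search space may differ.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}

# Select hyperparameters by cross-validated recall of the positive class.
search = GridSearchCV(pipe, param_grid, scoring="recall", cv=5)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
test_recall = recall_score(y_test, y_pred, pos_label=1)

print(search.best_params_)
print(classification_report(y_test, y_pred, digits=3))
```

Tree-based models such as Decision Tree and Random Forest do not depend on feature scale, so they can be tuned the same way without the scaling step.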

Dataset

The dataset comprises various metrics related to heart health. The features of the dataset are described in the table below:

| Variable Name | Description |
| --- | --- |
| age | Age of the patient in years |
| sex | Sex of the patient (1 = male, 0 = female) |
| cp | Chest pain type: 0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic |
| trestbps | Resting blood pressure in mm Hg |
| chol | Serum cholesterol in mg/dl |
| fbs | Fasting blood sugar above 120 mg/dl (1 = true, 0 = false) |
| restecg | Resting electrocardiographic results: 0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy |
| thalach | Maximum heart rate achieved during a stress test |
| exang | Exercise-induced angina (1 = yes, 0 = no) |
| oldpeak | ST depression induced by exercise relative to rest |
| slope | Slope of the peak exercise ST segment: 0 = upsloping, 1 = flat, 2 = downsloping |
| ca | Number of major vessels (0-4) colored by fluoroscopy |
| thal | Thallium stress test result: 0 = normal, 1 = fixed defect, 2 = reversible defect, 3 = not described |
| target | Heart disease status (0 = no disease, 1 = presence of disease) |
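As a small sketch of how the coded variables above might be checked and made readable during EDA, the snippet below assumes that the column headers in heart.csv match the variable names in the table. `EXPECTED_COLUMNS`, `CODE_LABELS`, `missing_columns`, and `decode` are hypothetical names introduced for illustration only.

```python
# Column names as listed in the table (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

# Integer codes for the multi-level categorical variables described above.
CODE_LABELS = {
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    "thal": {0: "normal", 1: "fixed defect",
             2: "reversible defect", 3: "not described"},
}


def missing_columns(header):
    """Return the expected columns that are absent from a CSV header."""
    return [col for col in EXPECTED_COLUMNS if col not in header]


def decode(column, code):
    """Map an integer code to its label, or flag it if it is not in the table."""
    return CODE_LABELS[column].get(int(code), "unknown code")


print(missing_columns(["age", "sex", "cp"]))  # lists the 11 columns not supplied
print(decode("cp", 2))                         # non-anginal pain
print(decode("slope", "1"))                    # flat (string codes are accepted)
```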

You can find the dataset here.

File Descriptions

  • Heart Disease Prediction.ipynb: Jupyter notebook containing all the data exploration, visualization, modeling, and evaluation code.
  • heart.csv: CSV file containing the heart disease data.
  • README.md: This file, providing an overview of the project.

How to Run

  • Clone this repository.
  • Open the Heart Disease Prediction.ipynb notebook in Jupyter.
  • Run all cells in the notebook.

Additional Resources

For those interested in exploring this notebook in a Kaggle environment, you can access it here.
