Diabetes Data Analysis and Prediction

The Diabetes prediction dataset is a collection of medical and demographic data from patients, along with their diabetes status (positive or negative). The data includes features such as age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level. This dataset is used to build machine learning models to predict diabetes status in patients based on their medical history and demographic information. This can be useful for healthcare professionals in identifying patients who may be at risk of developing diabetes and in developing personalized treatment plans. Additionally, the dataset can be used by researchers to explore the relationships between various medical and demographic factors and the likelihood of developing diabetes.

Motivation

Diabetes is a global epidemic and a major public health concern. It affects millions of people worldwide, leading to serious complications if not managed properly. Developing a predictive model can help identify individuals at high risk of developing diabetes, enabling early intervention and implementing preventive measures. A diabetes prediction project serves as a showcase of the potential of artificial intelligence and machine learning in healthcare. It demonstrates how these technologies can be harnessed to improve medical outcomes and contribute to evidence-based decision-making.

Dataset and column description

Kaggle: https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset

Column description:

gender: Gender refers to the biological sex of the individual, which can affect susceptibility to diabetes. It has three categories: male, female, and other.
age: Age is an important factor because diabetes is more commonly diagnosed in older adults. Ages in the dataset range from 0 to 80.
hypertension: Hypertension is a medical condition in which the blood pressure in the arteries is persistently elevated. Values are 0 or 1, where 0 means the patient does not have hypertension and 1 means they do.
heart_disease: Heart disease is another medical condition associated with an increased risk of developing diabetes. Values are 0 or 1, where 0 means the patient does not have heart disease and 1 means they do.
smoking_history: Smoking history is also considered a risk factor for diabetes and can exacerbate the complications associated with it. The dataset has six categories: not current, former, No Info, current, never, and ever.
bmi: BMI (Body Mass Index) is a measure of body fat based on weight and height. Higher BMI values are linked to a higher risk of diabetes. BMI in the dataset ranges from 10.16 to 71.55. A BMI below 18.5 is underweight, 18.5-24.9 is normal, 25-29.9 is overweight, and 30 or more is obese.
HbA1c_level: HbA1c (Hemoglobin A1c) level is a measure of a person's average blood sugar level over the past 2-3 months. Higher levels indicate a greater risk of developing diabetes. An HbA1c level of 6.5% or above generally indicates diabetes.
blood_glucose_level: Blood glucose level refers to the amount of glucose in the bloodstream at a given time. High blood glucose levels are a key indicator of diabetes.
diabetes: Diabetes is the target variable being predicted. A value of 1 indicates the presence of diabetes and 0 indicates its absence.
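
The BMI ranges in the column description can be turned into a derived category for exploration. The following is a minimal sketch, not code taken from the project notebooks. The function name `bmi_category` is hypothetical, and the half-open boundaries (for example, treating 24.95 as normal) are an assumption made to close the gaps between the ranges quoted above.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value to the category names used in the column description.

    Boundaries are half-open (an assumption): [18.5, 25) is normal and
    [25, 30) is overweight, so values such as 24.95 are not left uncategorized.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Example: categorize the smallest and largest BMI values quoted above.
for value in (10.16, 71.55):
    print(value, bmi_category(value))
```

A derived column like this can make count plots by BMI group easier to read than a raw numeric axis.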

Steps involved in this project

Data Cleaning & Exploration
Data Visualization
Descriptive Statistics
Machine Learning
Balancing the imbalanced data and Machine Learning
Results & Conclusion
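
To illustrate the balancing step, here is a minimal sketch of random oversampling using only the Python standard library. This README does not state which balancing technique the notebooks use, so random oversampling is an assumption chosen for simplicity. The function name `random_oversample` and the toy rows are hypothetical and are not taken from the dataset.

```python
import random
from collections import Counter


def random_oversample(rows, label_key="diabetes", seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Made-up toy rows for illustration only: nine negatives and one positive.
toy_rows = [{"age": 30 + i, "diabetes": 0} for i in range(9)]
toy_rows.append({"age": 65, "diabetes": 1})

balanced_rows = random_oversample(toy_rows)
class_counts = Counter(row["diabetes"] for row in balanced_rows)
print(class_counts)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used to evaluate the model.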

Screenshots

Diabetes count by Smoking History
Diabetes count by Heart Disease History
Diabetes count by Hypertension History
Age Distribution by diabetes status

Tableau

https://public.tableau.com/app/profile/vrushali.kulkarni5437/viz/DiabeticsDataAnalysis/DiabetescountbyHeartdisease?publish=yes

Installation

Jupyter notebook with Anaconda:

https://docs.anaconda.com/ae-notebooks/user-guide/basic-tasks/apps/jupyter/index.html

How to use?

Jupyter notebook: Run each cell step by step to understand the process.

vrushali92/Diabetes-Data-Analysis-and-Prediction

Folders and files

Latest commit

History

Repository files navigation

Diabetes Data Analysis and Prediction

Motivation

Dataset and column description

Steps involved in this project

Screenshots

Tableau

Installation

How to use?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages