The Diabetes prediction dataset is a collection of medical and demographic data from patients, along with their diabetes status (positive or negative). The data includes features such as age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level. This dataset is used to build machine learning models to predict diabetes status in patients based on their medical history and demographic information. This can be useful for healthcare professionals in identifying patients who may be at risk of developing diabetes and in developing personalized treatment plans. Additionally, the dataset can be used by researchers to explore the relationships between various medical and demographic factors and the likelihood of developing diabetes.
Diabetes is a global epidemic and a major public health concern. It affects millions of people worldwide and leads to serious complications if not managed properly. A predictive model can help identify individuals at high risk of developing diabetes, enabling early intervention and preventive measures. A diabetes prediction project also showcases the potential of artificial intelligence and machine learning in healthcare: it demonstrates how these technologies can be used to improve medical outcomes and support evidence-based decision-making.
Kaggle: https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset
Column description:
- gender: The biological sex of the individual, which can affect susceptibility to diabetes. It has three categories: male, female, and other.
- age: Age is an important factor because diabetes is more commonly diagnosed in older adults. Ages in the dataset range from 0 to 80.
- hypertension: Hypertension is a medical condition in which the blood pressure in the arteries is persistently elevated. The value is 0 or 1, where 0 means the patient does not have hypertension and 1 means they do.
- heart_disease: Heart disease is another medical condition associated with an increased risk of developing diabetes. The value is 0 or 1, where 0 means the patient does not have heart disease and 1 means they do.
- smoking_history: Smoking history is also considered a risk factor for diabetes and can exacerbate its complications. The dataset has six categories: not current, former, No Info, current, never, and ever.
- bmi: BMI (Body Mass Index) is a measure of body fat based on weight and height. Higher BMI values are linked to a higher risk of diabetes. BMI in the dataset ranges from 10.16 to 71.55. A BMI below 18.5 is underweight, 18.5-24.9 is normal, 25-29.9 is overweight, and 30 or more is obese.
- HbA1c_level: HbA1c (Hemoglobin A1c) level is a measure of a person's average blood sugar level over the past 2-3 months. Higher levels indicate a greater risk of developing diabetes. An HbA1c level above 6.5% generally indicates diabetes.
- blood_glucose_level: Blood glucose level is the amount of glucose in the bloodstream at a given time. High blood glucose levels are a key indicator of diabetes.
- diabetes: The target variable being predicted, where 1 indicates the presence of diabetes and 0 indicates its absence.
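
The BMI categories and the HbA1c threshold described above can be sketched as two small helper functions. This is only an illustration: the function names are hypothetical and are not taken from the notebook, and the treatment of values that fall between the listed ranges (for example 24.95) is an assumption.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value to the categories listed in the column description.

    Assumption: the listed ranges (18.5-24.9, 25-29.9) are read as
    half-open intervals, so 24.95 counts as "normal".
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def hba1c_above_threshold(hba1c_level: float, threshold: float = 6.5) -> bool:
    """Return True when HbA1c is above the 6.5% level mentioned above."""
    return hba1c_level > threshold


# The smallest and largest BMI values reported for the dataset.
print(bmi_category(10.16))  # underweight
print(bmi_category(71.55))  # obese
```

Derived features like these can make exploratory plots easier to read, but whether they help a model is something to check rather than assume.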
- Data Cleaning & Exploration
- Data Visualization
- Descriptive Statistics
- Machine Learning
- Balancing the imbalanced data and Machine Learning
- Results & Conclusion
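
The balancing step is not described in detail here, so the following is a minimal sketch that assumes random oversampling of the minority class. The notebook may use a different technique. The function name and the toy rows are hypothetical, and plain Python is used so the example runs without extra libraries.

```python
import random


def random_oversample(rows, label_key="diabetes", seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy rows, not taken from the dataset.
toy_rows = [{"age": a, "diabetes": 0} for a in (25, 31, 40, 52, 60, 47)] + [
    {"age": a, "diabetes": 1} for a in (58, 66)
]
balanced_rows = random_oversample(toy_rows)
counts = {label: sum(r["diabetes"] == label for r in balanced_rows) for label in (0, 1)}
print(counts)  # {0: 6, 1: 6}
```

Whatever balancing method is used, it should be applied to the training split only, so that duplicated or synthetic rows do not leak into the evaluation data.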
- Diabetes count by Smoking History
- Diabetes count by Heart Disease History
- Diabetes count by Hypertension History
- Age Distribution by diabetes status
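
The "diabetes count by ..." charts listed above are based on counts of rows per category and diabetes status. A minimal sketch of that counting step is shown below. The helper name and the toy rows are hypothetical, and the plotting itself is omitted because the plotting library used in the notebook is not stated here.

```python
from collections import Counter


def diabetes_count_by(rows, column):
    """Count rows for each (column value, diabetes status) pair."""
    return Counter((row[column], row["diabetes"]) for row in rows)


# Hypothetical toy rows, not taken from the dataset.
sample_rows = [
    {"smoking_history": "never", "diabetes": 0},
    {"smoking_history": "never", "diabetes": 1},
    {"smoking_history": "former", "diabetes": 1},
    {"smoking_history": "never", "diabetes": 0},
]
smoking_counts = diabetes_count_by(sample_rows, "smoking_history")
for (value, status), n in sorted(smoking_counts.items()):
    print(value, status, n)
```

The same helper works for the `hypertension` column and for the heart disease column by passing a different column name.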
Jupyter notebook with Anaconda:
https://docs.anaconda.com/ae-notebooks/user-guide/basic-tasks/apps/jupyter/index.html
- Jupyter notebook: Run each cell step by step to understand the process.