leahdsouza/DiabetesML-Classifier

DiabetesML-Classifier

This repository contains a data analysis and machine learning project that predicts diabetes in Pima Indian women from medical attributes. The dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases and is available on Kaggle.

Project Overview

The project involves the following steps:

  • Data Exploration: Statistical analysis and visualization of the dataset to understand the distribution of features and identify any necessary preprocessing steps.
  • Correlation Analysis: Computing Pearson Correlation Coefficients (PCC) and generating scatter plots to analyze relationships between features and the target variable.
  • Model Training: Training various classifiers, including Multinomial Logistic Regression, Support Vector Machines, and Random Forest, while tuning hyperparameters to improve performance.
  • Model Evaluation: Evaluating each model on the training, validation, and test sets, and comparing accuracy, precision, recall, and F1 score.
  • Ensemble Method: Combining the classifiers into an ensemble to improve performance on the validation set, then evaluating the ensemble on the test set.
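
The evaluation and ensemble steps above can be sketched without any particular library. The block below is a minimal, dependency-free illustration and is not the repository's code: the helper names `binary_metrics` and `majority_vote`, the hard-voting scheme, and the toy labels are all assumptions made for the example. The project itself may combine its classifiers differently, for instance by averaging predicted probabilities.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators (e.g. a model that never predicts 1).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def majority_vote(*model_predictions):
    """Hard voting: each sample receives the label most models predicted.

    With an odd number of models and binary labels there are no ties.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


# Hypothetical validation labels and per-model predictions (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
preds_lr = [1, 0, 0, 1, 0, 1, 1, 0]   # stand-in for logistic regression
preds_svm = [1, 0, 1, 0, 0, 0, 1, 1]  # stand-in for an SVM
preds_rf = [0, 0, 1, 1, 1, 0, 1, 0]   # stand-in for a random forest

ensemble = majority_vote(preds_lr, preds_svm, preds_rf)

for name, preds in [("lr", preds_lr), ("svm", preds_svm),
                    ("rf", preds_rf), ("ensemble", ensemble)]:
    print(name, binary_metrics(y_true, preds))
```

In this toy example each model makes two mistakes on different samples, so the vote corrects all of them. Real classifiers tend to make correlated errors, so the gain from an ensemble is usually smaller and should be confirmed on the validation set before the test set is touched.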

Dataset Used

The dataset includes eight medical predictor variables: number of pregnancies, plasma glucose concentration, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age. The target variable is the outcome, a binary label indicating whether a patient has diabetes.
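
As a rough illustration of how each predictor can be related to the target, the sketch below computes Pearson correlation coefficients using only the standard library. It makes several assumptions: the column names `Glucose`, `BMI`, `Age`, and `Outcome` follow the Kaggle CSV, the helper names `pearson` and `correlations_with_target` are hypothetical, and the rows are invented toy values rather than records from the real dataset.

```python
import csv
import io
import math

# Invented toy rows for illustration only; NOT taken from the real dataset.
TOY_CSV = """
Glucose,BMI,Age,Outcome
90,24.0,25,0
150,33.0,45,1
110,28.5,30,0
170,36.2,52,1
100,22.1,28,0
140,31.4,41,1
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlations_with_target(rows, target="Outcome"):
    """Map each non-target column name to its correlation with the target."""
    target_values = [float(row[target]) for row in rows]
    return {
        name: pearson([float(row[name]) for row in rows], target_values)
        for name in rows[0]
        if name != target
    }


rows = list(csv.DictReader(io.StringIO(TOY_CSV.strip())))
pcc = correlations_with_target(rows)

# List features from strongest to weakest absolute correlation.
for name, r in sorted(pcc.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>8}: {r:+.3f}")
```

To run this on the actual data, replace the `io.StringIO` source with an open file handle for the downloaded CSV. A coefficient near zero only rules out a linear relationship, so scatter plots remain a useful complement.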
