SuperDataScience-Community-Projects/SDS-CP035-gluco-track

Welcome to the SuperDataScience Community Project!

Welcome to GlucoTrack: Predicting Diabetes Likelihood Using Clinical Data 🎉

This project is a collaborative initiative brought to you by SuperDataScience, a thriving community dedicated to advancing the fields of data science, machine learning, and AI. We’re excited to have you join us on this journey of exploration, modeling, and deployment.

To contribute, please follow the guidelines in our CONTRIBUTING.md file.


📌 Project Overview

GlucoTrack is an end-to-end data science project built on demographic and clinical health data. Participants will analyze patient attributes such as glucose levels, BMI, blood pressure, and age to predict the likelihood of diabetes.

The project is structured into two learning tracks so members can join at their preferred skill level:

  • 🟢 Beginner Track – Feature-based ML pipeline with Streamlit deployment
  • 🔴 Advanced Track – Deep learning classification with explainability and interpretability

Dataset: Diabetes Health Dataset


🟢 Beginner Track

The Beginner Track emphasizes:

  • End-to-end ML workflow using scikit-learn
  • Data preprocessing (handling missing values, normalization, encoding)
  • Training models such as Logistic Regression, Random Forest, and XGBoost
  • Tracking experiments with MLflow
  • Deploying a Streamlit app for interactive predictions
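
The preprocessing and training steps listed above could be wired together roughly as follows. This is a minimal sketch, not the project's reference solution: the column names (`glucose`, `bmi`, `age`, `sex`, `diabetes`) and the synthetic data are assumptions made so the example runs on its own, and the real dataset's schema may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(28, 5, n),
    "age": rng.integers(21, 80, n).astype(float),
    "sex": rng.choice(["F", "M"], n),
})
# Label loosely tied to glucose so the model has some signal to learn.
df["diabetes"] = (df["glucose"] + rng.normal(0, 15, n) > 130).astype(int)
# Inject a few missing BMI values to exercise the imputation step.
df.loc[df.sample(frac=0.05, random_state=0).index, "bmi"] = np.nan

numeric = ["glucose", "bmi", "age"]
categorical = ["sex"]

# Numeric columns: median imputation, then standardization.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encoding.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X, y = df[numeric + categorical], df["diabetes"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)

# Predicted probability of the positive class for each held-out row.
proba = model.predict_proba(X_test)[:, 1]
test_accuracy = model.score(X_test, y_test)
```

Keeping preprocessing inside the `Pipeline` means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics. Swapping `LogisticRegression` for another estimator such as a random forest only changes the final step.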

📌 Get started:

  • Beginner Track Scope of Works
  • Beginner Report Template
  • Submit your work


🔴 Advanced Track

The Advanced Track emphasizes:

  • Building deep learning pipelines with PyTorch/TensorFlow
  • Using embeddings, dropout, batch normalization, and regularization
  • Model explainability with SHAP or Integrated Gradients
  • Residual/error analysis for clinical interpretability
  • Deploying a Streamlit app with model predictions and interpretability visuals
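
To make the Integrated Gradients idea concrete, here is a small sketch in plain Python. It uses a logistic model as a stand-in for a deep network so the gradient can be written by hand; the weights, feature names, and patient values are hypothetical, and a real project would more likely rely on a library implementation on top of PyTorch or TensorFlow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    """Probability output of a logistic model (stand-in for a deep net)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def gradient(x, weights, bias):
    """Gradient of the predicted probability with respect to each input."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    The gradient is averaged along the straight line from `baseline`
    to `x`, then scaled by (x - baseline) per feature.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(point, weights, bias)
        avg_grad = [a + gi / steps for a, gi in zip(avg_grad, g)]
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

# Hypothetical model over standardized features; values are illustrative only.
feature_names = ["glucose", "bmi", "age"]
weights = [1.2, 0.6, 0.3]
bias = -0.5
patient = [1.5, 0.8, -0.2]   # standardized inputs for one patient
baseline = [0.0, 0.0, 0.0]   # the "average patient" after standardization

attributions = integrated_gradients(patient, baseline, weights, bias)
# Completeness check: attributions should sum to f(x) - f(baseline).
delta = predict(patient, weights, bias) - predict(baseline, weights, bias)

for name, value in zip(feature_names, attributions):
    print(f"{name}: {value:+.4f}")
```

The completeness property, where the attributions sum to the difference between the prediction and the baseline prediction, is a useful sanity check before showing such values in an interpretability visual.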

📌 Get started:

  • Advanced Track Scope of Works
  • Advanced Report Template
  • Submit your work


⚡ Workflow & Timeline (Both Tracks)

| Phase | Core Tasks | Duration |
| --- | --- | --- |
| 1 · Setup + EDA | Set up repo, clean dataset, explore health indicators, answer key EDA Qs | Week 1 |
| 2 · Model Development | Train and tune ML/DL models, track experiments with MLflow | Weeks 2–4 |
| 3 · Deployment | Build Streamlit app and deploy to Streamlit Cloud | Week 5 |
