Predict diabetes using classic ML models on the Pima Indians Diabetes dataset (or a compatible healthcare CSV).
.
├─ README.md
├─ requirements.txt
├─ app.py # Streamlit app for inference
├─ .gitignore
├─ LICENSE
├─ diabetes_capstone.ipynb
├─ diabetes_best_model.joblib # created after running notebook
└─ data/
   └─ README.md # how to obtain/place datasets
- Preferred: Kaggle Pima Indians Diabetes: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
- Or use a compatible CSV placed in Downloads (default search list in the notebook):
  - diabetes.csv
  - Disease_symptom_and_patient_profile_dataset.csv
  - healthcare_dataset.csv
Target column should be one of: Outcome, target, diabetes, or class. Adjust in the notebook if different.
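The lookup described above (try each file in the search list, then pick the target column) could be sketched roughly as follows. This is an illustration only, not the notebook's actual code: the helper names `find_dataset` and `find_target_column`, the `search_dir` parameter, and the case-insensitive matching are assumptions made for the example.

```python
import csv
from pathlib import Path

# File names and target names taken from this README.
CANDIDATE_FILES = [
    "diabetes.csv",
    "Disease_symptom_and_patient_profile_dataset.csv",
    "healthcare_dataset.csv",
]
TARGET_NAMES = ["Outcome", "target", "diabetes", "class"]


def find_dataset(search_dir, candidates=CANDIDATE_FILES):
    """Return the first candidate file that exists in search_dir, else None."""
    for name in candidates:
        path = Path(search_dir) / name
        if path.is_file():
            return path
    return None


def find_target_column(csv_path, target_names=TARGET_NAMES):
    """Return the header column matching one of target_names, else None.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    lowered = {col.strip().lower(): col for col in header}
    for name in target_names:
        if name.lower() in lowered:
            return lowered[name.lower()]
    return None
```

If `find_target_column` returns `None`, the dataset uses a different label name and the target column needs to be set by hand in the notebook.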
- Create environment and install dependencies
    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt
- Launch the notebook
    jupyter notebook diabetes_capstone.ipynb
- Run all cells
- Performs EDA, preprocessing, trains Logistic Regression / RandomForest / SVM / GradientBoosting
- Compares metrics and saves the best model as diabetes_best_model.joblib
- Export to PDF (for submission)
- From the notebook UI: File → Print Preview → Print to PDF
- Or via CLI:
    jupyter nbconvert --to pdf diabetes_capstone.ipynb
- All steps are captured in the notebook.
- requirements.txt pins core packages for consistent runs.
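For readers who want a feel for the "train four models, compare, save the best" step before opening the notebook, here is a minimal sketch using scikit-learn. It is not the notebook's code: the function name `compare_models`, the choice of cross-validated ROC AUC as the comparison metric, the hyperparameters, and the synthetic demo data are all assumptions for illustration.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, cv=5, random_state=42):
    """Return (best_name, fitted_best_model, scores) ranked by mean CV ROC AUC."""
    candidates = {
        # Logistic Regression and SVM are sensitive to feature scale,
        # so they are wrapped in a pipeline with a scaler.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        "svm": make_pipeline(StandardScaler(), SVC(random_state=random_state)),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # Refit the winning model on all of the data before saving it.
    best_model = candidates[best_name].fit(X, y)
    return best_name, best_model, scores


if __name__ == "__main__":
    # Synthetic stand-in with 8 features; the real notebook uses the CSV data.
    X, y = make_classification(n_samples=300, n_features=8, random_state=0)
    name, model, scores = compare_models(X, y)
    print(name, scores)
    joblib.dump(model, "diabetes_best_model.joblib")
```

Scoring with cross-validation rather than a single train/test split is one reasonable choice for a dataset as small as Pima; the notebook may use a different split or metric.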
MIT License — see LICENSE.
Run locally:
    pip install -r requirements.txt
    streamlit run app.py

Usage:
- Option 1: Upload a CSV with the same feature columns used during training.
- Option 2: Use the manual input form (typical numeric features from Pima dataset).
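Option 1 only works if the uploaded CSV carries the same feature columns the model was trained on. A rough sketch of that check is shown below. It is not taken from app.py: the helper names `prepare_features` and `predict_frame` are hypothetical, and the default feature list assumes the model was trained on the Kaggle Pima CSV (whose columns are listed in `PIMA_FEATURES`).

```python
import pandas as pd

# Feature columns of the Kaggle Pima Indians Diabetes CSV (Outcome excluded).
# Assumption: the saved model was trained on exactly these columns.
PIMA_FEATURES = [
    "Pregnancies",
    "Glucose",
    "BloodPressure",
    "SkinThickness",
    "Insulin",
    "BMI",
    "DiabetesPedigreeFunction",
    "Age",
]


def prepare_features(df, feature_columns=PIMA_FEATURES):
    """Select the training columns in training order, failing loudly if any are missing."""
    missing = [col for col in feature_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Uploaded CSV is missing columns: {missing}")
    # Non-numeric values raise here instead of reaching the model.
    return df[feature_columns].apply(pd.to_numeric)


def predict_frame(model, df, feature_columns=PIMA_FEATURES):
    """Return a copy of df with a 'prediction' column added."""
    features = prepare_features(df, feature_columns)
    result = df.copy()
    result["prediction"] = model.predict(features)
    return result
```

In a Streamlit app, `model` would typically come from `joblib.load("diabetes_best_model.joblib")` and `df` from `pd.read_csv` on the uploaded file. The manual input form in Option 2 can reuse the same path by building a one-row DataFrame from the form values.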
Deployment (optional):
- Push this repo to GitHub. On Streamlit Cloud (or similar), create a new app pointing to app.py.