This repository contains a full machine learning pipeline and an interactive Streamlit application to predict:
- 🩺 Diagnosis of Thyroid Cancer: Benign or Malignant (Binary Classification)
- 🎯 Risk Level: Low, Medium, or High (Multiclass Classification)

The project is built on clinical data and is designed for both educational and practical demonstration purposes.
Task | Model File | Type | Classes |
---|---|---|---|
Diagnosis Prediction | `diagnosis_predictor.pkl` | Binary Classifier | Benign, Malignant |
Risk Prediction | `risk_predictor.pkl` | Multiclass Classifier | Low, Medium, High |
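
As a minimal sketch of how the two model files in the table above might be loaded, the snippet below maps each task to its file name and reads it with Joblib. The helper names `model_path` and `load_model`, and the assumption that the files sit in a single directory, are illustrative and not taken from the repository's code.

```python
from pathlib import Path

import joblib

# File names come from the table above; the directory layout is an assumption.
MODEL_FILES = {
    "diagnosis": "diagnosis_predictor.pkl",
    "risk": "risk_predictor.pkl",
}


def model_path(task, model_dir="."):
    """Return the path of the model file for 'diagnosis' or 'risk'."""
    if task not in MODEL_FILES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(MODEL_FILES)}"
        )
    return Path(model_dir) / MODEL_FILES[task]


def load_model(task, model_dir="."):
    """Load one trained model from disk with joblib (hypothetical helper)."""
    return joblib.load(model_path(task, model_dir))
```

With the repository checked out, `load_model("diagnosis")` would return the binary classifier and `load_model("risk")` the multiclass one.
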
These `.pkl` files are required to run the Streamlit app and are included in this repository:
File | Description |
---|---|
`diagnosis_predictor.pkl` | Trained binary classification model to predict benign/malignant diagnosis |
`scaler_diagnosis.pkl` | StandardScaler used to scale numeric input features for the diagnosis model |
`feature_columns_diagnosis.pkl` | List of feature names used for diagnosis prediction inputs in the app |
`binning_edges_diagnosis.pkl` | Binning thresholds for numerical features used in Streamlit select boxes |

File | Description |
---|---|
`risk_predictor.pkl` | Trained multiclass model to predict thyroid cancer risk (Low/Medium/High) |
`scaler_risk.pkl` | StandardScaler used for numeric inputs of the risk model |
`feature_columns_risk.pkl` | List of features used in risk prediction input |
`binning_edges_risk.pkl` | Binning thresholds used for categorical-like dropdowns in the app |
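
To illustrate how binning thresholds and a feature-column list could work together, here is a small sketch in plain Python. It assumes the edges for one feature are stored as a sorted list of numbers and that the app turns a chosen range back into a single numeric value (the midpoint here). The actual format of the `.pkl` contents and the app's conversion rule are not documented in this README, so the helper names and the example edges below are hypothetical.

```python
from bisect import bisect_right


def bin_labels(edges):
    """Turn sorted edges [e0, e1, ..., en] into range labels for a select box."""
    return [f"{lo:g} - {hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]


def bin_index(value, edges):
    """Index of the bin containing value (left-inclusive), clamped to valid bins."""
    idx = bisect_right(edges, value) - 1
    return max(0, min(idx, len(edges) - 2))


def bin_midpoint(index, edges):
    """Representative numeric value for a chosen bin (assumed: its midpoint)."""
    return (edges[index] + edges[index + 1]) / 2


def ordered_row(inputs, feature_columns):
    """Arrange user inputs in the column order the model expects."""
    return [inputs[name] for name in feature_columns]


# Hypothetical edges for a single numeric feature.
example_edges = [0, 10, 20, 40]
labels = bin_labels(example_edges)                  # ['0 - 10', '10 - 20', '20 - 40']
chosen = labels.index("10 - 20")                    # the option picked in the select box
numeric_value = bin_midpoint(chosen, example_edges) # 15.0
```

In a Streamlit app, the labels would typically be passed as the options of `st.selectbox`, and the row returned by `ordered_row` would go through the scaler before reaching the model.
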
The models were trained on a clinical dataset containing patient health parameters and diagnostic labels.
If you'd like to train the models yourself, you can download the dataset from:
[Kaggle Dataset Link](https://www.kaggle.com/datasets/bhargavchirumamilla/thyroid-cancer-risk-dataset)
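
For readers who want to retrain, the following is a rough sketch of a train-and-save step, not the pipeline used to produce the shipped models. It fits a `StandardScaler` and a `LogisticRegression` (a stand-in classifier chosen for brevity; the README does not state which algorithm was used) and writes artifacts under the same file names as in this repository. The function name `train_and_save` and the synthetic demo data are assumptions; real training would load the Kaggle CSV and would also need to produce the binning-edges file.

```python
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_save(X, y, feature_columns, task, out_dir="."):
    """Fit scaler + classifier, save them with joblib, return held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    # Fit the scaler on the training split only to avoid leaking test statistics.
    scaler = StandardScaler().fit(X_train)
    model = LogisticRegression(max_iter=1000)  # stand-in model, an assumption
    model.fit(scaler.transform(X_train), y_train)
    accuracy = model.score(scaler.transform(X_test), y_test)

    out = Path(out_dir)
    joblib.dump(model, out / f"{task}_predictor.pkl")
    joblib.dump(scaler, out / f"scaler_{task}.pkl")
    joblib.dump(list(feature_columns), out / f"feature_columns_{task}.pkl")
    return accuracy


# Synthetic stand-in data; replace with features and labels from the dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
```

Calling `train_and_save(X_demo, y_demo, ["f0", "f1", "f2"], "diagnosis")` writes the three files to the current directory; passing `"risk"` with three-class labels would produce the risk artifacts in the same way.
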
- Python 3.10+
- Streamlit: for building interactive web apps
- Scikit-learn: machine learning models and preprocessing
- Pandas, NumPy: data analysis and manipulation
- Joblib: efficient model serialization
- Google Colab: preprocessing and model training
This project is open-source and free to use for non-commercial and academic purposes.
If you use this work in research or development, please provide attribution.
- Kaggle: Thyroid Disease Dataset, for providing the publicly available clinical data
- The open-source contributors to Streamlit, Scikit-learn, and XGBoost
- The broader Python and machine learning communities for open knowledge sharing