Breast Cancer Prediction with Machine Learning

This project demonstrates an end-to-end machine learning pipeline for predicting breast cancer using the Breast Cancer Wisconsin Diagnostic Dataset. A Logistic Regression model classifies tumors as malignant (cancerous) or benign (non-cancerous), and the project covers the full ML workflow from data cleaning to model evaluation.

Note: This project is for educational and demonstration purposes only. It must not be used for clinical decision-making.

Jupyter Notebook

For a full walkthrough with code, outputs, and visualizations, see the Jupyter Notebook BreastCancerPredictAI.ipynb.

Run the notebook online (no setup required):

Project Workflow

The pipeline follows these steps:

1. Data loading & cleaning
2. Exploratory Data Analysis (EDA)
   - Class distribution of malignant vs. benign
   - Feature distributions and correlations
3. Preprocessing
   - Scaling
   - Train/test split with stratification
4. Model Training
   - Logistic Regression with GridSearchCV for hyperparameter tuning
   - Pipeline built with ColumnTransformer and StandardScaler
5. Threshold Selection for Recall
   - Precision-recall analysis
   - Cross-validation to pick a threshold targeting ~99% recall
   - Prioritizes recall to minimize false negatives in a diagnostic context
6. Model Evaluation
   - Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC
   - Confusion matrix, precision-recall curves
7. Saving Model & Threshold
   - Best model saved with joblib
   - Optimal threshold stored in JSON for reproducibility
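
The training, threshold-selection, evaluation, and saving steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. Everything in it is an assumption: scikit-learn's bundled copy of the Wisconsin Diagnostic dataset stands in for the project's CSV files, the `C` grid, fold count, and random seed are arbitrary, the JSON key `"threshold"` is hypothetical, and the ColumnTransformer is omitted because every feature in the bundled data is numeric. Files are written to a temporary directory so nothing in the repository is overwritten.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_recall_curve,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_predict, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: the bundled dataset stands in for the project's cleaned CSV.
data = load_breast_cancer()
X = data.data
# scikit-learn codes malignant as 0; relabel so malignant is the positive class (1).
y = (data.target == 0).astype(int)

# Stratified split keeps the malignant/benign ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=5000))])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Out-of-fold probabilities: each training row is scored by a model
# that never saw it, so the threshold is not tuned on fitted data.
oof_proba = cross_val_predict(best_model, X_train, y_train, cv=cv,
                              method="predict_proba")[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, oof_proba)

# recall has one more entry than thresholds; recall[i] pairs with thresholds[i].
# Take the highest threshold whose out-of-fold recall still meets the target.
TARGET_RECALL = 0.99
meets_target = recall[:-1] >= TARGET_RECALL
threshold = float(thresholds[meets_target].max())

# Evaluate on the held-out test set using the chosen threshold.
test_proba = best_model.predict_proba(X_test)[:, 1]
test_pred = (test_proba >= threshold).astype(int)
metrics = {
    "accuracy": accuracy_score(y_test, test_pred),
    "precision": precision_score(y_test, test_pred),
    "recall": recall_score(y_test, test_pred),
    "f1": f1_score(y_test, test_pred),
    "roc_auc": roc_auc_score(y_test, test_proba),
}

# Persist the model and threshold together so predictions can be reproduced.
out_dir = Path(tempfile.mkdtemp())
joblib.dump(best_model, out_dir / "best_model.pkl")
(out_dir / "chosen_threshold.json").write_text(json.dumps({"threshold": threshold}))
```

The threshold is chosen on out-of-fold training predictions only, so the ~99% recall target holds on that data by construction. Recall on the test set can be lower, which is why the test metrics are computed separately.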


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Breast Cancer Prediction with Machine Learning

Jupyter Notebook

Project Workflow

About

Uh oh!

Languages

MahbubAlam231/BreastCancerPredictAI

Folders and files

Latest commit

History

Repository files navigation

Breast Cancer Prediction with Machine Learning

Jupyter Notebook

Project Workflow

About

Resources

Uh oh!

Stars

Watchers

Forks

Languages