This project applies supervised machine learning to classify breast cancer tumors as benign or malignant. We use Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) to classify tumors from measured cell-nucleus features. The project also compares the performance of the two models and identifies the most influential features.
- José Pablo Del Moral
- Pol Tordera
- Ferran Serramalera
- Bruno Pin
- Oscar Grau
The dataset contains measurements of cell nuclei from breast cancer biopsies, including features like radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, and fractal dimension.
- Total entries: 569
- Target: `diagnosis` (0 = benign, 1 = malignant)
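The target encoding above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the toy rows, the `id` column, and the raw `"M"`/`"B"` labels are assumptions about how `data.csv` is laid out, and `prepare` is a hypothetical helper.

```python
import pandas as pd

# Toy rows standing in for data.csv. Everything except the `diagnosis`
# column name is an assumption made for illustration.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.99, 13.54, 12.45, 20.57],
    "texture_mean": [10.38, 14.36, 15.70, 17.77],
})


def prepare(df):
    """Drop non-predictive columns and encode diagnosis as 0 = benign, 1 = malignant."""
    cleaned = df.drop(columns=["id"], errors="ignore")  # identifier carries no signal
    cleaned = cleaned.dropna(axis=1, how="all")         # drop columns that are entirely empty
    return cleaned.assign(diagnosis=cleaned["diagnosis"].map({"B": 0, "M": 1}))


data = prepare(raw)
X = data.drop(columns=["diagnosis"])  # feature matrix
y = data["diagnosis"]                 # encoded target
print(y.tolist())
```

With the real file, `raw` would presumably come from `pd.read_csv("data.csv")` instead of the toy frame.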
- Data Cleaning & Preprocessing: Removed irrelevant columns and encoded target labels.
- Exploratory Data Analysis (EDA): Visualized distributions and correlations, identified top predictive features.
- Modeling:
  - SVM: GridSearchCV to tune hyperparameters; achieved ~95.6% accuracy.
  - KNN: GridSearchCV to optimize neighbors and distance metrics; achieved ~96.5% accuracy.
- Evaluation: Accuracy, precision, recall, f1-score, and confusion matrices used to compare models.
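The modeling and evaluation steps above might look roughly like the following sketch. It makes several assumptions: scikit-learn's bundled copy of the Wisconsin breast cancer data stands in for `data.csv`, the features are standardized inside a pipeline, and the parameter grids, split ratio, and random seed are illustrative choices rather than the notebook's settings. Accuracy figures from this sketch will therefore not necessarily match the ones reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for data.csv (assumption): scikit-learn's copy of the dataset.
dataset = load_breast_cancer()
X = dataset.data
y = 1 - dataset.target  # scikit-learn uses 0 = malignant; flip to 0 = benign, 1 = malignant

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Both models are distance/margin based, so scaling happens inside the
# pipeline to keep test-fold statistics out of the cross-validation fit.
searches = {
    "SVM": GridSearchCV(
        Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
        param_grid={"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
        cv=5,
        scoring="accuracy",
    ),
    "KNN": GridSearchCV(
        Pipeline([("scale", StandardScaler()), ("clf", KNeighborsClassifier())]),
        param_grid={
            "clf__n_neighbors": [3, 5, 7, 9, 11],
            "clf__metric": ["euclidean", "manhattan"],
        },
        cv=5,
        scoring="accuracy",
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)  # uses the refitted best estimator
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(name, results[name]["best_params"])
    print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))
```

In the confusion matrix, rows are true labels and columns are predictions, so `confusion[1, 0]` counts malignant tumors predicted as benign (false negatives).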
- Most important features: `concave points_worst`, `perimeter_worst`, `concave points_mean`.
- KNN slightly outperformed SVM in minimizing false negatives, making it preferable for this medical dataset.
- Features like symmetry and texture had minimal impact on predictions.
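One plausible way to rank features like this is by their absolute correlation with the diagnosis. The sketch below is an assumption about the approach, not a reproduction of the notebook's analysis. It again uses scikit-learn's bundled copy of the dataset, whose column names differ from the CSV's (for example `worst concave points` rather than `concave points_worst`).

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns a pandas DataFrame with 30 feature columns plus "target".
frame = load_breast_cancer(as_frame=True).frame
diagnosis = 1 - frame["target"]  # flip to 0 = benign, 1 = malignant

features = frame.drop(columns=["target"])

# Absolute Pearson correlation of each feature with the diagnosis,
# sorted from strongest to weakest.
correlations = features.corrwith(diagnosis).abs().sort_values(ascending=False)

top = correlations.head(3)
bottom = correlations.tail(3)
print("Strongest:", list(top.index))
print("Weakest:", list(bottom.index))
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a rough guide rather than a measure of each feature's contribution inside the SVM or KNN models.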
Both SVM and KNN provide high-accuracy predictions, but KNN is more effective for reducing false negatives. This demonstrates the practical use of supervised techniques in medical diagnostics.
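To illustrate why false negatives are tracked separately from accuracy, here is a small sketch with made-up labels. The predictions are hypothetical and are not the project's results; `model_a` and `model_b` are placeholder names.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Share of malignant cases that were caught: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Hypothetical labels: 4 malignant (1) and 6 benign (0) cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # misses one malignant case
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # flags one benign case instead

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(y_true, pred), recall(y_true, pred))
```

Both hypothetical models reach the same accuracy, but only `model_b` catches every malignant case, which is the property that matters most when a missed diagnosis is the costlier error.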
- Clone this repository.
- Load `data.csv` into the Jupyter Notebook `Breast_Cancer_Supervised.ipynb`.
- Execute all cells to replicate preprocessing, modeling, and evaluation.
This project is for educational purposes.