Gamma-Hadron Separation with Machine Learning
This repository provides a machine learning framework to classify gamma-ray (signal) vs. hadronic (background) events from the MAGIC dataset, emphasizing low false-positive rate (FPR) performance. It supports multiple models, feature-processing variants, and automatic evaluation with confusion matrices and ROC-based metrics.
Gamma–hadron separation is a key task in ground-based Cherenkov telescope analysis. Simple accuracy is insufficient because classifying a background event as signal is far worse than misclassifying signal as background. Models here are therefore compared using ROC-based metrics, particularly TPR at low FPRs (e.g., 1–10%) and partial AUC (pAUC).
The full pipeline consists of standardized preprocessing, PCA-based feature compression, upsampling for class balance, and low-FPR model evaluation.
- All original features (`fLength` → `fDist`) are standardized using `StandardScaler`.
- Models are trained directly on these standardized features.
- Evaluation focuses on:
  - Partial AUC (pAUC@≤0.10) as the CV selection metric
  - TPR at FPR = 0.01, 0.02, 0.05, 0.10, 0.20
  - Full AUC, Confusion Matrix, and ROC plots
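The low-FPR metrics listed above can be computed from classifier scores with scikit-learn. The following is a minimal sketch, not the repository's own code: the function name `low_fpr_metrics`, the label convention (1 = gamma, 0 = hadron), and the synthetic scores are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def low_fpr_metrics(y_true, y_score,
                    fpr_targets=(0.01, 0.02, 0.05, 0.10, 0.20), max_fpr=0.10):
    """Summarize binary classifier scores with low-FPR metrics (assumes label 1 = gamma)."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    tpr_at_fpr = {}
    for target in fpr_targets:
        # Last ROC point whose FPR does not exceed the target. TPR is
        # non-decreasing along the curve, so this is the best TPR
        # achievable within the FPR budget.
        idx = np.searchsorted(fpr, target, side="right") - 1
        tpr_at_fpr[target] = float(tpr[idx])
    return {
        "auc": float(roc_auc_score(y_true, y_score)),
        # With max_fpr set, scikit-learn returns the standardized (McClish)
        # partial AUC, rescaled so 0.5 is chance level and 1.0 is perfect.
        "pauc": float(roc_auc_score(y_true, y_score, max_fpr=max_fpr)),
        "tpr_at_fpr": tpr_at_fpr,
    }


# Synthetic scores for illustration only (not MAGIC data).
rng = np.random.default_rng(0)
y_true = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]
y_score = np.r_[rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)]
metrics = low_fpr_metrics(y_true, y_score)
print(metrics)
```

Note that `roc_auc_score(..., max_fpr=0.10)` reports a rescaled value rather than the raw area under the curve up to FPR = 0.10, so its numbers are not directly comparable with an unnormalized partial AUC.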
The PCA-based variant builds its feature matrix as follows:
- Compute Mutual Information (MI) between each feature and the target.
- Retain the top MI feature (`fAlpha`) explicitly.
- Apply `StandardScaler` to the remaining features, then fit PCA to keep components explaining ≈95% of variance.
- Concatenate `[fAlpha (scaled)] + [PCA components]` to form the final training matrix.
- Train the same set of models with identical evaluation metrics.
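The MI + PCA steps above can be sketched as below. This is an illustrative sketch under stated assumptions, not the repository's implementation: the helper name `build_mi_pca_features` is hypothetical, inputs are assumed to be NumPy arrays, the top feature is picked by `argmax` over the MI scores rather than hard-coded, and every transformer is fitted on the training split only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler


def build_mi_pca_features(X_train, y_train, X_test, variance=0.95, random_state=0):
    """Return ([top feature (scaled)] + [PCA components]) for train and test."""
    # Rank features by mutual information with the target (training data only).
    mi = mutual_info_classif(X_train, y_train, random_state=random_state)
    top = int(np.argmax(mi))
    rest = [j for j in range(X_train.shape[1]) if j != top]

    # Scale the retained top-MI feature on its own.
    top_scaler = StandardScaler().fit(X_train[:, [top]])
    # Scale the remaining features, then keep enough principal components
    # to explain roughly `variance` of their total variance.
    rest_scaler = StandardScaler().fit(X_train[:, rest])
    pca = PCA(n_components=variance, svd_solver="full")
    pca.fit(rest_scaler.transform(X_train[:, rest]))

    def transform(X):
        return np.hstack([
            top_scaler.transform(X[:, [top]]),
            pca.transform(rest_scaler.transform(X[:, rest])),
        ])

    return transform(X_train), transform(X_test), top


# Synthetic stand-in data (not MAGIC): 10 columns, with column 8 driving the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 8] + 0.5 * rng.normal(size=400) > 0).astype(int)
F_train, F_test, top = build_mi_pca_features(X[:300], y[:300], X[300:])
print(top, F_train.shape, F_test.shape)
```

Fitting the scalers and PCA on the training split and only applying them to the test split keeps test-set statistics out of the feature construction.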
- Upsample the minority class in the training data using `sklearn.utils.resample`.
- Perform 5-fold Stratified Cross-Validation with RandomizedSearchCV.
- Optimize models for pAUC@≤0.10.
- Compute test-set metrics:
  - TPR@FPR thresholds (0.01–0.20)
  - Partial AUCs and Full AUC
  - Confusion Matrix and ROC plots (saved and/or displayed)
- Generate a summary table ranking all models by CV and test performance.
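The training steps above (upsampling, 5-fold stratified CV, and random search scored by pAUC@≤0.10) might look roughly like the sketch below. Several pieces are assumptions rather than details from this repository: the `RandomForestClassifier` model, the small hyperparameter lists, `n_iter=3`, the helper names, and the 0/1 label encoding with 1 as the gamma class.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.utils import resample


def upsample_minority(X, y, random_state=0):
    """Resample the minority class with replacement up to the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    minority, n_major = classes[np.argmin(counts)], int(counts.max())
    X_up = resample(X[y == minority], replace=True,
                    n_samples=n_major, random_state=random_state)
    X_bal = np.vstack([X[y != minority], X_up])
    y_bal = np.concatenate([y[y != minority], np.full(n_major, minority)])
    return X_bal, y_bal


def pauc_scorer(estimator, X, y):
    """Standardized partial AUC up to FPR = 0.10 (assumes labels 0/1)."""
    return roc_auc_score(y, estimator.predict_proba(X)[:, 1], max_fpr=0.10)


def tune_for_pauc(X_train, y_train, n_iter=3, random_state=0):
    X_bal, y_bal = upsample_minority(X_train, y_train, random_state)
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=random_state),  # hypothetical model choice
        param_distributions={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
        n_iter=n_iter,
        scoring=pauc_scorer,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state),
        random_state=random_state,
    )
    return search.fit(X_bal, y_bal)


# Synthetic, imbalanced stand-in data (not MAGIC): 200 background, 80 signal.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 5)), rng.normal(1.0, 1.0, (80, 5))])
y = np.r_[np.zeros(200, dtype=int), np.ones(80, dtype=int)]
search = tune_for_pauc(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because this sketch upsamples before splitting into folds, copies of the same minority row can land in both the training and validation parts of a fold, which tends to make CV scores optimistic. Resampling inside each fold is a stricter alternative; the held-out test set should stay untouched in either case.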

