forked from sergts/botnet-traffic-analysis
Status: Closed
Labels: archive (Related to archiving old research code), technical-debt (Technical debt and code quality)
Description
Problem
The scaler is being fit on both training AND validation data, which is a form of data leakage.
Affected files:
- anomaly-detection/train_og.py:29
- anomaly-detection/test.py:40
Current code:
scaler.fit(x_train.append(x_opt))
Issue: The scaler learns mean/std statistics from the validation data (x_opt), which it should not have access to during training. This inflates the accuracy metrics.
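To make the leakage concrete, here is a minimal sketch using a hypothetical one-feature scaler (`MeanStdScaler`, a stand-in for whatever scaler train_og.py actually uses) and made-up data in which the validation values are shifted away from the training values. Fitting on the combined data pulls the learned mean and spread toward x_opt, so after scaling the shift looks much smaller than it does under a train-only fit.

```python
from statistics import fmean, pstdev


class MeanStdScaler:
    """Hypothetical minimal scaler with a fit/transform split.

    Uses the population standard deviation, as scikit-learn's
    StandardScaler does.
    """

    def fit(self, values):
        self.mean_ = fmean(values)
        # Fall back to 1.0 for a constant feature to avoid dividing by zero.
        self.scale_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]


# Toy single-feature data (hypothetical): validation is shifted upward.
x_train = [1.0, 2.0, 3.0, 4.0]
x_opt = [10.0, 12.0]

# Leaky: statistics include validation rows, mirroring scaler.fit(x_train.append(x_opt)).
leaky = MeanStdScaler().fit(x_train + x_opt)
# Correct: statistics come from training rows only.
clean = MeanStdScaler().fit(x_train)

print("leaky mean:", leaky.mean_, "scaled x_opt:", leaky.transform(x_opt))
print("clean mean:", clean.mean_, "scaled x_opt:", clean.transform(x_opt))
```

With these toy numbers the train-only mean is 2.5 and the validation points scale to roughly 6.7 and 8.5, while the leaky fit has a mean of about 5.33 and scales the same points to roughly 1.1 and 1.6, so they no longer stand out.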
Correct approach:
scaler.fit(x_train)  # Only fit on training data
Impact
This is likely the cause of the suspected overtraining. The reported 99.98% accuracy may be artificially inflated.
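The one-line fix under "Correct approach" covers the fit step; the fitted statistics then also have to be reused, not refit, on every other split. Below is a minimal sketch in plain Python, assuming numeric feature rows. `fit_scaler`, `transform`, and the toy splits (including `x_test`) are hypothetical stand-ins, not code from train_og.py or test.py. The sketch mirrors separate `fit` and `transform` calls on scikit-learn's `StandardScaler` (population standard deviation, constant features divided by 1).

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Learn per-feature mean and std from training rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard constant features: a zero std becomes 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply previously fitted statistics; never refits."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical splits with two features each.
x_train = [[1.0, 100.0], [2.0, 110.0], [3.0, 90.0]]
x_opt = [[2.5, 105.0]]
x_test = [[8.0, 300.0]]

means, stds = fit_scaler(x_train)  # statistics come from x_train alone
x_train_s = transform(x_train, means, stds)
x_opt_s = transform(x_opt, means, stds)
x_test_s = transform(x_test, means, stds)
```

Only the training split is zero-mean after scaling; the validation and test splits keep whatever offset they have relative to the training distribution, which is the information a train-only fit is meant to preserve.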
Priority
CRITICAL - This affects the validity of published results.
References
- Archive branch: Lines identified in code review
- See: RETROSPECTIVE.md for context