Intrusion detection using the 1999 KDD Cup dataset. `kddcup.data.gz` is used for training and validation, and `corrected.gz` is used as the test set. The test set includes attack types that are not present in the training data, which more closely reflects the challenges of intrusion detection in practice.
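
One way to see which attack types appear only in the test set is to compare the label sets of the two files. The following is a minimal sketch, assuming the label is the last comma-separated field of each record. The function name `label_set` and the sample records are hypothetical stand-ins, not rows or code from this repository.

```python
def label_set(lines):
    """Collect distinct labels, taking the last comma-separated field of each record."""
    return {line.strip().rsplit(",", 1)[-1] for line in lines if line.strip()}


# Hypothetical, heavily shortened records (real records have many more fields).
train_lines = [
    "0,tcp,http,normal.",
    "0,icmp,ecr_i,smurf.",
    "0,tcp,private,neptune.",
]
test_lines = [
    "0,tcp,http,normal.",
    "0,udp,private,snmpgetattack.",
    "0,icmp,ecr_i,smurf.",
]

# Labels present in the test data but never seen during training.
unseen_labels = label_set(test_lines) - label_set(train_lines)
print(sorted(unseen_labels))
```

With the real files, the lines could be read through `gzip.open(path, "rt")` instead of the in-memory lists used here.
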
- `get_data.sh` downloads the data (data not included in repository).
- `explore_data.ipynb` contains some exploratory work, most importantly finding duplicated rows, a column whose value is constant and therefore not useful for this task, and class imbalance (as expected with intrusion detection).
- `base.py` contains the `ModelHelper` class, which handles routines such as preparing training and test data, training, hyperparameter tuning, model loading and saving, and model evaluation.
- `kdd_env.yml` contains the dependencies for this project.
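
The three exploratory checks mentioned above (duplicated rows, constant columns, class imbalance) can be sketched as follows. This is an illustration only, not the notebook's code: it uses just the standard library, the helper name `summarize` is hypothetical, and the toy rows are invented for the example. It assumes a non-empty list of equal-length rows with the label in the last position.

```python
from collections import Counter


def summarize(rows, label_index=-1):
    """Return (deduplicated rows, indices of constant columns, label counts)."""
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    unique_rows = list(dict.fromkeys(rows))
    n_cols = len(unique_rows[0])
    # A column is constant if it takes exactly one distinct value.
    constant_cols = [
        i for i in range(n_cols) if len({row[i] for row in unique_rows}) == 1
    ]
    label_counts = Counter(row[label_index] for row in unique_rows)
    return unique_rows, constant_cols, label_counts


# Hypothetical toy rows: (duration, protocol, constant_feature, label)
rows = [
    (0, "tcp", 0, "normal."),
    (0, "tcp", 0, "normal."),  # exact duplicate of the first row
    (2, "udp", 0, "normal."),
    (5, "icmp", 0, "smurf."),
    (1, "tcp", 0, "normal."),
]

unique_rows, constant_cols, label_counts = summarize(rows)
print(len(rows) - len(unique_rows))  # number of duplicate rows removed
print(constant_cols)                 # indices of columns with a single value
print(label_counts.most_common())    # class distribution after deduplication
```

Duplicates are removed before counting so that repeated records do not inflate the class counts. Whether the notebook applies the checks in this order is not stated.
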