Optimized hyperparameters for the non-contrastive SSL models can be found in best_config.yml, and the optimized hyperparameters for the two baselines are stored in best_config_baselines.yml.
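These YAML files can be inspected directly, for instance with PyYAML. This is a minimal sketch; the internal layout of the file is repository-specific, so the snippet only loads it and lists its top-level keys:

```python
# Minimal sketch: load one of the tuned-hyperparameter files and list its top-level keys.
# Assumes only that best_config.yml is valid YAML; its structure is repository-specific.
import yaml

with open("best_config.yml") as f:
    best_config = yaml.safe_load(f)

print(list(best_config.keys()))
```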
git clone https://github.com/renje4z335jh4/non_contrastive_SSL_NIDS.git
conda create --name [env_name] python=3.8
conda activate [env_name]
cd non_contrastive_SSL_NIDS
pip install -e .
You can download the Combined.zip here.
Please extract the CSV file to 'data/raw/5GNIDD/'.
UNSW_NB15_Test.csv: here
UNSW_NB15_Train.csv: here
Please extract the CSV files to 'data/raw/UNSW-NB15/'.
The process script can be used as follows:
python src/data/process.py [data_set] -d [path/to/dir/containing/the/CSV/files] -o [path/to/output/dir]
For example, for UNSW-NB15:
python src/data/process.py UNSW-NB15 -d data/raw/UNSW-NB15/ -o data/processed/
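To sanity-check the processing step, you can list and preview whatever landed in the output directory, for example (a sketch that assumes process.py writes CSV files to data/processed/; the actual file names depend on the script):

```python
# Sketch: list the files produced by process.py and preview one of them with pandas.
from pathlib import Path
import pandas as pd

processed_files = sorted(Path("data/processed/").glob("*.csv"))
print(processed_files)  # whatever process.py produced

if processed_files:
    df = pd.read_csv(processed_files[0])
    print(df.shape)
    print(df.columns.tolist()[:10])  # first few column names
```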
To reproduce the experiments of the paper, start with the hyperparameter optimization. This can be done by executing:
src/hyperopt/full_hyperopt.sh
The bash script runs three Python files that hyper-optimize every combination of model, encoder, and augmentation on the data sets UNSW-NB15 and 5G-NIDD. The models are BarlowTwins, BYOL, SimSiam, VICReg, and W-MSE; the encoders are a CNN, an MLP, and an FT-Transformer; the augmentations are Gaussian Noise, Mixup, Random Shuffle, Subsets, SwapNoise (also called CutMix), and Zero Out Noise. In total, 5 * 6 * 3 = 90 different experiments are tuned for each of the two data sets.
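For reference, the size of this grid can be reproduced with a small enumeration (the names below are only illustrative, not necessarily the identifiers used by the hyperopt scripts):

```python
# Enumerate the model / encoder / augmentation grid described above.
from itertools import product

models = ["BarlowTwins", "BYOL", "SimSiam", "VICReg", "W-MSE"]
encoders = ["CNN", "MLP", "FT-Transformer"]
augmentations = ["Gaussian Noise", "Mixup", "Random Shuffle",
                 "Subsets", "SwapNoise", "Zero Out Noise"]

grid = list(product(models, encoders, augmentations))
print(len(grid))  # 5 * 3 * 6 = 90 combinations, each tuned on both data sets
```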
Step by step, the script optimizes the following (a rough sketch of this staged procedure is given after the list):
- Model and augmentation parameters
- Learning rate
- Number of epochs to train the model
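The sketch below is a hypothetical illustration of that staged order. The `evaluate` function is a stand-in for whatever objective the actual hyperopt scripts optimize and is not part of the repository; here it returns a random score so the control flow can be run end to end.

```python
# Hypothetical illustration of tuning in stages: model/augmentation parameters first,
# then the learning rate, then the number of epochs (all names and values are made up).
import random

def evaluate(model_aug_params, lr, n_epochs):
    """Placeholder for: train the SSL model with these settings, return a validation score."""
    return random.random()

candidate_param_sets = [{"proj_dim": 128}, {"proj_dim": 256}]  # illustrative only
candidate_lrs = [1e-4, 3e-4, 1e-3]
candidate_epochs = [20, 50, 100]

# Stage 1: model and augmentation parameters (lr and epochs held at defaults)
best_params = max(candidate_param_sets, key=lambda p: evaluate(p, lr=1e-3, n_epochs=50))
# Stage 2: learning rate, with the best model/augmentation parameters fixed
best_lr = max(candidate_lrs, key=lambda lr: evaluate(best_params, lr, n_epochs=50))
# Stage 3: number of training epochs, with everything else fixed
best_epochs = max(candidate_epochs, key=lambda e: evaluate(best_params, best_lr, e))
print(best_params, best_lr, best_epochs)
```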
Similarly, the baselines Deep AutoEncoder and DeepSVDD are tuned with
src/hyperopt/full_hyperopt_baselines.sh
Note: This could take a significant amount of time and resources!
The hyperparameters collected in the tuning step are stored in a YAML file. The final results are gathered by training the different models with the best hyperparameters over 10 runs and averaging the performance metrics. This is done by executing:
python src/utils/run_models.py
The results of each model, together with the weights of the best-performing run (determined by the AUROC), are stored under run_results/.
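Conceptually, the aggregation works like the sketch below: average every metric over the runs and keep the run with the highest AUROC (the numbers are placeholders, not results from the paper):

```python
# Average the per-run metrics and pick the best run by AUROC (placeholder values).
from statistics import mean

runs = [
    {"auroc": 0.91, "f1": 0.80},
    {"auroc": 0.93, "f1": 0.82},
    {"auroc": 0.92, "f1": 0.81},
]  # the actual pipeline uses 10 runs per model

averaged = {metric: mean(run[metric] for run in runs) for metric in runs[0]}
best_run = max(range(len(runs)), key=lambda i: runs[i]["auroc"])
print(averaged, "best run index:", best_run)
```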
In a similar way, the results of the optimized baselines are generated by executing:
python src/utils/run_baselines.py
- Parameters like model, batch_size, n_epochs, and device are self-explanatory (see the example configuration after this list)
- anomaly_label indicates which label marks malicious samples (commonly 1)
- ckpt_root - if given, a checkpoint will be saved every 5 epochs. Set it to None if you do not want periodic checkpoints
- safe_best_model - if set, the trainer will save the best model (lowest loss) as a checkpoint.
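Put together, such a configuration might look like the following (a hypothetical illustration; the exact format and values used by run_models.py / run_baselines.py may differ):

```python
# Hypothetical example configuration illustrating the parameters described above.
config = {
    "model": "BYOL",               # which model to train
    "batch_size": 1024,
    "n_epochs": 50,
    "device": "cuda",
    "anomaly_label": 1,            # label treated as malicious
    "ckpt_root": "checkpoints/",   # set to None to disable the 5-epoch checkpoints
    "safe_best_model": True,       # keep the lowest-loss model as a checkpoint
}
print(config)
```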