This is the code repository for the paper 'AlertBERT: A noise-robust alert grouping framework for simultaneous attacks' (publication pending).
AlertBERT is a state-of-the-art self-supervised alert grouping method that uses masked language models to obtain alert embeddings and agglomerative clustering to group alerts under high levels of noise and simultaneous cyber attacks. For further details on the AlertBERT framework, we refer to our paper (publication pending).
To recreate the experiments reported in the paper, first run the module `alertbert.train_mlm` to train a masked language model. The only parameters to adjust in the `MaskedLangModelParams` passed to the main function are the number of attention heads and the AIT-ADS-A configuration to use. After the training of the masked language model has completed, run the module `alertbert.eval_grouping` to compute the corresponding ROC curves by providing the `model_id` to the `model_config_generator` function.
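The following is a minimal sketch of this train-and-evaluate workflow. Apart from `MaskedLangModelParams` and `model_config_generator`, which are named above, all names (entry-point functions, field names, return values, and the configuration identifier) are assumptions; please consult the module docstrings for the actual interfaces.

```python
# Hypothetical sketch of the workflow described above. Only
# MaskedLangModelParams and model_config_generator are named in this README;
# the main functions, field names, and return values below are assumptions.
from alertbert.train_mlm import MaskedLangModelParams, main as train_main
from alertbert.eval_grouping import model_config_generator, main as eval_main

# Adjust only the number of attention heads and the AIT-ADS-A configuration.
params = MaskedLangModelParams(
    num_attention_heads=4,          # assumed field name
    aitads_config="simul-attacks",  # assumed field name and configuration id
)
model_id = train_main(params)       # assumed: training returns a model_id

# Compute the ROC curves for the trained model.
eval_main(model_config_generator(model_id))
```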
To view the results, please refer to the plots of ROC curves provided in the notebooks in the `roc_results` directory. These results should then look something like the following ROC plot obtained on the simul-attacks configuration of AIT-ADS-A.
The individual modules in alertbert have the following purposes:
- `alertbert.aitads` provides the dataset,
- `alertbert.eval_grouping` implements the evaluation of alert groupings,
- `alertbert.eval_mlm` implements the evaluation of the training of masked language models,
- `alertbert.model_eval_utils` provides evaluation utilities,
- `alertbert.models` implements the alert grouping models,
- `alertbert.preprocessing` implements vocabularies and tokenisation of the data,
- `alertbert.train_mlm` performs the training of masked language models, and
- `alertbert.utils` provides general utilities.
To run the experiments, please follow these steps:
To set up the Python environment, run `pip install -r requirements.txt`.
Furthermore, it is necessary to install the graph-tool library, which is not available through pip.
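As a quick, hedged sanity check that the environment is complete, you can verify that graph-tool imports correctly (its Python package is named `graph_tool`):

```python
# Sanity check (sketch): confirm that graph-tool is installed and importable.
# The Python package of the graph-tool library is named "graph_tool".
import graph_tool

print("graph-tool version:", graph_tool.__version__)
```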
For our experiments we use the AIT-ADS-A dataset, which is an augmented version of the AIT Alert Dataset.
The files to construct this dataset are part of this repository and should be present in the `aitads_augmented/data` directory.
If this is the case, the setup is complete and the remaining steps are not necessary!
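Optionally, here is a quick check that the dataset files are in place (a minimal sketch; the exact file names inside `aitads_augmented/data` are not specified here, so it only lists the directory contents):

```python
# Hedged check: list the contents of aitads_augmented/data so you can confirm
# the AIT-ADS-A files are present before running the experiments.
from pathlib import Path

data_dir = Path("aitads_augmented/data")
files = sorted(p.name for p in data_dir.iterdir()) if data_dir.is_dir() else []
print(f"{len(files)} files found in {data_dir}: {files}")
```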
To build the dataset from source, follow these steps:
- Download and unzip the three datasets into their respective directories listed below. After this step, the `alerts_json` directory should contain the files `scenario_aminer.json` and `scenario_wazuh.json` for each of the eight scenarios of AIT-ADS.
  - AIT Alert Dataset ➡️ `alerts_json`
  - AIT Log Dataset V2.0 ➡️ `aitldsv2`
  - AIT Netflow Dataset ➡️ `aitnds`
- Run `preprocess.py`. This script will read the information of the three datasets, use it to assign the labels to the alerts in AIT-ADS, and save the labels to the files `alerts_csv/scenario_alerts.csv` for each scenario.

  At this point, for each scenario we have the following situation: the files `alerts_json/scenario_wazuh.json` and `alerts_json/scenario_aminer.json` contain the alert data sorted by timestamp, but separately for the two IDSs, while the files `alerts_csv/scenario_alerts.csv` contain the labels for the alerts, ordered as if they belonged to a concatenation of `alerts_json/scenario_wazuh.json` and `alerts_json/scenario_aminer.json`. Thus, the next step:
- Run `unite_alerts_labels.py`. This script will simplify the situation described above by combining all the alerts and their labels, sorted by timestamp, into the files `alerts_json/scenario.json` for each scenario. Additionally, the script will create the files `alerts_json/scenario_light.json`, which have the same contents except for the raw alert data and can be used to speed up loading the data if the raw alerts are not required (see the loading sketch after this list).
- Finally, to create the files necessary for building AIT-ADS-A, run `build_augment_files.py`. For further information regarding the setup of AIT-ADS-A, please refer to the README file in the `aitads_augmented` directory.
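To illustrate the result of these steps, the following is a minimal loading sketch. It assumes that the united files are plain JSON arrays and that `scenario` stands for one of the eight AIT-ADS scenario names (here `fox` is used as an assumed example); adjust the path and name to your local setup.

```python
# Hedged sketch: load the united, timestamp-sorted alerts of one scenario.
# Assumptions: the files are standard JSON arrays and "fox" is a valid
# scenario name; both may differ in the actual dataset.
import json
from pathlib import Path

scenario = "fox"  # assumed scenario name
light_file = Path("alerts_json") / f"{scenario}_light.json"

with light_file.open() as f:
    alerts = json.load(f)

print(f"Loaded {len(alerts)} alerts (without raw data) for scenario '{scenario}'")
```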
- Modules `alertbert/module.py` have to be run via `python -m alertbert.module`.
- The documentation of the code can be found in the respective docstrings of functions, classes, and modules.
