IDS-LODA clx migration #38

Merged
merged 12 commits into from
May 9, 2023
4 changes: 4 additions & 0 deletions README.md
@@ -79,6 +79,10 @@ This model is a sequence binary classifier trained with vector representation of
## [Industrial Control System (ICS) Cyber Attack Detection](/operational-technology)
This model is an XGBoost classifier that predicts each event on a power system based on dataset features.

## [Intrusion Detection System using LODA algorithm](/ids-detection)
This model is a Loda anomaly detector that identifies bot-driven intrusion attacks in a network using a netflow dataset.


# Repo Structure
Each prototype has its own directory that contains everything belonging to the specific prototype. Directories can include the following subfolders and documentation:

74 changes: 74 additions & 0 deletions ids-detection/README.md
@@ -0,0 +1,74 @@
## Intrusion Detection System using LODA algorithm

## Use Case
Intrusion detection using Lightweight Online Detector of Anomalies (LODA)

### Version
1.0

### Model Overview
The model is a Loda anomaly detector for the intrusion detection use case. Loda is trained to identify attacks in the form of bots from netflow data. We use the `cic_ids2017` benchmark dataset to test the performance of the model.

### Model Architecture

Loda (Lightweight Online Detector of Anomalies) is an ensemble of fixed one-dimensional histograms, where each histogram is built on a random projection of the features. The model is an unsupervised anomaly detector that scores each data point by its negative log-likelihood.
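The idea can be illustrated with a minimal, standard-library sketch. This is not the implementation used in this repository: the class name `LodaSketch`, the equal-width binning, the add-one smoothing, and the toy data are all assumptions made for illustration. It follows the general recipe of the Loda paper, with sparse random projections, one histogram per projection, and an averaged negative log-likelihood as the score.

```python
import math
import random


class LodaSketch:
    """Toy Loda: an ensemble of 1-D histograms over sparse random projections."""

    def __init__(self, n_random_cuts=50, n_bins=10, seed=0):
        self.n_random_cuts = n_random_cuts
        self.n_bins = n_bins
        self.rng = random.Random(seed)
        self.cuts = []  # one (weights, lo, hi, width, density, floor) per projection

    @staticmethod
    def _project(row, weights):
        return sum(w * x for w, x in zip(weights, row))

    def fit(self, rows):
        n_features = len(rows[0])
        # Sparse projections: about sqrt(d) non-zero Gaussian weights each.
        n_nonzero = max(1, round(math.sqrt(n_features)))
        self.cuts = []
        for _ in range(self.n_random_cuts):
            weights = [0.0] * n_features
            for j in self.rng.sample(range(n_features), n_nonzero):
                weights[j] = self.rng.gauss(0.0, 1.0)
            z = [self._project(r, weights) for r in rows]
            lo, hi = min(z), max(z)
            width = (hi - lo) / self.n_bins or 1.0
            counts = [0] * self.n_bins
            for v in z:
                counts[min(int((v - lo) / width), self.n_bins - 1)] += 1
            # Add-one smoothing so empty bins never produce log(0).
            total = len(z) + self.n_bins
            density = [(c + 1) / (total * width) for c in counts]
            floor = 1 / (total * width)  # density assigned outside [lo, hi]
            self.cuts.append((weights, lo, hi, width, density, floor))
        return self

    def score(self, row):
        """Average negative log-likelihood; larger means more anomalous."""
        nll = 0.0
        for weights, lo, hi, width, density, floor in self.cuts:
            z = self._project(row, weights)
            if z < lo or z > hi:
                p = floor
            else:
                p = density[min(int((z - lo) / width), self.n_bins - 1)]
            nll -= math.log(p)
        return nll / len(self.cuts)


# Toy data: 500 points near the origin in 4 dimensions (not CICIDS2017).
data_rng = random.Random(1)
train = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
model = LodaSketch().fit(train)
inlier_score = model.score([0.0, 0.0, 0.0, 0.0])
outlier_score = model.score([10.0, 10.0, 10.0, 10.0])
```

In this toy setup the point far from the training cloud falls outside most histograms and receives the higher score, which is the behavior the detector relies on.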

### Requirements

Requirements can be installed with:
```
pip install -r requirements.txt
```

### Training

#### Training data
The example uses a dataset from the Canadian Institute for Cybersecurity (CIC). The CICIDS2017 dataset (https://www.unb.ca/cic/datasets/ids-2017.html) contains benign traffic and the most up-to-date common attacks, and resembles true real-world data (PCAPs). It also includes the results of network traffic analysis using CICFlowMeter, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack (CSV files). Definitions of the extracted features are also available.


#### Training parameters

Two main parameters are used: the number of random cuts for Loda and the variance retained by the PCA transformation.
```
number_random_cuts = 1000
variance = 0.99
```
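As a rough illustration of what a variance setting of 0.99 usually means for PCA, the sketch below picks the smallest number of components whose cumulative explained variance reaches the target. The function name `n_components_for_variance` and the eigenvalues are hypothetical; the training script may select components differently.

```python
from itertools import accumulate


def n_components_for_variance(explained_variance, target=0.99):
    """Smallest k such that the top-k components explain at least `target` of the variance."""
    ordered = sorted(explained_variance, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical per-component variances, for illustration only.
eigenvalues = [7.0, 2.0, 0.95, 0.03, 0.02]
k = n_components_for_variance(eigenvalues, target=0.99)
```

The number of random cuts plays a different role: it presumably sets how many random projections, and therefore histograms, the Loda ensemble contains.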
#### GPU Model
Tesla V100-SXM2

#### Model accuracy
The label distribution in the dataset is imbalanced. On the test activity data, the model achieves an average precision of 1.0 and an area under the ROC curve of 0.74.
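For readers unfamiliar with the two metrics, here is a small standard-library sketch of how each can be computed from anomaly scores and binary labels. The helper names, labels, and scores are made up for illustration, and the sketch does not reproduce the evaluation reported above. Note that average precision depends on which class is treated as positive and on how rare it is, while ROC AUC does not depend on class prevalence.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Hypothetical labels (1 = attack) and anomaly scores.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```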


#### Training script
To train the model, run the code in the notebook or, alternatively, run the script under the `training-tunining-inference` directory, where `$DATASET` is the path to the extracted CIC dataset:
```bash
python training.py --input-name $DATASET/Monday-WorkingHours.pcap_ISCX.csv --model-name ../model/loda_ids
```

This saves the trained model and the config file under the `model` directory.

### Inference
To run inference, pass the trained Loda model and its config file to the inference script as follows:
```bash
python inference.py --input-name $DATASET/Friday-WorkingHours-Morning.pcap_ISCX.csv --config-path ../model/config.json --model-name ../model/loda_ids.npz
```
### How To Use This Model
This model is an example of an intrusion detection model built on an unsupervised anomaly detector. It requires aggregated netflow activity in the `cic_ids2017` format. The subset of features used for training is listed in `model/config.json`.
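A sketch of how the config could be used to pick the training columns from input records is shown below. The `select_features` helper and the sample records are hypothetical, and the embedded JSON is a shortened copy of the real config; the actual inference script may prepare its input differently.

```python
import json

# Shortened, illustrative copy of model/config.json (the real file lists many more columns).
config = json.loads(
    '{"apply_pca": true, "training_columns": ["Flow_Duration", "Total_Fwd_Packets"], '
    '"n_pca_components": 6, "pca_variance": 0.99}'
)


def select_features(records, training_columns):
    """Return the training columns of each record as floats, in config order."""
    missing = [c for c in training_columns if c not in records[0]]
    if missing:
        raise KeyError(f"input is missing training columns: {missing}")
    return [[float(r[c]) for c in training_columns] for r in records]


# Hypothetical flow records, e.g. rows read with csv.DictReader.
records = [
    {"Flow_Duration": "1200", "Total_Fwd_Packets": "3", "Label": "BENIGN"},
    {"Flow_Duration": "88", "Total_Fwd_Packets": "41", "Label": "Bot"},
]
features = select_features(records, config["training_columns"])
```

Failing early on missing columns avoids silently scoring flows against the wrong features.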

### Input
The input is netflow activity data in tabular format.

### Output
The unsupervised anomaly detector produces a negative log-likelihood as the anomaly score for each data point. A larger score indicates a more anomalous data point.
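Because the scores are relative rather than calibrated probabilities, one simple way to consume them is to rank flows and review the highest-scoring ones first. The sketch below assumes a hypothetical list of scores and a helper named `top_anomalies`; this README does not prescribe a threshold.

```python
def top_anomalies(scores, k):
    """Indices of the k largest anomaly scores, most anomalous first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical negative log-likelihood scores for five flows.
scores = [2.1, 2.3, 9.7, 2.0, 6.4]
suspicious = top_anomalies(scores, k=2)
```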

#### Out-of-scope use cases
N/A

### Ethical considerations
N/A

### Reference
1. Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018, January). Toward generating a new intrusion detection dataset and intrusion traffic characterization.
2. Pevny, T. (2016). Loda: Lightweight on-line detector of anomalies. Machine Learning.
1 change: 1 addition & 0 deletions ids-detection/model/config.json
@@ -0,0 +1 @@
{"apply_pca": true, "training_columns": ["Flow_Duration", "Total_Fwd_Packets", "Total_Length_of_Fwd_Packets", "Fwd_Packet_Length_Max", "Fwd_Packet_Length_Min", "Fwd_Packet_Length_Mean", "Fwd_Packet_Length_Std", "Bwd_Packet_Length_Max", "Bwd_Packet_Length_Min", "Bwd_Packet_Length_Mean", "Bwd_Packet_Length_Std", "Flow_Bytes/s", "Flow_Packets/s", "Flow_IAT_Mean", "Flow_IAT_Std", "Flow_IAT_Max", "Flow_IAT_Min", "Fwd_IAT_Mean", "Fwd_IAT_Std", "Bwd_IAT_Mean", "Bwd_IAT_Std", "Bwd_IAT_Max", "Fwd_PSH_Flags", "Fwd_Header_Length", "Bwd_Header_Length", "Bwd_Packets/s", "Min_Packet_Length", "Max_Packet_Length", "Packet_Length_Mean", "Packet_Length_Variance", "FIN_Flag_Count", "RST_Flag_Count", "PSH_Flag_Count", "ACK_Flag_Count", "URG_Flag_Count", "Down/Up_Ratio", "Init_Win_bytes_forward", "Init_Win_bytes_backward", "min_seg_size_forward", "Active_Mean", "Active_Std", "Active_Max", "Active_Min", "Idle_Mean", "Idle_Std", "Destination_IP_Source_IP", "Source_Port_Source_IP", "Destination_Port_Source_IP", "Source_IP_Destination_IP", "Source_Port_Destination_IP", "Destination_Port_Destination_IP", "Source_IP_Source_Port", "Destination_IP_Source_Port", "Destination_Port_Source_Port", "Source_IP_Destination_Port", "Destination_IP_Destination_Port", "Source_Port_Destination_Port"], "n_pca_components": 6, "pca_variance": 0.99}
6 changes: 6 additions & 0 deletions ids-detection/requirements.txt
@@ -0,0 +1,6 @@
cuml==0.6.1.post1
cupy==11.6.0
cupy_cuda11x==11.5.0
matplotlib==3.6.2
numpy==1.19.2
scikit_learn==1.2.2