The model is a binary sequence classifier trained on a vector representation of log sequences from the BGL dataset. The task is to identify abnormal log sequences of alerts among normally generated log sequences. This work is based on the model developed in [2,3]; for further detail, refer to the papers and the associated code at the reference links.
- https://arxiv.org/pdf/2202.04301.pdf
- https://ieeexplore.ieee.org/document/9671642
- https://github.com/hanxiao0607/InterpretableSAD
Architecture Type:
- LSTM binary classifier with Word2Vec embedding.
Network Architecture:
- LSTM and Word2Vec
The input is a CSV file of parsed system log messages.
Input Parameters:
output_dim = 2
emb_dim = 8
hidden_dim = 128
n_layers = 1
dropout = 0.0
batch_size = 32
n_epoch = 10
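To make these hyperparameters concrete, the following minimal sketch collects them in a dataclass and estimates the number of trainable weights they imply. It assumes a unidirectional LSTM with the parameter layout of PyTorch's `nn.LSTM` (four gates, two bias vectors per layer) and a single linear output layer on the final hidden state. This card states neither detail, and the names `LSTMConfig`, `lstm_param_count`, and `head_param_count` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSTMConfig:
    """Hyperparameters as listed in this card."""
    output_dim: int = 2
    emb_dim: int = 8
    hidden_dim: int = 128
    n_layers: int = 1
    dropout: float = 0.0
    batch_size: int = 32
    n_epoch: int = 10


def lstm_param_count(cfg: LSTMConfig) -> int:
    """Weights and biases of a unidirectional LSTM (assumed layout).

    Each layer holds input-to-hidden and hidden-to-hidden weights for
    four gates, plus two bias vectors of size 4 * hidden_dim.
    """
    total = 0
    for layer in range(cfg.n_layers):
        # The first layer reads embeddings; deeper layers read hidden states.
        in_dim = cfg.emb_dim if layer == 0 else cfg.hidden_dim
        total += 4 * cfg.hidden_dim * (in_dim + cfg.hidden_dim + 2)
    return total


def head_param_count(cfg: LSTMConfig) -> int:
    """Weights and bias of an assumed linear layer hidden_dim -> output_dim."""
    return cfg.hidden_dim * cfg.output_dim + cfg.output_dim


cfg = LSTMConfig()
print("LSTM parameters:", lstm_param_count(cfg))
print("Output layer parameters:", head_param_count(cfg))
```

Under these assumptions the classifier is small, which is consistent with its stated role as a lightweight test model.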
Input Format:
- CSV
Other Properties Related to Input:
- None
A binary classifier output is assigned to each log message sequence in the input file. The predicted output is appended as the last column of the input sequence.
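The output layout described above can be illustrated with a short sketch that appends one prediction per row as a new last column. The column names (`seq_id`, `events`, `prediction`), the sample rows, and the helper `append_predictions` are hypothetical and chosen only for illustration; the actual column layout of the parsed log CSV is not specified in this card.

```python
import csv
import io


def append_predictions(rows, predictions, column_name="prediction"):
    """Return a copy of rows (header first) with predictions as the last column."""
    header, *body = rows
    if len(body) != len(predictions):
        raise ValueError("expected exactly one prediction per data row")
    out = [header + [column_name]]
    for row, pred in zip(body, predictions):
        out.append(row + [str(pred)])
    return out


# Hypothetical parsed-log input: one row per log sequence.
sample_csv = "seq_id,events\n0,E1 E2 E3\n1,E4 E4 E9\n"
rows = list(csv.reader(io.StringIO(sample_csv)))

# Hypothetical classifier output: 0 = normal, 1 = anomalous.
result = append_predictions(rows, [0, 1])

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(result)
print(buffer.getvalue())
```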
Output Parameters:
- None
Output Format:
- CSV (log rows)
Other Properties Related to Output:
- None
Runtime(s):
- PyTorch
Supported Hardware Platform(s):
- Ampere/Turing
Supported Operating System(s):
- Linux
1.0
Link:
Properties (Quantity, Dataset Descriptions, Sensor(s)):
- The dataset used for the example is from the BlueGene/L (BGL) supercomputer system. The BGL dataset contains 4,747,963 log messages from the supercomputer system at Lawrence Livermore National Laboratory. The model is trained and evaluated on 1 million rows of logs preprocessed with the Drain parser.
Dataset License:
Link:
Properties (Quantity, Dataset Descriptions, Sensor(s)):
- Processed 39K BGL log dataset.
Dataset License:
Engine:
- PyTorch
Test Hardware:
- Other (Not Listed)
- Not Applicable
- Not Applicable
- Not Applicable
- English: 100%
- Not Applicable
- Not Applicable
- Not Applicable
- Not Applicable
- Not Applicable
- Not Applicable
Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
- Not Applicable
- Not Applicable
- The model is primarily designed for testing purposes and serves as a small pretrained model used to evaluate and validate the log sequence anomaly detection use case.
- This model is intended for developers who want to build log-sequence-based anomaly detection.
- The intended beneficiaries of this model are developers who aim to test the performance and functionality of the log sequence detector using public log datasets. It may not be suitable or provide significant value for real-world log analysis.
- This model outputs a binary prediction of whether a log sequence is anomalous.
- This model is an example of a binary sequence classifier. It requires parsed log messages as input for training and inference. The model and the Word2Vec embedding are trained as shown in the training notebook. During inference, the trained model is loaded from the model directory, and input files of parsed logs are used to produce predictions for sequences of log messages.
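As a rough illustration of how a two-way classifier output (`output_dim = 2`) can become the binary prediction described above, the sketch below applies a softmax to a pair of scores and picks the larger class. The decoding rule, the convention that index 1 means "anomalous", and the helper names are assumptions; this card does not describe how the released code post-processes the scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores):
    """Return (label, probability) for one sequence's two class scores.

    Assumed convention: index 0 = normal, index 1 = anomalous.
    """
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical raw scores for three log sequences.
batch_scores = [[2.1, -0.3], [0.2, 1.5], [-1.0, 3.0]]
for scores in batch_scores:
    label, prob = predict_label(scores)
    print(scores, "->", "anomalous" if label == 1 else "normal", round(prob, 3))
```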
Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:
- Not Applicable
- The model expects system logs with specific features that match the training dataset. This model requires parsed log messages as input for training and inference.
- The model is evaluated using F1 score and accuracy, which measure its ability to identify abnormal log sequences within a set of log sequences.
- None
- None
- No
- Not Applicable
- No
- Typically used for testing, to identify abnormal sequences among sequences of logs.
- Only tested on log sequences built from the parsed logs described above; it may not be suitable for other applications.
- No
- None
- No
- No
- No
- No
- No
- Neither
- The data used in this model is obtained from the publicly shared BlueGene/L dataset. There are no privacy concerns or PII involved in this data.
Protected classes used to create this model? (The following were used in the model's training:)
- Not applicable
- Not applicable. The dataset is hosted and maintained by an external source, Zenodo. Users can refer to the dataset's main site.
- Yes
- Not applicable
- No
- No
- Yes
- Not applicable
Is data compliant with data subject requests for data correction or removal, if such a request was made?
- Not applicable