This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

All parameters in configurations

Environment variables

The environment variables are mainly used to store sensitive information like credentials or other TLS parameters. All these environment variables are optional.

Variable Values Default Notes
es_username String "" Username to connect to Elasticsearch
es_password String "" Password to connect to Elasticsearch
verify_certs Boolean True Whether the Elasticsearch certificate must be validated or not
ca_certs String None A path to a valid CA to validate the Elasticsearch server certificate
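As an illustration of the table above, here is a minimal Python sketch of how these optional variables and their defaults could be read. The function name `read_es_environment` and the set of strings treated as "false" are assumptions made for this example, not part of ee-outliers.

```python
import os


def read_es_environment(environ=None):
    """Collect the optional Elasticsearch settings, using the documented defaults."""
    env = os.environ if environ is None else environ
    # Assumption: "0", "false" and "no" (case-insensitive) disable verification.
    raw_verify = env.get("verify_certs", "True")
    return {
        "es_username": env.get("es_username", ""),
        "es_password": env.get("es_password", ""),
        "verify_certs": raw_verify.strip().lower() not in ("0", "false", "no"),
        "ca_certs": env.get("ca_certs"),  # None when the variable is unset
    }


settings = read_es_environment({})  # no variable set: every default applies
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.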

General configuration

ee-outliers makes use of a single configuration file containing all required parameters such as connectivity with your Elasticsearch cluster, logging, etc.

A default configuration file with all required configuration sections and parameters, along with an explanation, can be found in defaults/outliers.conf.

General

General
Key parameters (*Mandatory) Values Notes
es_url* String URL to connect to Elasticsearch. The https scheme is supported for TLS.
es_index_pattern* String The name of the Elasticsearch index. Can be a glob pattern such as my_indexes*.
es_scan_size* Int Size of the batch used by Elasticsearch for each search request.
es_scroll_time* [Integer][letter] where letter represents a duration (Hours, Minutes, Seconds) Specify how long a consistent view of the index should be maintained for scrolled Elasticsearch search.
es_timeout* Int Explicit timeout in seconds for each Elasticsearch request.
timestamp_field String The field name representing the event timestamp in Elasticsearch. Default value: timestamp.
es_save_results* 0, 1 If set to 1, save outlier detection results to Elasticsearch. If set to 0, do nothing.
print_outliers_to_console 0, 1 If set to 1, print outlier matches to the console. If set to 0, do nothing. Default value: 0.
history_window_days* Int Specify how many days back in time to process events and search for outliers. This value is combined with history_window_hours.
history_window_hours* Int Specify how many hours back in time to process events and search for outliers. This value is combined with history_window_days.
es_wipe_all_existing_outliers* 0, 1 If set to 1, wipe all existing outliers that fall in the history window upon first run. If set to 0, do nothing.
es_wipe_all_whitelisted_outliers* 0, 1 If set to 1, existing outliers are checked and wiped if they match with the whitelisting. If set to 0, do nothing.
run_models* 0, 1 If set to 1, run all use cases with key parameter run_model set to 1. If set to 0, do nothing.
test_models* 0, 1 If set to 1, run all use cases with key parameter test_model set to 1. If set to 0, do nothing.
log_verbosity* 0-5+ 0 for no progress info, 1-4 for progressively more outputs, 5+ for all the log output.
log_level* CRITICAL, ERROR, WARNING, INFO, DEBUG Sets the threshold for the logger. Logging messages which are less severe than level will be ignored.
log_file* String File path where the log messages will be saved
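To make the combination of history_window_days and history_window_hours concrete, the following Python sketch parses a small configuration fragment with the standard `configparser` module and adds both values into one time span. The section name `[general]` and all values in `EXAMPLE` are assumptions chosen for illustration; refer to defaults/outliers.conf for the real layout.

```python
import configparser
from datetime import timedelta

# Hypothetical fragment: section name and values are illustrative only.
EXAMPLE = """
[general]
es_url = https://localhost:9200
es_index_pattern = my_indexes*
es_scan_size = 10000
es_scroll_time = 30m
es_timeout = 300
es_save_results = 1
history_window_days = 7
history_window_hours = 12
"""


def history_window(config_text, section="general"):
    """Return the look-back period as days plus hours."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    general = parser[section]
    return timedelta(
        days=general.getint("history_window_days"),
        hours=general.getint("history_window_hours"),
    )


print(history_window(EXAMPLE))  # 7 days, 12:00:00
```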

Assets

This section lets you extract additional information from outliers and save it in the dictionary field outliers.assets.

General
Key parameters (*Mandatory) Values Notes
Any existing field name String Example: timestamp=time will extract the value of the field timestamp and add it to the dictionary field outliers.assets under the key time.
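The mapping described above can be pictured with a short Python sketch. The helper `extract_assets` and the sample event are hypothetical, and the exact structure ee-outliers uses inside outliers.assets may differ from this simplified dictionary.

```python
def extract_assets(event, asset_mapping):
    """Copy each configured field that exists in the event into an assets dict."""
    assets = {}
    for field_name, asset_key in asset_mapping.items():
        if field_name in event:
            assets[asset_key] = event[field_name]
    return assets


# Hypothetical event; the mapping mirrors the example "timestamp=time".
event = {"timestamp": "2019-03-14T09:26:53Z", "user": "alice"}
outlier_assets = extract_assets(event, {"timestamp": "time"})
```

Here `outlier_assets` holds the event's timestamp under the key `time`, and fields that are not listed in the mapping are left out.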

Notifier

To have more information about the notification system, visit the page Notifications.

Notifier
Key parameters (*Mandatory) Values Notes
email_notifier* 0, 1 If set to 1, enable the notification system and the other key parameters from the section [notifier], except max_cache_ignore, become mandatory. If set to 0, do nothing.
notification_email String Email where the information needs to be sent.
smtp_user String SMTP username.
smtp_pass String SMTP password
smtp_server String SMTP server address.
smtp_port int SMTP port.
max_cache_ignore int Number of notifications kept in memory to avoid sending the same alert twice. Default value: 1000.
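One way to picture the role of max_cache_ignore is a bounded cache of already-sent notifications. The class below is a sketch under that assumption; the name `NotificationCache`, the eviction order (oldest first) and the use of a string key per notification are all choices made for this example rather than a description of the actual implementation.

```python
from collections import OrderedDict


class NotificationCache:
    """Remember the most recent notifications so duplicates are not re-sent."""

    def __init__(self, max_cache_ignore=1000):
        self.max_size = max_cache_ignore
        self._seen = OrderedDict()

    def should_send(self, notification_key):
        """Return True the first time a key is seen while it is still cached."""
        if notification_key in self._seen:
            return False
        self._seen[notification_key] = True
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


cache = NotificationCache(max_cache_ignore=2)
```

A larger cache suppresses duplicates for longer at the cost of more memory.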

Daemon

Used when ee-outliers is running in daemon mode. In daemon mode, ee-outliers runs continuously based on a cron schedule defined by the schedule parameter below.

General
Key parameters Values Notes
schedule Standard cron format Only used when running ee-outliers in daemon mode. Example: schedule=10 0 * * * will run ee-outliers at 00:10 each night.
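To show how the five cron fields map to a point in time, here is a deliberately small Python sketch. It only understands `*` and plain integers (no ranges, lists or steps), and the function `cron_matches` is a hypothetical helper for this example; it is not how ee-outliers evaluates its schedule.

```python
from datetime import datetime


def cron_matches(schedule, moment):
    """Check a 'minute hour day month weekday' schedule against a datetime."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0
    )
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


runs_now = cron_matches("10 0 * * *", datetime(2023, 6, 5, 0, 10))
```

With the schedule from the table, `runs_now` is true only at 00:10, whatever the date.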

Simple query

Global parameters for all use cases of type simplequery.

The only global parameter for simplequery use cases is highlight_match. If set to 1, ee-outliers will use the Elasticsearch highlight mechanism to find the fields and values that matched the search query. The matched fields and values are respectively added to new dictionary fields outliers.matched_fields and outliers.matched_values.

Example: If the search query is es_query_filter=CurrentDirectory : sysmon AND Image: System32 AND Image: cmd.exe and the log event contains the fields:

CurrentDirectory: C:\sysmon\
Image: C:\Windows\System32\cmd.exe

It will add the fields:

outliers.matched_fields: {"CurrentDirectory": ["C:\\<value>sysmon</value>\\"],
                         "Image": ["C:\\Windows\\<value>System32</value>\\<value>cmd.exe</value>"]}
outliers.matched_values: {'CurrentDirectory': ['sysmon'], 'Image': ['System32', 'cmd.exe']}

Note that in the field outliers.matched_fields, the values that match the search query have been tagged as follows: <value>MATCHED_VALUE</value>.
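The relationship between the two fields can be reproduced with a few lines of Python: the matched values are simply the text found between the `<value>` tags of each highlighted fragment. The helper `matched_values` is written for this example only and is not taken from the ee-outliers code base.

```python
import re

VALUE_TAG = re.compile(r"<value>(.*?)</value>")


def matched_values(matched_fields):
    """Collect the tagged substrings of every highlighted fragment, per field."""
    return {
        field: [value for fragment in fragments for value in VALUE_TAG.findall(fragment)]
        for field, fragments in matched_fields.items()
    }


matched_fields = {
    "CurrentDirectory": ["C:\\<value>sysmon</value>\\"],
    "Image": ["C:\\Windows\\<value>System32</value>\\<value>cmd.exe</value>"],
}
print(matched_values(matched_fields))
# {'CurrentDirectory': ['sysmon'], 'Image': ['System32', 'cmd.exe']}
```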

General
Key parameters Values Notes
highlight_match 0, 1 If set to 1, it will use the Elasticsearch highlight mechanism to find the fields and values that matched the search query. The matched fields and values are respectively added to new dictionary fields outliers.matched_fields and outliers.matched_values. If set to 0, do nothing. Default: 0.

Terms

Global parameters for all use cases of type terms.

General
Key parameters Values Notes
terms_batch_eval_size Int Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results, but increases memory usage.

Metrics

Global parameters for all use cases of type metrics.

General
Key parameters Values Notes
metric_batch_eval_size Int Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results, but increases memory usage.

Sudden Appearance

Global parameters for all use cases of type sudden_appearance.

General
Key parameters Values Notes
max_num_aggregators Int Maximum estimated number of aggregations. If the number of aggregations defined in aggregator is bigger than max_num_aggregators, the returned results will not be accurate. Default: 100000.
max_num_targets Int Maximum estimated number of targets. If the number of terms defined in target is bigger than max_num_targets, the returned results will not be accurate. Default: 100000.

Word2vec

Global parameters for all use cases of type word2vec.

General
Key parameters Values Notes
word2vec_batch_eval_size Int Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results, but increases memory usage.
min_target_buckets Int Minimum number of events required within an aggregation before processing word2vec analyzer.
drop_duplicates 0, 1 If set to 1, drops duplicate target elements within each aggregation. If set to 0, do nothing. Set to 0 by default. Note that when activated, drop_duplicates can increase memory usage, because it generally increases the size of the vocabulary and therefore the size of the word2vec model.
use_prob_model 0, 1 If set to 1, use a probabilistic model instead of word2vec. If set to 0, use the word2vec model. Used mainly to evaluate the performance of word2vec. The probabilistic model computes the true probability that a context word appears given a certain center word: P(context_word | center_word) = (number of times the pair context_word-center_word appears) / (number of times center_word appears). Set to 0 by default.
output_prob 0, 1 If set to 1, the models output the probability that a context word appears, given a certain center word. If set to 0 and use_prob_model=0, it outputs the raw value of word2vec (layer before the softmax). If set to 0 and use_prob_model=1, it outputs the logarithm of the probabilities. Set to 1 by default.
separators regex format between quotes Will split target elements on each occurrence of the regex pattern. Example: If separators="\.| " and the target of one event is "Our website is nviso.eu", the output tokens will be ["Our", "website", "is", "nviso", "eu"].
size_window Int Size of the context window. Note that as you increase the window size, the number of center word - context word combinations increases, which in turn increases memory usage and computation time.
min_uniq_word_occurrence Int If a word appears less than min_uniq_word_occurrence times, it will be replaced by the 'UNKNOWN' word. Set to 1 by default. Note that as it reduces the vocabulary size of the model, it reduces the memory size.
num_epoch Int Number of times word2vec model trains on all events within one aggregation. Set to 1 by default.
learning_rate Float The learning rate of the word2vec model. Set to 0.001 by default.
embedding_size Int Embedding size of the word2vec model. Set to 40 by default.
seed Int The random seed to make word2vec deterministic. If set to 0, word2vec is non-deterministic. If deterministic, it will also read documents chronologically and therefore reduce Elasticsearch scanning performance. Set to 0 by default.
print_score_table 0, 1 Print all outlier scores on a table. Set to 0 by default.
print_confusion_matrix 0, 1 Print the confusion matrix and the precision, recall and F-Measure metrics. Works only if the field "label" (equal to 0 or 1) exists in the Elasticsearch events. Set to 0 by default.
trigger_focus word, text If set to text, it triggers events based on global text score. If set to word, it triggers events based on word score. Set to word by default.
trigger_score center, context, total, mean Type of score the events are triggered on. mean is compatible only with trigger_focus=text.
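The preprocessing parameters in the table above (separators, min_uniq_word_occurrence and size_window) can be illustrated with the following Python sketch. The three helper functions are hypothetical, and two points are assumptions: empty tokens are discarded after splitting, and size_window is read as the number of words taken on each side of the center word.

```python
import re
from collections import Counter


def tokenize(text, separators):
    """Split a target on the separators regex and drop empty tokens."""
    return [token for token in re.split(separators, text) if token]


def replace_rare_words(tokenized_events, min_uniq_word_occurrence=1):
    """Replace words seen fewer than the threshold by the 'UNKNOWN' word."""
    counts = Counter(word for tokens in tokenized_events for word in tokens)
    return [
        [w if counts[w] >= min_uniq_word_occurrence else "UNKNOWN" for w in tokens]
        for tokens in tokenized_events
    ]


def center_context_pairs(tokens, size_window):
    """List every (center word, context word) pair inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - size_window), min(len(tokens), i + size_window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


tokens = tokenize("Our website is nviso.eu", r"\.| ")
print(tokens)  # ['Our', 'website', 'is', 'nviso', 'eu']
```

For these five tokens, a window of 1 yields 8 pairs and a window of 2 yields 14, which shows why a larger size_window costs more memory and computation time.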

Derived fields

Some fields contain several pieces of information, like timestamp, which can be split into the subfields year, month, etc.

It requires an existing field name (e.g. timestamp) as the key parameter and a GROK pattern as the value to extract the sub-information. The sub-information will be extracted from all processed events, and added as new fields in case an outlier event is found. The format for the new field will be: outliers.derived_<field_name> (e.g. outliers.derived_timestamp_year).

Note that these fields are extracted BEFORE the analysis happens and under their original field_name (e.g. timestamp_year), which means that they can also be used, for example, as aggregators or targets in use cases.

General
Key parameters Values Notes
Any existing field name GROK format Example: timestamp=%{YEAR:timestamp_year}-%{MONTHNUM:timestamp_month}-%{MONTHDAY:timestamp_day}[T ]%{HOUR:timestamp_hour}:?%{MINUTE:timestamp_minute}(?::?%{SECOND:timestamp_second})?%{ISO8601_TIMEZONE:timestamp_timezone}? will create the fields derived_timestamp_year, derived_timestamp_month, etc. from the field timestamp.
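For readers unfamiliar with GROK, the example pattern above behaves roughly like a regular expression with named groups. The Python sketch below uses simplified stand-ins for the GROK building blocks (`YEAR`, `MONTHNUM`, `ISO8601_TIMEZONE`, and so on); these approximations and the helper `derive_timestamp_fields` are assumptions for illustration and do not reproduce the exact GROK definitions.

```python
import re

# Simplified approximation of the GROK example, using named groups.
TIMESTAMP_RE = re.compile(
    r"(?P<timestamp_year>\d{4})-(?P<timestamp_month>\d{1,2})-(?P<timestamp_day>\d{1,2})"
    r"[T ](?P<timestamp_hour>\d{2}):?(?P<timestamp_minute>\d{2})"
    r"(?::?(?P<timestamp_second>\d{2}(?:[.,]\d+)?))?"
    r"(?P<timestamp_timezone>Z|[+-]\d{2}(?::?\d{2})?)?"
)


def derive_timestamp_fields(timestamp):
    """Return the sub-fields found in a timestamp, or an empty dict."""
    match = TIMESTAMP_RE.match(timestamp)
    if match is None:
        return {}
    # Optional groups that did not take part in the match are left out.
    return {name: value for name, value in match.groupdict().items() if value is not None}


derived = derive_timestamp_fields("2019-03-14T09:26:53Z")
```

Each key of `derived` (for example `timestamp_year`) corresponds to one named capture of the GROK pattern.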

Whitelist literals

By whitelisting an outlier, you prevent it from being tagged and stored in Elasticsearch. For events that have already been enriched and that match a whitelist later, the es_wipe_all_whitelisted_outliers flag can be used to remove them.

To have more information about literals whitelist, visit the page Whitelisting outliers.

General
Key parameters Values Notes
Any existing field name String This whitelist will only hit for outlier events that contain an exact whitelisted string as one of its event field values. The whitelist is checked against all the event fields, not only the outlier fields! Example: slack_connection=rare outbound connection: Slack.exe.

Whitelist regexps

By whitelisting an outlier, you prevent it from being tagged and stored in Elasticsearch. For events that have already been enriched and that match a whitelist later, the es_wipe_all_whitelisted_outliers flag can be used to remove them.

To have more information about regexp whitelists, visit the page Whitelisting outliers.

General
Key parameters Values Notes
Any existing field name regex format This whitelist will hit for all outlier events that contain a regular expression match against one of its event field values. The whitelist is checked against all the event fields, not only the outlier fields. Example: autorun_user_specific=^.*rare autorun:.*-.*-.*-.*-.*$.
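The difference between the literal and the regexp whitelist can be sketched in Python as follows. Both helpers are hypothetical, the event is assumed to be a flat dictionary of field values, and the regular expression is applied from the start of each value with `re.match`; the real matching code may handle nested fields and other details differently.

```python
import re


def matches_literal_whitelist(event_fields, literals):
    """Hit when any field value is exactly equal to a whitelisted string."""
    values = {str(value) for value in event_fields.values()}
    return any(literal in values for literal in literals)


def matches_regexp_whitelist(event_fields, patterns):
    """Hit when any field value matches one of the regular expressions."""
    return any(
        re.match(pattern, str(value)) is not None
        for pattern in patterns
        for value in event_fields.values()
    )


# Hypothetical events with illustrative field names and values.
slack_event = {"summary": "rare outbound connection: Slack.exe", "user": "bob"}
autorun_event = {"summary": "rare autorun: S-1-5-21-1004-500"}

literal_hit = matches_literal_whitelist(slack_event, ["rare outbound connection: Slack.exe"])
regexp_hit = matches_regexp_whitelist(autorun_event, [r"^.*rare autorun:.*-.*-.*-.*-.*$"])
```

A literal entry must equal a whole field value, whereas a regexp entry can cover a family of values with one pattern.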

Analyzers parameters

To have more information about the configuration of one analyzer, visit the page Building detection use cases.

Common analyzers parameters

All analyzers
Key parameters (*Mandatory) Values Notes
es_query_filter* String Any valid Elasticsearch query.
es_dsl_filter String DSL filter on Elasticsearch query.
timestamp_field String Can be any document field. It will override the general settings timestamp_field.
history_window_days Int Override history_window_days parameter in general settings.
history_window_hours Int Override history_window_hours parameter in general settings.
should_notify 0, 1 If set to 1, send a notification for this use case via the notifier, provided email_notifier is set to 1. If set to 0, do nothing.
use_derived_fields 0, 1 Enable (1) or disable (0) the use of derived fields.
es_index String Override the es_index_pattern parameter in general settings
outlier_type* String Freetext field which will be added to the outlier event as new field named outliers.outlier_type.
outlier_reason* String Freetext field which will be added to the outlier event as new field named outliers.reason.
outlier_summary* String Freetext field which will be added to the outlier event as new field named outliers.summary.
run_model* 0, 1 If set to 1, the model runs when the run_models parameter in general settings is set to 1.
test_model* 0, 1 If set to 1, the model runs when the test_models parameter in general settings is set to 1.
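The interplay between the per-use-case flags and the general settings can be summarised with a small Python sketch. The helpers `should_run` and `effective_setting` are hypothetical and operate on plain dictionaries of already-parsed integer values, which is an assumption made to keep the example short.

```python
def should_run(use_case, general):
    """A use case runs when a general switch and its own matching flag are both 1."""
    runs_as_model = general.get("run_models") == 1 and use_case.get("run_model") == 1
    runs_as_test = general.get("test_models") == 1 and use_case.get("test_model") == 1
    return runs_as_model or runs_as_test


def effective_setting(name, use_case, general):
    """Use-case values such as history_window_days override the general ones."""
    return use_case.get(name, general.get(name))


general = {"run_models": 1, "test_models": 0, "history_window_days": 7}
use_case = {"run_model": 0, "test_model": 1, "history_window_days": 1}
```

With these values the use case is skipped, because test_models is 0 in the general settings and run_model is 0 in the use case, while its history window would be 1 day instead of 7.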

Usual model parameters

The following parameters could be used for analyzers terms, metrics and word2vec. More information available here.

Usual model parameters (Terms, Metrics)
Key parameters (*Mandatory) Values Notes
trigger_on* low, high If set to low, triggers events with a model-computed value lower than the decision boundary. If set to high, triggers events with a model-computed value higher than the decision boundary.
trigger_method* -percentile Percentile. trigger_sensitivity ranges from 0-100.
-pct_of_max_value Percentage of maximum value. trigger_sensitivity ranges from 0-100.
-pct_of_median_value Percentage of median value. trigger_sensitivity ranges from 0-100.
-pct_of_avg_value Percentage of average value. trigger_sensitivity ranges from 0-100.
-mad Median Absolute Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf.
-madpos Same as mad, but the trigger value will always be positive. If the computed trigger value is negative, 0 is used instead.
-stdev Standard Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf.
-float Fixed value to trigger on. trigger_sensitivity defines the trigger value.
-coeff_of_variation Coefficient of variation. trigger_sensitivity defines the total number of coefficients of variation and ranges from 0-Inf.
trigger_sensitivity* 0-100, 0-Inf Value of the sensitivity linked to the trigger_method.
process_documents_chronologically 0, 1 If set to 1, process documents chronologically when analysing the model. Set by default to 0 as it has high impact on Elasticsearch scanning performance.
target* String Document field that will be used to do the computation (based on the trigger_method selected).
aggregator* Strings separated by a , One or multiple document fields that will be used to group documents.
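To give an intuition for how trigger_method, trigger_sensitivity and trigger_on interact, the Python sketch below computes a decision boundary for some of the methods and then compares a value against it. The formulas are assumptions made for this example (for instance, that mad and stdev are measured from the median and the mean respectively, and that the population standard deviation is used); percentile and coeff_of_variation are left out, and the actual ee-outliers computations may differ.

```python
import statistics


def decision_boundary(values, trigger_method, trigger_sensitivity, trigger_on="high"):
    """Compute an illustrative decision boundary for a list of numbers."""
    if trigger_method == "float":
        return float(trigger_sensitivity)
    if trigger_method == "pct_of_max_value":
        return max(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_median_value":
        return statistics.median(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_avg_value":
        return statistics.mean(values) * trigger_sensitivity / 100
    sign = 1 if trigger_on == "high" else -1
    if trigger_method == "stdev":
        return statistics.mean(values) + sign * trigger_sensitivity * statistics.pstdev(values)
    if trigger_method in ("mad", "madpos"):
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        boundary = median + sign * trigger_sensitivity * mad
        return max(boundary, 0) if trigger_method == "madpos" else boundary
    raise ValueError(f"unsupported trigger_method: {trigger_method}")


def is_outlier(value, boundary, trigger_on):
    """Apply trigger_on: 'high' flags values above the boundary, 'low' below."""
    return value > boundary if trigger_on == "high" else value < boundary


values = [1, 2, 3, 4, 10]
boundary = decision_boundary(values, "mad", 3, "high")
```

In this sample the median is 3 and the median absolute deviation is 1, so three deviations above the median give a boundary of 6, and only the value 10 is flagged.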

Arbitrary parameters

Any other parameters that are not used by the model will be automatically copied to the outlier parameter. More information available here.

Simple query parameters

Simple query
Key parameters Values Notes
highlight_match 0, 1 Override highlight_match parameter in general simplequery settings.

Metrics parameters

Metrics
Key parameters (*Mandatory) Values Notes
metric* -numerical_value Use the numerical value of the target field as metric. Example: numerical_value("2") => 2.
-length Use the target field length as metric. Example: length("outliers") => 8.
-entropy Use the entropy of the field as metric. Example: entropy("houston") => 2.5216406363433186.
-hex_encoded_length Calculate total length of hexadecimal encoded substrings in the target and use this as metric.
-base64_encoded_length Calculate total length of base64 encoded substrings in the target and use this as metric. Example: base64_encoded_length("houston we have a cHJvYmxlbQ==") => base64_decoded_string: problem, base64_encoded_length: 7.
-url_length Extract all URLs from the target value and use their total length as metric. Example: url_length("why don't we go http://www.dance.com") => extracted_urls: http://www.dance.com, extracted_urls_length: 20.
-relative_english_entropy Compute the Kullback-Leibler divergence (relative entropy).
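Several of the metrics above are easy to reproduce in Python, which helps when checking what value a target would produce. The functions below are sketches written for this example: entropy is computed as the Shannon entropy over characters, and the URL pattern in `url_length` is a simple assumption that will not match every URL the real extractor recognises.

```python
import math
import re
from collections import Counter


def numerical_value(target):
    """Interpret the target as a number."""
    return float(target)


def length(target):
    """Number of characters in the target."""
    return len(target)


def entropy(target):
    """Shannon entropy (in bits) of the character distribution."""
    counts = Counter(target)
    total = len(target)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def url_length(target):
    """Total length of the URLs found by a simple http(s) pattern."""
    return sum(len(url) for url in re.findall(r"https?://\S+", target))


print(length("outliers"))  # 8
print(url_length("why don't we go http://www.dance.com"))  # 20
print(round(entropy("houston"), 4))  # 2.5216
```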

Terms parameters

Terms
Key parameters (*Mandatory) Values Notes
target_count_method* within_aggregator, across_aggregators If set to across_aggregators, the analysis will be performed across all values of the aggregator at the same time. If set to within_aggregator, the analysis will be performed for each value of the aggregator separately.
min_target_buckets Int Minimum number of events within an aggregation before processing the terms analyzer. Only applies when target_count_method is set to within_aggregator.

Sudden Appearance parameters

Sudden Appearance
Key parameters (*Mandatory) Values Notes
target* String separated by , One or multiple document fields that will be analyzed for sudden appearance within each group of documents.
aggregator* String separated by , One or multiple document fields that will be used to group documents. Each document that contains the same combination of field values will be assembled in the same group.
history_window_days Int Override history_window_days parameter in general settings.
history_window_hours Int Override history_window_hours parameter in general settings.
sliding_window_size* DDD:HH:MM Size of the sliding window where DDD define the number of days, HH the number of hours and MM the number of minutes. Example: 20:13:20 will correspond to a sliding window of size 20 days, 13 hours and 20 minutes.
sliding_window_step_size* DDD:HH:MM Size of the sliding step where DDD defines the number of days, HH the number of hours and MM the number of minutes. The sliding step represents the jump in time by which the sliding window moves within the global window. Example: 10:01:02 will correspond to a sliding step of size 10 days, 1 hour and 2 minutes.
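The DDD:HH:MM format and the way a window slides inside the global history window can be sketched in Python as follows. The helpers `parse_window` and `sliding_windows` are hypothetical, and the rule that a window must fit entirely inside the global window is an assumption made for this example.

```python
from datetime import datetime, timedelta


def parse_window(value):
    """Convert a DDD:HH:MM string into a timedelta."""
    days, hours, minutes = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes)


def sliding_windows(start, end, window_size, step_size):
    """List the (start, end) pairs of each window that fits in the global window."""
    if step_size <= timedelta(0):
        raise ValueError("step_size must be positive")
    windows = []
    current = start
    while current + window_size <= end:
        windows.append((current, current + window_size))
        current += step_size
    return windows


window = parse_window("20:13:20")  # 20 days, 13 hours and 20 minutes
windows = sliding_windows(
    datetime(2023, 1, 1), datetime(2023, 1, 5), timedelta(days=2), timedelta(days=1)
)
```

A smaller step produces more overlapping windows over the same history, and therefore more work per run.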

Word2vec parameters

Word2vec
Key parameters Values Notes
word2vec_batch_eval_size Int Override word2vec_batch_eval_size parameter in word2vec general configuration.
min_target_buckets Int Override min_target_buckets parameter in word2vec general configuration.
drop_duplicates 0, 1 Override drop_duplicates parameter in word2vec general configuration.
use_prob_model 0, 1 Override use_prob_model parameter in word2vec general configuration.
output_prob 0, 1 Override output_prob parameter in word2vec general configuration.
separators regex format between quotes Override separators parameter in word2vec general configuration.
size_window Int Override size_window parameter in word2vec general configuration.
min_uniq_word_occurrence Int Override min_uniq_word_occurrence parameter in word2vec general configuration.
num_epoch Int Override num_epoch parameter in word2vec general configuration.
learning_rate Float Override learning_rate parameter in word2vec general configuration.
embedding_size Int Override embedding_size parameter in word2vec general configuration.
seed Int Override seed parameter in word2vec general configuration.
print_score_table 0, 1 Override print_score_table parameter in word2vec general configuration.
print_confusion_matrix 0, 1 Override print_confusion_matrix parameter in word2vec general configuration.
trigger_focus word, text Override trigger_focus parameter in word2vec general configuration.
trigger_score center, context, total, mean Override trigger_score parameter in word2vec general configuration.