
Define processors

You can use processors to filter and enhance data before sending it to the configured output. To define a processor, you specify the processor name, an optional condition, and a set of parameters:

processors:
- <processor_name>:
    when:
      <condition>
    <parameters>

- <processor_name>:
    when:
      <condition>
    <parameters>

...

Where:

  • <processor_name> specifies a processor that performs some kind of action, such as selecting the fields that are exported or adding metadata to the event.

  • <condition> specifies an optional condition. If the condition is present, then the action is executed only if the condition is fulfilled. If no condition is passed, then the action is always executed.

  • <parameters> is the list of parameters to pass to the processor.
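
For example, the following sketch combines the drop_fields processor (described later in this section) with an equals condition; the field names are illustrative:

processors:
- drop_fields:
    when:
      equals:
        http.response.code: 200
    fields: ["http.response.body"]

This drops the http.response.body field from successful HTTP transactions and leaves all other events untouched.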

Where are processors valid?

Processors are valid:

  • At the top level in the configuration. The processor is applied to all data collected by {beatname_uc}.

  • Under a specific {processor-scope}. The processor is applied to the data collected for that {processor-scope}.

Conditions

Each condition receives a field to compare. You can specify multiple fields under the same condition by using AND between the fields (for example, field1 AND field2).

For each field, you can specify a simple field name or a nested map, for example dns.question.name.
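
For example, a sketch of a single equals condition with two fields (names reused from the examples below); it is fulfilled only when both fields match:

equals:
  http.response.code: 200
  status: OK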

See [exported-fields] for a list of all the fields that are exported by {beatname_uc}.

The supported conditions are:

equals

With the equals condition, you can check whether a field has a certain value. The condition accepts only an integer or a string value.

For example, the following condition checks if the response code of the HTTP transaction is 200:

equals:
  http.response.code: 200

contains

The contains condition checks if a value is part of a field. The field can be a string or an array of strings. The condition accepts only a string value.

For example, the following condition checks if an error is part of the transaction status:

contains:
  status: "Specific error"

regexp

The regexp condition checks the field against a regular expression. The condition accepts only strings.

For example, the following condition checks if the process name starts with foo:

regexp:
  system.process.name: "foo.*"

range

The range condition checks if the field is in a certain range of values. The condition supports lt, lte, gt and gte. The condition accepts only integer or float values.

For example, the following condition checks for failed HTTP transactions by comparing the http.response.code field with 400.

range:
    http.response.code:
        gte: 400

This can also be written as:

range:
    http.response.code.gte: 400

The following condition checks if the CPU usage in percentage has a value between 0.5 and 0.8.

range:
    system.cpu.user.pct.gte: 0.5
    system.cpu.user.pct.lt: 0.8

has_fields

The has_fields condition checks if all the given fields exist in the event. The condition accepts a list of string values denoting the field names.

For example, the following condition checks if the http.response.code field is present in the event.

has_fields: ['http.response.code']

or

The or operator receives a list of conditions.

or:
  - <condition1>
  - <condition2>
  - <condition3>
  ...

For example, to configure the condition http.response.code = 304 OR http.response.code = 404:

or:
  - equals:
      http.response.code: 304
  - equals:
      http.response.code: 404

and

The and operator receives a list of conditions.

and:
  - <condition1>
  - <condition2>
  - <condition3>
  ...

For example, to configure the condition http.response.code = 200 AND status = OK:

and:
  - equals:
      http.response.code: 200
  - equals:
      status: OK

To configure a condition like <condition1> OR <condition2> AND <condition3>:

or:
 - <condition1>
 - and:
    - <condition2>
    - <condition3>
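
For instance, a concrete sketch of this pattern, built from the equals examples above:

or:
 - equals:
     http.response.code: 304
 - and:
    - equals:
        http.response.code: 200
    - equals:
        status: OK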

not

The not operator receives the condition to negate.

not:
  <condition>

For example, to configure the condition NOT status = OK:

not:
  equals:
    status: OK

Add cloud metadata

The add_cloud_metadata processor enriches each event with instance metadata from the machine’s hosting provider. At startup it will detect the hosting provider and cache the instance metadata.

The following cloud providers are supported:

  • Amazon Elastic Compute Cloud (EC2)

  • Digital Ocean

  • Google Compute Engine (GCE)

  • Tencent Cloud (QCloud)

  • Alibaba Cloud (ECS)

  • Azure Virtual Machine

  • OpenStack Nova

The simple configuration below enables the processor.

processors:
- add_cloud_metadata: ~

The add_cloud_metadata processor has one optional configuration setting named timeout that specifies the maximum amount of time to wait for a successful response when detecting the hosting provider. The default timeout value is 3s.

If a timeout occurs, no instance metadata is added to the events. This makes it possible to enable this processor for all your deployments (in the cloud or on premises).
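
For example, a minimal sketch that raises the detection timeout, which may help with slow metadata endpoints (the value shown is illustrative):

processors:
- add_cloud_metadata:
    timeout: 10s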

The metadata that is added to events varies by hosting provider. Below are examples for each of the supported providers.

EC2

{
  "meta": {
    "cloud": {
      "availability_zone": "us-east-1c",
      "instance_id": "i-4e123456",
      "machine_type": "t2.medium",
      "provider": "ec2",
      "region": "us-east-1"
    }
  }
}

Digital Ocean

{
  "meta": {
    "cloud": {
      "instance_id": "1234567",
      "provider": "digitalocean",
      "region": "nyc2"
    }
  }
}

GCE

{
  "meta": {
    "cloud": {
      "availability_zone": "projects/1234567890/zones/us-east1-b",
      "instance_id": "1234556778987654321",
      "machine_type": "projects/1234567890/machineTypes/f1-micro",
      "project_id": "my-dev",
      "provider": "gce"
    }
  }
}

Tencent Cloud

{
  "meta": {
    "cloud": {
      "availability_zone": "gz-azone2",
      "instance_id": "ins-qcloudv5",
      "provider": "qcloud",
      "region": "china-south-gz"
    }
  }
}

Alibaba Cloud

This metadata is only available when VPC is selected as the network type of the ECS instance.

{
  "meta": {
    "cloud": {
      "availability_zone": "cn-shenzhen",
      "instance_id": "i-wz9g2hqiikg0aliyun2b",
      "provider": "ecs",
      "region": "cn-shenzhen-a"
    }
  }
}

Azure Virtual Machine

{
  "meta": {
    "cloud": {
      "provider": "az",
      "instance_id": "04ab04c3-63de-4709-a9f9-9ab8c0411d5e",
      "instance_name": "test-az-vm",
      "machine_type": "Standard_D3_v2",
      "region": "eastus2"
    }
  }
}

OpenStack Nova

{
  "meta": {
    "cloud": {
      "provider": "openstack",
      "instance_name": "test-998d932195.mycloud.tld",
      "availability_zone": "xxxx-az-c",
      "instance_id": "i-00011a84",
      "machine_type": "m2.large"
    }
  }
}

Add the local time zone

The add_locale processor enriches each event with the machine’s time zone offset from UTC or with the name of the time zone. It supports one configuration option named format that controls whether an offset or time zone abbreviation is added to the event. The default format is offset. The processor adds a beat.timezone value to each event.

The configuration below enables the processor with the default settings.

processors:
- add_locale: ~

This configuration enables the processor and configures it to add the time zone abbreviation to events.

processors:
- add_locale:
    format: abbreviation

Note
Please note that add_locale differentiates between daylight saving time (DST) and regular time. For example, CEST indicates DST, and CET is regular time.

Decode JSON fields

The decode_json_fields processor decodes fields containing JSON strings and replaces the strings with valid JSON objects.

processors:
 - decode_json_fields:
     fields: ["field1", "field2", ...]
     process_array: false
     max_depth: 1
     target: ""
     overwrite_keys: false

The decode_json_fields processor has the following configuration settings:

fields

The fields containing JSON strings to decode.

process_array

(Optional) A boolean that specifies whether to process arrays. The default is false.

max_depth

(Optional) The maximum parsing depth. The default is 1.

target

(Optional) The field under which the decoded JSON will be written. By default the decoded JSON object replaces the string field from which it was read. To merge the decoded JSON fields into the root of the event, specify target with an empty string (target: ""). Note that the null value (target:) is treated as if the field was not set at all.

overwrite_keys

(Optional) A boolean that specifies whether keys that already exist in the event are overwritten by keys from the decoded JSON object. The default value is false.
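
As a worked sketch, assume an event whose message field contains the JSON string {"key": "value"}. The following configuration decodes it under a json key, so the event gains a json.key field rather than having message replaced:

processors:
 - decode_json_fields:
     fields: ["message"]
     target: "json"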

Drop events

The drop_event processor drops the entire event if the associated condition is fulfilled. The condition is mandatory, because without one, all the events are dropped.

processors:
 - drop_event:
     when:
        condition

See Conditions for a list of supported conditions.
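
For example, the following sketch drops all successful HTTP transactions:

processors:
 - drop_event:
     when:
        equals:
           http.response.code: 200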

Drop fields from events

The drop_fields processor specifies which fields to drop if a certain condition is fulfilled. The condition is optional. If it’s missing, the specified fields are always dropped. The @timestamp and type fields cannot be dropped, even if they show up in the drop_fields list.

processors:
 - drop_fields:
     when:
        condition
     fields: ["field1", "field2", ...]

See Conditions for a list of supported conditions.

Note
If you define an empty list of fields under drop_fields, then no fields are dropped.

Keep fields from events

The include_fields processor specifies which fields to export if a certain condition is fulfilled. The condition is optional. If it’s missing, the specified fields are always exported. The @timestamp and type fields are always exported, even if they are not defined in the include_fields list.

processors:
 - include_fields:
     when:
        condition
     fields: ["field1", "field2", ...]

See Conditions for a list of supported conditions.

You can specify multiple include_fields processors under the processors section.

Note
If you define an empty list of fields under include_fields, then only the required fields, @timestamp and type, are exported.
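
For example, a minimal sketch (field name illustrative) that exports only the HTTP response code in addition to @timestamp and type:

processors:
 - include_fields:
     fields: ["http.response.code"]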

Rename fields from events

The rename processor specifies a list of fields to rename. Under the fields key, each entry contains a from: old-key and a to: new-key pair, where from is the original field name and to is the new field name.

Renaming fields can be useful in cases where field names cause conflicts. For example, if an event has two fields, c and c.b, that are both assigned scalar values (e.g. {"c": 1, "c.b": 2}), this results in an Elasticsearch error at ingest time, because the value of c cannot simultaneously be a scalar and an object. To prevent this, the rename processor can be used to rename c to c.value.

The rename processor cannot be used to overwrite fields. To overwrite a field, either rename the target field first, or use the drop_fields processor to drop the field and then rename the field.

processors:
- rename:
    fields:
     - from: "a.g"
       to: "e.d"
    ignore_missing: false
    fail_on_error: true

The rename processor has the following configuration settings:

ignore_missing

(Optional) If set to true, no error is logged if a key that should be renamed is missing. Default is false.

fail_on_error

(Optional) If set to true, the renaming of fields is stopped in case of an error, and the original event is returned. If set to false, renaming continues even if an error occurs during renaming. Default is true.

See Conditions for a list of supported conditions.

You can specify multiple rename processors under the processors section.

Add Kubernetes metadata

The add_kubernetes_metadata processor annotates each event with relevant metadata based on which Kubernetes pod the event originated from. Each event is annotated with:

  • Pod Name

  • Pod UID

  • Namespace

  • Labels

The add_kubernetes_metadata processor has two basic building blocks which are:

  • Indexers

  • Matchers

Indexers take in a pod’s metadata and build indices based on the pod metadata. For example, the ip_port indexer can take a Kubernetes pod and index the pod metadata based on all pod_ip:container_port combinations.

Matchers are used to construct lookup keys for querying indices. For example, when the fields matcher takes ["metricset.host"] as a lookup field, it would construct a lookup key with the value of the field metricset.host.

Each Beat can define its own default indexers and matchers, which are enabled by default. For example, Filebeat enables the container indexer, which indexes pod metadata based on all container IDs, and a logs_path matcher, which takes the source field, extracts the container ID, and uses it to retrieve metadata.

The configuration below enables the processor when {beatname_lc} is run as a pod in Kubernetes.

processors:
- add_kubernetes_metadata:
    in_cluster: true

The configuration below enables the processor on a Beat running as a process on the Kubernetes node.

processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ${HOME}/.kube/config

The configuration below disables the default indexers and matchers, and enables the ones that the user is interested in.

processors:
- add_kubernetes_metadata:
    in_cluster: false
    host: <hostname>
    kube_config: ~/.kube/config
    default_indexers.enabled: false
    default_matchers.enabled: false
    indexers:
      - ip_port:
    matchers:
      - fields:
          lookup_fields: ["metricset.host"]

The add_kubernetes_metadata processor has the following configuration settings:

in_cluster

(Optional) Use in-cluster settings for the Kubernetes client. true by default.

host

(Optional) Identify the node where {beatname_lc} is running in case it cannot be accurately detected, as when running {beatname_lc} in host network mode.

kube_config

(Optional) Use the given config file as configuration for the Kubernetes client.

default_indexers.enabled

(Optional) Enable/Disable default pod indexers, in case you want to specify your own.

default_matchers.enabled

(Optional) Enable/Disable default pod matchers, in case you want to specify your own.

Add Docker metadata

The add_docker_metadata processor annotates each event with relevant metadata from Docker containers:

  • Container ID

  • Name

  • Image

  • Labels

Note

When running {beatname_uc} in a container, you need to provide access to Docker’s unix socket in order for the add_docker_metadata processor to work. You can do this by mounting the socket inside the container. For example:

docker run -v /var/run/docker.sock:/var/run/docker.sock ...

To avoid privilege issues, you may also need to add --user=root to the docker run flags. Because the user must be part of the docker group in order to access /var/run/docker.sock, root access is required if {beatname_uc} is running as non-root inside the container.

processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
    #match_fields: ["system.process.cgroup.id"]
    #match_pids: ["process.pid", "process.ppid"]
    #match_source: true
    #match_source_index: 4
    #match_short_id: true
    #cleanup_timeout: 60
    #labels.dedot: false
    # To connect to Docker over TLS you must specify a client and CA certificate.
    #ssl:
    #  certificate_authority: "/etc/pki/root/ca.pem"
    #  certificate:           "/etc/pki/client/cert.pem"
    #  key:                   "/etc/pki/client/cert.key"

It has the following settings:

host

(Optional) Docker socket (UNIX or TCP socket). It uses unix:///var/run/docker.sock by default.

ssl

(Optional) SSL configuration to use when connecting to the Docker socket.

match_fields

(Optional) A list of fields to match a container ID; at least one of them must hold a container ID for the event to be enriched.

match_pids

(Optional) A list of fields that contain process IDs. If the process is running in Docker then the event will be enriched. The default value is ["process.pid", "process.ppid"].

match_source

(Optional) Match container ID from a log path present in the source field. Enabled by default.

match_short_id

(Optional) Match container short ID from a log path present in the source field. Disabled by default. This allows matching directory names that contain the first 12 characters of the container ID. For example, /var/log/containers/b7e3460e2b21/*.log.

match_source_index

(Optional) Index in the source path split by / to look for the container ID. It defaults to 4 to match /var/lib/docker/containers/<container_id>/*.log.

cleanup_timeout

(Optional) Time of inactivity after which the metadata for a container is cleaned up and forgotten. 60s by default.

labels.dedot

(Optional) Defaults to false. If set to true, dots in labels are replaced with _.

Add Host metadata

Note
This functionality is in beta and is subject to change.

processors:
- add_host_metadata:
    netinfo.enabled: false
    cache.ttl: 5m
    geo:
      name: nyc-dc1-rack1
      location: 40.7128, -74.0060
      continent_name: North America
      country_iso_code: US
      region_name: New York
      region_iso_code: NY
      city_name: New York

It has the following settings:

netinfo.enabled

(Optional) Defaults to false. Include IP addresses and MAC addresses as fields host.ip and host.mac.

cache.ttl

(Optional) The processor uses an internal cache for the host metadata. This sets the cache expiration time. The default is 5m; negative values disable caching altogether.

geo.name

User-definable token used to identify a discrete location. Frequently a datacenter, rack, or similar.

geo.location

Latitude and longitude in comma-separated format, as shown in the example above.

geo.continent_name

Name of the continent.

geo.country_name

Name of the country.

geo.region_name

Name of the region.

geo.city_name

Name of the city.

geo.country_iso_code

ISO country code.

geo.region_iso_code

ISO region code.

The add_host_metadata processor annotates each event with relevant metadata from the host machine. The fields added to the event look like the following:

{
   "host":{
      "architecture":"x86_64",
      "name":"example-host",
      "id":"",
      "os":{
         "family":"darwin",
         "build":"16G1212",
         "platform":"darwin",
         "version":"10.12.6",
         "kernel":"16.7.0",
         "name":"Mac OS X"
      },
      "ip": ["192.168.0.1", "10.0.0.1"],
      "mac": ["00:25:96:12:34:56", "72:00:06:ff:79:f1"],
      "geo": {
          "continent_name": "North America",
          "country_iso_code": "US",
          "region_name": "New York",
          "region_iso_code": "NY",
          "city_name": "New York",
          "name": "nyc-dc1-rack1",
          "location": "40.7128, -74.0060"
        }
   }
}

Dissect strings

The dissect processor tokenizes incoming strings using defined patterns.

processors:
- dissect:
    tokenizer: "%{key1} %{key2}"
    field: "message"
    target_prefix: "dissect"
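
As an illustration of the configuration above, an event with the field message: "hello world" would, roughly, be enriched as follows:

{
  "message": "hello world",
  "dissect": {
    "key1": "hello",
    "key2": "world"
  }
}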

The dissect processor has the following configuration settings:

field

(Optional) The event field to tokenize. Default is message.

target_prefix

(Optional) The name of the field where the values will be extracted. When an empty string is defined, the processor creates the keys at the root of the event. Default is dissect. When the target key already exists in the event, the processor won’t replace it and will log an error; you need to either drop or rename the key before using dissect.

For tokenization to be successful, all keys must be found and extracted. If one of them cannot be found, an error is logged and no modification is made to the original event.

Note
A key can contain any characters except the reserved suffix or prefix modifiers: /, &, + and ?.

See Conditions for a list of supported conditions.

DNS Reverse Lookup

The DNS processor performs reverse DNS lookups of IP addresses. It caches the responses that it receives in accordance with the time-to-live (TTL) value contained in the response. It also caches failures that occur during lookups. Each instance of this processor maintains its own independent cache.

The processor uses its own DNS resolver to send requests to nameservers and does not use the operating system’s resolver. It does not read any values contained in /etc/hosts.

This processor can significantly slow down your pipeline’s throughput if you have a high latency network or slow upstream nameserver. The cache will help with performance, but if the addresses being resolved have a high cardinality then the cache benefits will be diminished due to the high miss ratio.

By way of example, if each DNS lookup takes 2 milliseconds, the maximum throughput you can achieve is 500 events per second (1000 milliseconds / 2 milliseconds). If you have a high cache hit ratio then your throughput can be higher.

This is a minimal configuration example that resolves the IP addresses contained in two fields.

processors:
- dns:
    type: reverse
    fields:
      source.ip: source.hostname
      destination.ip: destination.hostname

Next is a configuration example showing all options.

processors:
- dns:
    type: reverse
    action: append
    fields:
      server.ip: server.hostname
      client.ip: client.hostname
    success_cache:
      capacity.initial: 1000
      capacity.max: 10000
    failure_cache:
      capacity.initial: 1000
      capacity.max: 10000
      ttl: 1m
    nameservers: ['192.0.2.1', '203.0.113.1']
    timeout: 500ms
    tag_on_failure: [_dns_reverse_lookup_failed]

The dns processor has the following configuration settings:

type

The type of DNS lookup to perform. The only supported type is reverse which queries for a PTR record.

action

This defines the behavior of the processor when the target field already exists in the event. The options are append (default) and replace.

fields

This is a mapping of source field names to target field names. The value of the source field will be used in the DNS query, and the result will be written to the target field.

success_cache.capacity.initial

The initial number of items that the success cache will be allocated to hold. When initialized, the processor will allocate memory for this number of items. Default value is 1000.

success_cache.capacity.max

The maximum number of items that the success cache can hold. When the maximum capacity is reached a random item is evicted. Default value is 10000.

failure_cache.capacity.initial

The initial number of items that the failure cache will be allocated to hold. When initialized, the processor will allocate memory for this number of items. Default value is 1000.

failure_cache.capacity.max

The maximum number of items that the failure cache can hold. When the maximum capacity is reached a random item is evicted. Default value is 10000.

failure_cache.ttl

The duration for which failures are cached. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". Default value is 1m.

nameservers

A list of nameservers to query. If there are multiple servers, the resolver queries them in the order listed. If none are specified then it will read the nameservers listed in /etc/resolv.conf once at initialization. On Windows you must always supply at least one nameserver.

timeout

The duration after which a DNS query times out. This timeout applies to each DNS request, so if you have two nameservers, the total timeout will be twice this value. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". Default value is 500ms.

tag_on_failure

A list of tags to add to the event when any lookup fails. The tags are only added once even if multiple lookups fail. By default no tags are added upon failure.

Add process metadata

The add_process_metadata processor enriches events with information from running processes, identified by their process ID (PID).

processors:
- add_process_metadata:
    match_pids: [system.process.ppid]
    target: system.process.parent

The fields added to the event look as follows:

"process": {
  "name":  "systemd",
  "title": "/usr/lib/systemd/systemd --switched-root --system --deserialize 22",
  "exe":   "/usr/lib/systemd/systemd",
  "args":  ["/usr/lib/systemd/systemd", "--switched-root", "--system", "--deserialize", "22"],
  "pid":   1,
  "ppid":  0,
  "start_time": "2018-08-22T08:44:50.684Z",
}

Optionally, the process environment can be included, too:

  ...
  "env": {
    "HOME":       "/",
    "TERM":       "linux",
    "BOOT_IMAGE": "/boot/vmlinuz-4.11.8-300.fc26.x86_64",
    "LANG":       "en_US.UTF-8",
  }
  ...

It has the following settings:

match_pids

List of fields to look up for a PID. The processor searches the list sequentially until the field is found in the current event, and the PID lookup is applied to the value of that field.

target

(Optional) Destination prefix where the process object will be created. The default is the event’s root.

include_fields

(Optional) List of fields to add. By default, the processor will add all the available fields except process.env.

ignore_missing

(Optional) When set to false, events that don’t contain any of the fields in match_pids will be discarded and an error will be generated. By default, this condition is ignored.

overwrite_keys

(Optional) By default, if a target field already exists, it will not be overwritten and an error will be logged. If overwrite_keys is set to true, this condition will be ignored.

restricted_fields

(Optional) By default, the process.env field is not output, to avoid leaking sensitive data. If restricted_fields is true, the field will be present in the output.
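
For example, a sketch that also exposes the process environment, reusing the match_pids value from the example above:

processors:
- add_process_metadata:
    match_pids: [system.process.ppid]
    target: system.process.parent
    restricted_fields: true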