regex_parser operator

The regex_parser operator parses the string-type field selected by parse_from with the given regular expression pattern.

Regex Syntax

This operator uses Go regular expressions (RE2 syntax). When writing a regex, consider testing it with a tool such as regex101, with the flavor set to Golang.

Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| id | regex_parser | A unique identifier for the operator. |
| output | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| regex | required | A Go regular expression. The named capture groups will be extracted as fields in the parsed body. |
| parse_from | body | The field from which the value will be parsed. |
| parse_to | attributes | The field to which the value will be parsed. |
| on_error | send | The behavior of the operator if it encounters an error. See on_error. |
| if | | An expression that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| timestamp | nil | An optional timestamp block which will parse a timestamp field before passing the entry to the output operator. |
| severity | nil | An optional severity block which will parse a severity field before passing the entry to the output operator. |
| cache | nil | An optional cache block. See below for details. |

Cache configuration

Regular expression matching results can be cached. This is useful when we need to parse a relatively small set of values repeatedly. Parsing file paths is a common use case.

The size of the cache is configurable, and it uses a FIFO replacement policy. It also includes a throttling mechanism, which prevents more than 10% of the maximum size from being replaced within a 5-second interval.

Setting the size to 0 will disable the cache. This is the default.

| Field | Default | Description |
| --- | --- | --- |
| size | 0 | The maximum size of the cache. |

Example Configurations

Parse the field message with a regular expression

Configuration:

- type: regex_parser
  parse_from: body.message
  regex: '^Host=(?P<host>[^,]+), Type=(?P<type>.*)$'

Input body:

{
  "timestamp": "",
  "body": {
    "message": "Host=127.0.0.1, Type=HTTP"
  }
}

Output body:

{
  "timestamp": "",
  "body": {
    "message": "Host=127.0.0.1, Type=HTTP"
  },
  "attributes": {
    "host": "127.0.0.1",
    "type": "HTTP"
  }
}

Parse the body with a regular expression and also parse the timestamp

Configuration:

- type: regex_parser
  regex: '^Time=(?P<timestamp_field>\d{4}-\d{2}-\d{2}), Host=(?P<host>[^,]+), Type=(?P<type>.*)$'
  timestamp:
    parse_from: attributes.timestamp_field
    layout_type: strptime
    layout: '%Y-%m-%d'

Input body:

{
  "timestamp": "",
  "body": "Time=2020-01-31, Host=127.0.0.1, Type=HTTP"
}

Output body:

{
  "timestamp": "2020-01-31T00:00:00-00:00",
  "body": "Time=2020-01-31, Host=127.0.0.1, Type=HTTP",
  "attributes": {
    "host": "127.0.0.1",
    "type": "HTTP"
  }
}

Parse the message field only if "type" is "hostname"

Configuration:

- type: regex_parser
  regex: '^Host=(?P<host>.+)$'
  parse_from: body.message
  if: 'body.type == "hostname"'

Input body (condition matches):

{
  "body": {
    "message": "Host=testhost",
    "type": "hostname"
  }
}

Output body:

{
  "body": {
    "message": "Host=testhost",
    "type": "hostname"
  },
  "attributes": {
    "host": "testhost"
  }
}

Input body (condition does not match):

{
  "body": {
    "message": "Key=value",
    "type": "keypair"
  }
}

Output body (unchanged):

{
  "body": {
    "message": "Key=value",
    "type": "keypair"
  }
}
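The effect of the condition can be sketched in plain Go. This is an illustration only: the `entry` type and `parseIfHostname` are hypothetical simplifications, the pattern `^Host=(?P<host>.+)$` is an assumed working regex for this input, and the real operator evaluates the `if` expression itself rather than a hard-coded comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// entry is a hypothetical, simplified stand-in for a log entry.
type entry struct {
	Body       map[string]string
	Attributes map[string]string
}

var hostRe = regexp.MustCompile(`^Host=(?P<host>.+)$`)

// parseIfHostname mimics `if: 'body.type == "hostname"'`: entries that fail
// the condition are returned untouched.
func parseIfHostname(e entry) entry {
	if e.Body["type"] != "hostname" {
		return e // condition false: skip parsing entirely
	}
	m := hostRe.FindStringSubmatch(e.Body["message"])
	if m == nil {
		return e // no match: the real operator would apply on_error here
	}
	if e.Attributes == nil {
		e.Attributes = map[string]string{}
	}
	e.Attributes["host"] = m[hostRe.SubexpIndex("host")]
	return e
}

func main() {
	a := parseIfHostname(entry{Body: map[string]string{"message": "Host=testhost", "type": "hostname"}})
	b := parseIfHostname(entry{Body: map[string]string{"message": "Key=value", "type": "keypair"}})
	fmt.Printf("%q %q\n", a.Attributes["host"], b.Attributes["host"]) // "testhost" ""
}
```

The second entry is skipped by the condition before the regex is ever tried, so a non-matching message does not count as a parse error.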