The regex_parser operator parses the string-type field selected by parse_from with the given regular expression pattern.
This operator uses Go regular expressions. When writing a regex, consider using a tool such as regex101.
Field | Default | Description |
---|---|---|
id | regex_parser | A unique identifier for the operator. |
output | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
regex | required | A Go regular expression. The named capture groups will be extracted as fields in the parsed body. |
parse_from | body | The field from which the value will be parsed. |
parse_to | attributes | The field to which the value will be parsed. |
on_error | send | The behavior of the operator if it encounters an error. See on_error. |
if | | An expression that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
timestamp | nil | An optional timestamp block which will parse a timestamp field before passing the entry to the output operator. |
severity | nil | An optional severity block which will parse a severity field before passing the entry to the output operator. |
cache | nil | An optional cache block. See below for details. |
Regular expression matching results can be cached. This is useful when we need to parse a relatively small set of values repeatedly. Parsing file paths is a common use case.
The size of the cache is configurable, and it uses a FIFO replacement policy. It also includes a throttling mechanism, which prevents more than 10% of the maximum size from being replaced within a 5 second interval.
Setting the size to 0 will disable the cache. This is the default.
Field | Default | Description |
---|---|---|
size | 0 | The maximum size of the cache. |
Configuration:
- type: regex_parser
  parse_from: body.message
  regex: '^Host=(?P<host>[^,]+), Type=(?P<type>.*)$'
Input body | Output body |
{
"timestamp": "",
"body": {
"message": "Host=127.0.0.1, Type=HTTP"
}
} |
{
"timestamp": "",
"body": {
"message": "Host=127.0.0.1, Type=HTTP"
},
"attributes": {
"host": "127.0.0.1",
"type": "HTTP"
}
} |
Configuration:
- type: regex_parser
  regex: '^Time=(?P<timestamp_field>\d{4}-\d{2}-\d{2}), Host=(?P<host>[^,]+), Type=(?P<type>.*)$'
  timestamp:
    parse_from: attributes.timestamp_field
    layout_type: strptime
    layout: '%Y-%m-%d'
Input body | Output body |
{
"timestamp": "",
"body": "Time=2020-01-31, Host=127.0.0.1, Type=HTTP"
} |
{
"timestamp": "2020-01-31T00:00:00-00:00",
"body": "Time=2020-01-31, Host=127.0.0.1, Type=HTTP",
"attributes": {
"host": "127.0.0.1",
"type": "HTTP"
}
} |
Configuration:
- type: regex_parser
  regex: '^Host=(?P<host>.*)$'
  parse_from: body.message
  if: 'body.type == "hostname"'
Input body | Output body |
{
"body": {
"message": "Host=testhost",
"type": "hostname"
}
} |
{
"body": {
"message": "Host=testhost",
"type": "hostname"
},
"attributes": {
"host": "testhost"
}
} |
{
"body": {
"message": "Key=value",
"type": "keypair"
}
} |
{
"body": {
"message": "Key=value",
"type": "keypair"
}
} |
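The effect of the if condition in this example can be approximated in plain Go. This is a simplified sketch under stated assumptions: the entry type and parseIfHostname are hypothetical, the condition is hard-coded rather than evaluated from an expression, the pattern captures everything after "Host=", and a non-matching message is passed through unchanged, whereas the operator would follow its on_error setting.

```go
package main

import (
	"fmt"
	"regexp"
)

var hostRe = regexp.MustCompile(`^Host=(?P<host>.*)$`)

// entry is a hypothetical, simplified log entry.
type entry struct {
	Body       map[string]string
	Attributes map[string]string
}

// parseIfHostname applies the pattern only when body.type is "hostname".
// Entries that fail the condition are returned unchanged.
func parseIfHostname(e entry) entry {
	if e.Body["type"] != "hostname" {
		return e
	}
	m := hostRe.FindStringSubmatch(e.Body["message"])
	if m == nil {
		return e // simplification: no error handling in this sketch
	}
	if e.Attributes == nil {
		e.Attributes = make(map[string]string)
	}
	e.Attributes["host"] = m[hostRe.SubexpIndex("host")]
	return e
}

func main() {
	parsed := parseIfHostname(entry{Body: map[string]string{"message": "Host=testhost", "type": "hostname"}})
	skipped := parseIfHostname(entry{Body: map[string]string{"message": "Key=value", "type": "keypair"}})
	fmt.Println(parsed.Attributes["host"], len(skipped.Attributes)) // prints "testhost 0"
}
```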