---
layout: default
title: Grok
parent: Ingest processors
grand_parent: Ingest pipelines
nav_order: 140
---

# Grok
The `grok` processor is used to parse and structure unstructured data using pattern matching. You can use the `grok` processor to extract fields from log messages, web server access logs, application logs, and other log data that follows a consistent format.

## Grok basics

The `grok` processor uses a set of predefined patterns to match parts of the input text. Each pattern consists of a name and a regular expression. For example, the pattern `%{IP:ip_address}` matches an IP address and assigns it to the field `ip_address`. You can combine multiple patterns to create more complex expressions. For example, the pattern `%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}` matches a line from a web server access log and extracts the client IP address, the HTTP method, the request URI, the number of bytes sent, and the duration of the request.
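
Under the hood, a grok expression expands into a regular expression with named capture groups. The following Python sketch illustrates the idea; the pattern definitions here are simplified stand-ins for the real grok library definitions, not the actual ones:

```python
import re

# Simplified stand-ins for a few built-in grok patterns (illustrative only;
# the real library definitions are more thorough).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def grok_to_regex(expr):
    """Expand %{NAME:field} references into named capture groups."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", repl, expr)

regex = grok_to_regex("%{IP:client} %{WORD:method} %{NUMBER:bytes}")
match = re.match(regex, "127.0.0.1 GET 1070")
print(match.groupdict())  # {'client': '127.0.0.1', 'method': 'GET', 'bytes': '1070'}
```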

The `grok` processor is built on the [Oniguruma regular expression library](https://github.com/kkos/oniguruma/blob/master/doc/RE) and supports all the patterns from that library. You can use the [Grok Debugger](https://grokdebugger.com/) tool to test and debug your grok expressions.

## Grok processor syntax

The following is the basic syntax for the `grok` processor:

```json
{
  "grok": {
    "field": "your_message",
    "patterns": ["your_patterns"]
  }
}
```
{% include copy-curl.html %}

## Configuration parameters

To configure the `grok` processor, you have various options that allow you to define patterns, match specific keys, and control the processor's behavior. The following table lists the required and optional parameters for the `grok` processor.

Parameter | Required | Description |
|-----------|-----------|-----------|
`field`  | Required  | The name of the field containing the text that should be parsed. |
`patterns`  | Required  | A list of grok expressions used to match and extract named captures. The first matching expression in the list is applied. |
`pattern_definitions`  | Optional  | A dictionary of pattern names and pattern tuples used to define custom patterns for the current processor. If a pattern matches an existing name, it overrides the pre-existing definition. |
`trace_match` | Optional | When set to `true`, the processor adds a field named `_grok_match_index` to the processed document. This field contains the index of the pattern within the `patterns` array that successfully matched the document, which can be useful for debugging and understanding which pattern was applied. Default is `false`. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running this processor. |
`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. |
`on_failure` | Optional | A list of processors to run if the processor fails. |
`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |

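As a sketch of how several optional parameters work together, the following hypothetical pipeline tolerates a missing field and records a flag instead of failing when no pattern matches (the pipeline name, field names, and fallback `set` processor are illustrative choices, not required conventions):

```json
PUT _ingest/pipeline/log_line_safe
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:clientip} %{HTTPDATE:timestamp} %{NUMBER:response_status:int}"],
        "ignore_missing": true,
        "tag": "parse_access_log",
        "on_failure": [
          {
            "set": {
              "field": "grok_parse_failed",
              "value": true
            }
          }
        ]
      }
    }
  ]
}
```
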
## Creating a pipeline

The following steps guide you through creating an [ingest pipeline]({{site.url}}{{site.baseurl}}/ingest-pipelines/index/) with the `grok` processor.

**Step 1: Create a pipeline.**

The following query creates a pipeline named `log_line`. It uses the specified pattern to extract the `clientip`, `timestamp`, and `response_status` fields from the `message` field of the document:

```json
PUT _ingest/pipeline/log_line
{
  "description": "Extract fields from a log line",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:clientip} %{HTTPDATE:timestamp} %{NUMBER:response_status:int}"]
      }
    }
  ]
}
```
{% include copy-curl.html %}

**Step 2 (Optional): Test the pipeline.**

{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/alert-icon.png" class="inline-icon" alt="alert icon"/>{:/} **NOTE**<br>It is recommended that you test your pipeline before you ingest documents.
{: .note}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/log_line/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200"
      }
    }
  ]
}
```
{% include copy-curl.html %}

#### Response

The following response confirms that the pipeline is working as expected:

```json
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200",
          "response_status": 200,
          "clientip": "198.126.12",
          "timestamp": "10/Oct/2000:13:55:36 -0700"
        },
        "_ingest": {
          "timestamp": "2023-09-13T21:41:52.064540505Z"
        }
      }
    }
  ]
}
```
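
Note the `:int` suffix in `%{NUMBER:response_status:int}`: without it, the captured value would be stored as the string `"200"` rather than the number `200`. A minimal Python sketch of this capture-then-convert step (an illustration of the idea, not OpenSearch internals):

```python
import re

# %{NUMBER:response_status:int} roughly corresponds to a named capture
# followed by an integer conversion of the captured text.
line = "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200"
match = re.search(r"(?P<response_status>\d+)$", line)
response_status = int(match.group("response_status"))
print(response_status, type(response_status).__name__)  # 200 int
```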

**Step 3: Ingest a document.**

The following query ingests a document into an index named `testindex1`:

```json
PUT testindex1/_doc/1?pipeline=log_line
{
  "message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200"
}
```
{% include copy-curl.html %}

**Step 4 (Optional): Retrieve the document.**

To retrieve the document, run the following query:

```json
GET testindex1/_doc/1
```
{% include copy-curl.html %}
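
Because the document passed through the pipeline at ingestion time, the stored `_source` contains the extracted fields in addition to the original `message`. An abridged sketch of the expected response (versioning and shard metadata, which vary, are omitted):

```json
{
  "_index": "testindex1",
  "_id": "1",
  "found": true,
  "_source": {
    "message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200",
    "response_status": 200,
    "clientip": "198.126.12",
    "timestamp": "10/Oct/2000:13:55:36 -0700"
  }
}
```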

## Custom patterns

You can use default patterns, or you can add custom patterns to your pipelines using the `pattern_definitions` parameter. Custom grok patterns are useful for extracting structured data from log messages that do not match the built-in grok patterns, for example, logs from custom applications or logs that have been modified in some way. Custom patterns follow a straightforward structure: each pattern has a unique name and a corresponding regular expression that defines its matching behavior.

The following is an example of how to include a custom pattern in your configuration. In this example, the issue number consists of 3 or 4 digits and is parsed into the `issue_number` field, and the status is parsed into the `status` field:

```json
PUT _ingest/pipeline/log_line
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["The issue number %{NUMBER:issue_number} is %{STATUS:status}"],
        "pattern_definitions" : {
          "NUMBER" : "\\d{3,4}",
          "STATUS" : "open|closed"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
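
The two custom definitions above are ordinary regular expressions, so their matching behavior can be checked outside OpenSearch. The following Python sketch expands the pattern by hand (a simplified illustration; the real grok expansion is handled by the processor):

```python
import re

# Custom pattern definitions from the pipeline above.
pattern_definitions = {
    "NUMBER": r"\d{3,4}",   # overrides the built-in NUMBER pattern
    "STATUS": r"open|closed",
}

# Hand-expanded form of "The issue number %{NUMBER:issue_number} is %{STATUS:status}".
regex = (
    r"The issue number (?P<issue_number>" + pattern_definitions["NUMBER"] + r")"
    r" is (?P<status>" + pattern_definitions["STATUS"] + r")"
)

match = re.match(regex, "The issue number 1234 is open")
print(match.groupdict())  # {'issue_number': '1234', 'status': 'open'}
```

A two-digit issue number such as `12` would not match, because `\d{3,4}` requires at least three digits.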

## Tracing which patterns matched

To trace which patterns matched and populated the fields, you can use the `trace_match` parameter. The following is an example of how to include this parameter in your configuration:

```json
PUT _ingest/pipeline/log_line
{
  "description": "Extract fields from a log line",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{HTTPDATE:timestamp} %{IPORHOST:clientip}", "%{IPORHOST:clientip} %{HTTPDATE:timestamp} %{NUMBER:response_status:int}"],
        "trace_match": true
      }
    }
  ]
}
```
{% include copy-curl.html %}

When you simulate the pipeline, OpenSearch returns the `_ingest` metadata, which includes the `_grok_match_index` field, as shown in the following output:

```json
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "message": "127.0.0.1 198.126.12 10/Oct/2000:13:55:36 -0700 200",
          "response_status": 200,
          "clientip": "198.126.12",
          "timestamp": "10/Oct/2000:13:55:36 -0700"
        },
        "_ingest": {
          "_grok_match_index": "1",
          "timestamp": "2023-11-02T18:48:40.455619084Z"
        }
      }
    }
  ]
}
```

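The first-match semantics behind `_grok_match_index` can be sketched in a few lines of Python: patterns are tried in order, and the index of the first one that matches is reported. The regexes below are simplified stand-ins for the two grok expressions above, not the real grok definitions:

```python
import re

def first_match_index(patterns, text):
    """Return (index, fields) for the first pattern that matches, else None."""
    for i, pattern in enumerate(patterns):
        match = re.match(pattern, text)
        if match:
            return i, match.groupdict()
    return None

# Simplified regex stand-ins for the two grok expressions above.
patterns = [
    r"(?P<timestamp>\d{2}/\w{3}/\d{4}\S*) (?P<clientip>\d[\d.]*)",  # timestamp first
    r"(?P<clientip>\d[\d.]*) (?P<timestamp>\d{2}/\w{3}/\d{4}\S*)",  # clientip first
]

index, fields = first_match_index(patterns, "198.126.12 10/Oct/2000:13:55:36")
print(index)  # 1 -- the second pattern matched, matching the output above
```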