Commit 2b61df2

vagimeli and natebower authored
[DOC] Add gsub processor documentation (#5983)
* Add gsub processor documentation
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Add request examples and explanatory text
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Add parameters
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update gsub.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update gsub.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update gsub.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update gsub.md
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _ingest-pipelines/processors/gsub.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _ingest-pipelines/processors/gsub.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _ingest-pipelines/processors/gsub.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _ingest-pipelines/processors/gsub.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
* Update _ingest-pipelines/processors/gsub.md
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent 7dd0961 commit 2b61df2

1 file changed: +170 -0 lines changed
  • _ingest-pipelines/processors

@@ -0,0 +1,170 @@
---
layout: default
title: gsub
parent: Ingest processors
nav_order: 130
---

# Gsub processor

The `gsub` processor performs a regular expression search-and-replace operation on string fields in incoming documents. If the field contains an array of strings, the operation is applied to all elements in the array. However, if the field contains non-string values, the processor throws an exception. Use cases for the `gsub` processor include removing sensitive information from log messages or user-generated content, normalizing data formats or conventions (for example, converting date formats, removing special characters), and extracting or transforming substrings from field values for further processing or analysis.

The following is the syntax for the `gsub` processor:

```json
"gsub": {
  "field": "field_name",
  "pattern": "regex_pattern",
  "replacement": "replacement_string"
}
```
{% include copy-curl.html %}

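To make the array behavior described above concrete, the following simulate request is a minimal sketch that runs an inline `gsub` processor against a document whose field holds an array of strings. The field name `messages` and the sample values are assumptions chosen for illustration. Each element that contains `error` should come back with `warning` in its place:

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Illustrative sketch: the field name and sample values are assumptions",
    "processors": [
      {
        "gsub": {
          "field": "messages",
          "pattern": "error",
          "replacement": "warning"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "messages": ["disk error", "network error"]
      }
    }
  ]
}
```
{% include copy-curl.html %}
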
## Configuration parameters

The following table lists the required and optional parameters for the `gsub` processor.

| Parameter | Required/Optional | Description |
|-----------|-----------|-----------|
| `field` | Required | The field to apply the replacement to. |
| `pattern` | Required | The regular expression pattern to be replaced. |
| `replacement` | Required | The string that will replace the matching patterns. |
| `target_field` | Optional | The name of the field in which to store the modified value. If `target_field` is not specified, the modified value replaces the original value in the `field` field. Default is `field`. |
| `if` | Optional | A condition for running the processor. |
| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. Default is `false`. |
| `ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`. |
| `on_failure` | Optional | A list of processors to run if the processor fails. |
| `tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. |

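As an illustration of how the optional parameters combine, the following sketch defines a pipeline that redacts digit sequences shaped like `123-45-6789` from `message`, writes the result to a separate field, and skips documents that lack `message`. The pipeline name `gsub_redact_pipeline`, the field name `message_redacted`, the tag value, and the pattern are assumptions for illustration and are not part of the original example. Because the request body is JSON, each backslash in the regular expression is written as `\\`:

```json
PUT _ingest/pipeline/gsub_redact_pipeline
{
  "description": "Illustrative sketch: the names and the pattern are assumptions",
  "processors": [
    {
      "gsub": {
        "field": "message",
        "pattern": "\\d{3}-\\d{2}-\\d{4}",
        "replacement": "[REDACTED]",
        "target_field": "message_redacted",
        "ignore_missing": true,
        "tag": "redact-id-numbers"
      }
    }
  ]
}
```
{% include copy-curl.html %}

Writing to a `target_field` leaves the original `message` value untouched, which can help when comparing input and output while tuning a pattern.
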
## Using the processor

Follow these steps to use the processor in a pipeline.

### Step 1: Create a pipeline

The following query creates a pipeline named `gsub_pipeline` that uses the `gsub` processor to replace all occurrences of the string `error` with the string `warning` in the `message` field:

```json
PUT _ingest/pipeline/gsub_pipeline
{
  "description": "Replaces 'error' with 'warning' in the 'message' field",
  "processors": [
    {
      "gsub": {
        "field": "message",
        "pattern": "error",
        "replacement": "warning"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Step 2 (Optional): Test the pipeline

It is recommended that you test your pipeline before you ingest documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/gsub_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "This is an error message"
      }
    }
  ]
}
```
{% include copy-curl.html %}

#### Response

The following response confirms that the pipeline is working as expected:

```json
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "message": "This is an warning message"
        },
        "_ingest": {
          "timestamp": "2024-05-22T19:47:00.645687211Z"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}

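A simulate request can also show what happens when the field holds a non-string value, which the processor rejects with an exception. The following sketch uses an inline pipeline with an `on_failure` handler that records a note in a separate field instead of failing the document. The field name `gsub_error`, the note text, and the sample value `404` are assumptions for illustration:

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Illustrative sketch: the on_failure field name and sample value are assumptions",
    "processors": [
      {
        "gsub": {
          "field": "message",
          "pattern": "error",
          "replacement": "warning",
          "on_failure": [
            {
              "set": {
                "field": "gsub_error",
                "value": "gsub failed: message is not a string"
              }
            }
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": 404
      }
    }
  ]
}
```
{% include copy-curl.html %}

Without the `on_failure` handler (or `ignore_failure` set to `true`), the simulation should report an error for that document rather than a processed `_source`.
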
### Step 3: Ingest a document

The following query ingests a document into an index named `logs`:

```json
PUT logs/_doc/1?pipeline=gsub_pipeline
{
  "message": "This is an error message"
}
```
{% include copy-curl.html %}

#### Response

The following response confirms that the request indexed the document into the index named `logs`. The response does not include the document source, so retrieve the document to verify that the `gsub` processor replaced `error` with `warning` in the `message` field:

```json
{
  "_index": "logs",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```
{% include copy-curl.html %}

### Step 4 (Optional): Retrieve the document

To retrieve the document, run the following query:

```json
GET logs/_doc/1
```
{% include copy-curl.html %}

#### Response

The following response shows the document with the modified `message` field value:

```json
{
  "_index": "logs",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "message": "This is an warning message"
  }
}
```
{% include copy-curl.html %}
