Make sure that Main picks up all changes from the Data Prepper 2.0 branch (#1517)
* Change Data Prepper intro
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Add next steps section
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Add David's feedback
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Fix optional tags
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Address small typo
* [Data Prepper 2.0] MAINT: documentation change regarding record type (#1306)
* MAINT: documentation change regarding record type
Signed-off-by: Chen <qchea@amazon.com>
* MAINT: documentation on trace group fields
Signed-off-by: Chen <qchea@amazon.com>
Signed-off-by: Chen <qchea@amazon.com>
* Update docs for Data Prepper 2.0 (#1404)
* Update get-started
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Update pipelines.md
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add peer forwarder options to references
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add csv processor options to references
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add docs for conditional routing
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add docs for json processor
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Remove docs for peer forwarder plugin
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Address review feedback - revise sentences, fix inaccurate info and typos
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add missing options for http source and peer forwarder
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Update ssl options on peer forwarder
Signed-off-by: Hai Yan <oeyh@amazon.com>
Signed-off-by: Hai Yan <oeyh@amazon.com>
* More updates for Data Prepper 2.0 (#1469)
* Update http source and opensearch sink options
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Update docker run command
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add more missing options
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Add metadata_root_key for s3 source
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Address review comments - tweak sentences and fix typos
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Address review comments
Signed-off-by: Hai Yan <oeyh@amazon.com>
Signed-off-by: Hai Yan <oeyh@amazon.com>
* Fix broken link
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Add reworked Getting Started page
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Undo change for getting started
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Add edited Data Prepper overview
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
* Add edited Data Prepper file
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Chen <qchea@amazon.com>
Signed-off-by: Hai Yan <oeyh@amazon.com>
Co-authored-by: Qi Chen <qchea@amazon.com>
Co-authored-by: Hai Yan <8153134+oeyh@users.noreply.github.com>
Changed file: `_clients/data-prepper/index.md` (10 additions, 9 deletions)
@@ -14,7 +14,7 @@ Data Prepper lets users build custom pipelines to improve the operational view o
## Concepts
-Data Prepper is compromised of **Pipelines** that collect and filter data based on the components set within the pipeline. Each component is pluggable, enabling you to use your own custom implementation of each component. These components include:
+Data Prepper is composed of one or more **pipelines** that collect and filter data based on the components set within the pipeline. Each component is pluggable, enabling you to use your own custom implementation of each component. These components include the following:
- One [source](#source)
- One or more [sinks](#sink)
@@ -23,31 +23,31 @@ Data Prepper is compromised of **Pipelines** that collect and filter data based
A single instance of Data Prepper can have one or more pipelines.
-Each pipeline definition contains two required components **source** and **sink**. If buffers and processors are missing from the Data Prepper pipeline, Data Prepper uses the default buffer and a no-op processor.
+Each pipeline definition contains two required components: **source** and **sink**. If buffers and processors are missing from the Data Prepper pipeline, Data Prepper uses the default buffer and a no-op processor.
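For illustration, a minimal pipeline definition that declares only the two required components might look like the following sketch; the pipeline name and file paths are placeholders, and the default `bounded_blocking` buffer and a no-op processor are applied implicitly.

```yml
simple-sample-pipeline:
  source:
    file:
      path: /tmp/sample-input.txt     # placeholder input path
  sink:
    - file:
        path: /tmp/sample-output.txt  # placeholder output path
```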
### Source
-Source is the input component of a pipeline that defines the mechanism through which a Data Prepper pipeline will consume events. A pipeline can have only one source. The source can consume events either by receiving the events over HTTP or HTTPS or reading from external endpoints like OTeL Collector for traces and metrics and S3. Source have their own configuration options based on the format of the events (such as string, json, cloudwatch logs, or open telemetry trace). The source component consumes events and writes them to the buffer component.
+Source is the input component that defines the mechanism through which a Data Prepper pipeline will consume events. A pipeline can have only one source. The source can consume events either by receiving the events over HTTP or HTTPS or by reading from external endpoints like the OTel Collector (for traces and metrics) and Amazon Simple Storage Service (Amazon S3). Sources have their own configuration options based on the format of the events (such as string, JSON, Amazon CloudWatch logs, or OpenTelemetry trace). The source component consumes events and writes them to the buffer component.
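For example, a source section that accepts log events over HTTP might look like the following sketch; the `http` source is assumed here, and the port and `ssl` values are illustrative.

```yml
log-pipeline:
  source:
    http:
      port: 2021   # port on which the HTTP source listens
      ssl: false   # TLS disabled for this local example
  # buffer, processor, and sink definitions omitted for brevity
```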
### Buffer
-The buffer component acts as the layer between the source and the sink. Buffer can be either in-memory or disk-based. The default buffer uses an in-memory queue bounded by the number of events, called `bounded_blocking`. If the buffer component is not explicitly mentioned in the pipeline configuration, Data Prepper uses the default `bounded_blocking`.
+The buffer component acts as the layer between the source and the sink. A buffer can be either in-memory or disk-based. The default buffer uses an in-memory queue called `bounded_blocking` that is bounded by the number of events. If the buffer component is not explicitly mentioned in the pipeline configuration, Data Prepper uses the default `bounded_blocking`.
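For example, an explicit buffer section using `bounded_blocking` might look like the following; the capacity values are illustrative.

```yml
  buffer:
    bounded_blocking:
      buffer_size: 1024   # maximum number of events the buffer can hold
      batch_size: 256     # maximum number of events returned in a single read
```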
### Sink
-Sink is the output component of a pipeline that defines the destination(s) to which a Data Prepper pipeline publishes events. A sink destination could be services such as OpenSearch, S3, or another Data Prepper pipeline. When using another Data Prepper pipeline as the sink, you can chain multiple pipelines together based on the needs to the data. Sink contains it's own configurations options based on the destination type.
+Sink is the output component that defines the destination(s) to which a Data Prepper pipeline publishes events. A sink destination could be a service, such as OpenSearch or Amazon S3, or another Data Prepper pipeline. When using another Data Prepper pipeline as the sink, you can chain multiple pipelines together based on the needs of the data. Sink contains its own configuration options based on the destination type.
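For example, a sink section that publishes events to an OpenSearch index might look like the following sketch; the host, credentials, and index name are placeholders.

```yml
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]  # placeholder OpenSearch endpoint
        username: admin                    # placeholder credentials
        password: admin
        index: application-logs            # placeholder index name
```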
### Processor
-Processors are units within the Data Prepper pipeline that can filter, transform, and enrich events into your desired format before publishing the record to the sink. The a processor is not defined in the pipeline configuration, the events publish in the format defined in the source component. You can have more than on processor within a pipeline. When using multiple processors, the processors are executed in the order they are defined inside the pipeline spec.
+Processors are units within the Data Prepper pipeline that can filter, transform, and enrich events into your desired format before publishing the record to the sink component. If a processor is not defined in the pipeline configuration, the events publish in the format defined in the source component. You can have more than one processor within a pipeline. When using multiple processors, the processors are run in the order they are defined inside the pipeline specification.
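For example, a processor section that chains two processors, run in the order listed, might look like the following sketch; the `grok` pattern and `string_converter` option shown are illustrative.

```yml
  processor:
    - grok:
        match:
          log: ['%{COMMONAPACHELOG}']   # parse the "log" field using the Apache common log pattern
    - string_converter:
        upper_case: true                # then convert the event string to uppercase
```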
-## Sample Pipeline configurations
+## Sample pipeline configurations
To understand how all pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format.
### Minimal component
-This pipeline configuration reads from file source and writes to that same source. It uses the default options for buffer and processor.
+This pipeline configuration reads from the file source and writes to that same source. It uses the default options for the buffer and processor.
```yml
sample-pipeline:
@@ -61,7 +61,7 @@ sample-pipeline:
### All components
-The following pipeline uses a source that reads string events from the `input-file`. The source then pushes the data to buffer bounded by max size of `1024`. The pipeline configured to have `4` workers each of them reading maximum of `256` events from the buffer for every `100 milliseconds`. Each worker executes the `string_converter` processor and write the output of the processor to the `output-file`.
+The following pipeline uses a source that reads string events from the `input-file`. The source then pushes the data to the buffer, bounded by a maximum size of `1024`. The pipeline is configured to have `4` workers, each of them reading a maximum of `256` events from the buffer every `100 milliseconds`. Each worker runs the `string_converter` processor and writes the output of the processor to the `output-file`.
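As a rough sketch of such a pipeline (file paths are placeholders, and the option values mirror those described above):

```yml
sample-pipeline:
  workers: 4              # number of workers processing events concurrently
  delay: "100"            # interval, in milliseconds, at which workers read from the buffer
  source:
    file:
      path: /tmp/input-file      # placeholder input path
  buffer:
    bounded_blocking:
      buffer_size: 1024   # maximum number of events held in the buffer
      batch_size: 256     # maximum number of events each worker reads at a time
  processor:
    - string_converter:
        upper_case: true
  sink:
    - file:
        path: /tmp/output-file   # placeholder output path
```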
```yml
sample-pipeline:
@@ -85,3 +85,4 @@ sample-pipeline:
## Next steps
To get started building your own custom pipelines with Data Prepper, see the [Get Started]({{site.url}}{{site.baseurl}}/clients/data-prepper/get-started/) guide.