Tell us about your request. Provide a summary of the request.
Someone was asking whether Data Prepper can "handle" Apache Avro data and found that the documentation wasn't entirely clear. avro is listed as a codec for Data Prepper, but the documentation describes it as "most efficiently being used" in an S3 sink. Could we add a paragraph or so about how it can be used outside of an S3 sink?
Also, the page has some formatting oddities that make it a little hard to skim. See screenshots.
Version: List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all.
2.15
What other resources are available? Provide links to related issues, POCs, steps for testing, etc.
Regarding the original question, Data Prepper can read Avro from S3 and write Avro to S3.
Regarding the documentation, we should revisit this page. The original intention was to clarify when a user should use a codec versus a processor for parsing input data.
I might reword this as:
Apache Avro is an open-source serialization format for record data. When reading Avro data you should use the avro codec.
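For illustration, reading Avro objects with the S3 source might look roughly like the sketch below. The pipeline name, queue URL, region, and role ARN are placeholders, and the option names are written from memory of the S3 source documentation, so they should be checked against the current docs before this goes onto the page.

```yaml
# Minimal sketch, not a tested configuration.
# All names, URLs, and ARNs below are hypothetical placeholders.
avro-pipeline:
  source:
    s3:
      notification_type: sqs
      # The avro codec parses each S3 object as an Avro file.
      codec:
        avro:
      compression: none
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-data-prepper-role"
  sink:
    - stdout:
```

The codec goes on the source here, so no separate parsing processor is needed in the pipeline for the Avro data.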
I also noticed some questionable comments about Parquet.
Apache Parquet is a columnar storage format built for Hadoop. It is most efficient without the use of a codec. Positive results, however, can be achieved when it’s configured with S3 Select.
Perhaps this should say:
Apache Parquet is a columnar storage format built for Hadoop. Pipeline authors can use the parquet codec to read Parquet data directly from the S3 object. This will retrieve all data from Parquet. An alternative is to use S3 Select instead of the codec. In this case, S3 Select will parse the Parquet file directly (additional S3 charges apply). This can be more efficient if you are filtering or loading a subset of data.
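To make the two options concrete, here is a rough sketch of each as an S3 source fragment. The queue URL, region, role ARN, and the `status` column in the query are hypothetical, and the option names are recalled from the S3 source documentation rather than verified, so treat this as a starting point for the docs and not as a tested configuration.

```yaml
# Option 1 (sketch): the parquet codec reads the whole Parquet object.
parquet-codec-pipeline:
  source:
    s3:
      notification_type: sqs
      codec:
        parquet:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-data-prepper-role"
  sink:
    - stdout:

# Option 2 (sketch): S3 Select parses the Parquet object on the S3 side
# and returns only the rows that match the expression
# (additional S3 charges apply).
parquet-select-pipeline:
  source:
    s3:
      notification_type: sqs
      s3_select:
        # "status" is a hypothetical column used only for illustration.
        expression: "select * from s3object s where s.status = 'error'"
        input_serialization: parquet
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/example-data-prepper-role"
  sink:
    - stdout:
```

The first form is the simpler choice when every record is needed; the second is worth the extra S3 cost mainly when the query filters out most of the data.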