[filebeat][awss3] - Added support for parquet decoding and decoder config #35578
… in awss3 package
@andrewkroh wrt the questions you had -
For the current implementation, since it's a decoder that is specific to this input, these design decisions were not made.
@andrewkroh I've refactored the decoder to be more generic, updated the tests and docs, and added the necessary comments. For now I've kept the implementation simple and basic, suited to the current use case, but it can easily be extended in the future. I've not added a no-op decoder here, since I need the nil value returned to branch on the legacy path. I've tried to make as few modifications as possible to the legacy logic so that complications are avoided.
@andrewkroh could you review the current implementation while we wait for the CI pipeline to be fixed, so that we can merge before FF? That would be really great, as Crest will be taking up the security lake integration after this merge. I opened a public issue, apache/arrow#36052, for the cross-build errors on 32-bit systems. They have been resolved by a recent PR, but I'm not updating the library yet since there is no stable release out with these fixes.
Build fix is in review: #35789
Hello everyone, I'm facing an issue while trying to retrieve a Parquet file from S3 using Filebeat. Below, I've included configuration details:
I have tried various bucket_list_prefix solutions including:
However, we consistently encounter the following error:
Any insights or suggestions on troubleshooting steps would be highly appreciated. Please let me know if additional information is needed. Thank you.
Try using file_selectors: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-aws-s3.html#_file_selectors
Thanks for the suggestion on file_selectors. I was already using them and did not realize what my issue was until I saw this last comment. Moving the decoding block into the file_selectors section solved my problem.
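For anyone hitting the same problem, the fix described above might look roughly like the following. This is only a sketch: the bucket ARN, prefix, and regex are placeholders rather than values from the original report, and the nested `decoding` keys assume the option names given in the aws-s3 input documentation.

```yaml
filebeat.inputs:
  - type: aws-s3
    # Placeholder bucket and prefix, not taken from the original report.
    bucket_arn: arn:aws:s3:::example-bucket
    bucket_list_prefix: example/prefix/
    file_selectors:
      # Hypothetical pattern that matches only Parquet objects.
      - regex: '\.parquet$'
        # The decoding block sits inside the selector entry,
        # not at the top level of the input.
        decoding:
          codec:
            parquet:
              enabled: true
```

When file_selectors are in use, reader options are resolved per selector, which is presumably why a decoding block placed only at the input level did not take effect here.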
Type of change
What does this PR do?
This PR adds a new decoding config option to the readerConfig struct, along with support for
Parquet file decoding using the libbeat parquet reader. The decoding config is designed so that
in the future we can add more decoders, and also migrate the decoding of JSON and NDJSON
files, which is currently selected based on the contentType config option.
An example of the new decoding config:
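A minimal sketch of such a config, assuming the option names `enabled`, `process_parallel`, and `batch_size` under `decoding.codec.parquet` as given in the aws-s3 input documentation. The queue URL and the values shown are illustrative placeholders, not recommendations from this PR.

```yaml
filebeat.inputs:
  - type: aws-s3
    # Placeholder SQS queue URL.
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue
    decoding:
      codec:
        parquet:
          # Turn on Parquet decoding for objects read by this input.
          enabled: true
          # Optional tuning knobs (assumed names); values are illustrative.
          process_parallel: true
          batch_size: 1000
```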
Why is it important?
This change allows us to officially support Parquet decoding for the S3 input and also enables integrations like
Amazon Security Lake.
Checklist
- [ ] I have made corresponding change to the default configuration files
CHANGELOG.next.asciidoc
Related issues