feat(outputs.parquet): Introduce Parquet output #15602
Preliminary review on the readme, looking over the code now!
The code looks really good @powersj! Just some small comments from my side...
* extra word typo in readme
* initialize timestamp field in init(), update tests
* always use string arrow type for tags
Thanks @powersj! Code looks fantastic!
Summary
Introduces a new output that writes metrics in Parquet format. Metrics are grouped by metric name and written to files. Because the schema must be known before writing, the first time we encounter a metric name we generate a schema from that metric, then use the buffered file writer from that point on to write files as efficiently as possible.
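To illustrate the "schema on first encounter" approach, here is a minimal Go sketch. It is not the plugin's code: `Metric`, `Column`, `inferSchema`, and `schemaCache` are hypothetical names, and the column types are plain strings standing in for Arrow types. Following the change above, tags are always typed as strings, and a timestamp column comes first.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Metric is a hypothetical, simplified stand-in for a Telegraf metric.
type Metric struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
	Time   time.Time
}

// Column is one column of an inferred schema (type names are placeholders for Arrow types).
type Column struct {
	Name string
	Type string
}

// sortedKeys returns map keys in a stable order so column order is deterministic.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// inferSchema builds a schema from a single metric: timestamp, then tags (always string), then fields.
func inferSchema(m Metric) []Column {
	cols := []Column{{Name: "timestamp", Type: "timestamp"}}
	for _, k := range sortedKeys(m.Tags) {
		cols = append(cols, Column{Name: k, Type: "string"})
	}
	for _, k := range sortedKeys(m.Fields) {
		t := "unsupported"
		switch m.Fields[k].(type) {
		case int64:
			t = "int64"
		case uint64:
			t = "uint64"
		case float64:
			t = "float64"
		case bool:
			t = "bool"
		case string:
			t = "string"
		}
		cols = append(cols, Column{Name: k, Type: t})
	}
	return cols
}

// schemaCache groups metrics by name and keeps the schema inferred from the first metric seen.
type schemaCache struct {
	schemas map[string][]Column
	batches map[string][]Metric
}

func newSchemaCache() *schemaCache {
	return &schemaCache{schemas: map[string][]Column{}, batches: map[string][]Metric{}}
}

// add records a metric under its name and returns the (possibly newly inferred) schema.
func (c *schemaCache) add(m Metric) []Column {
	s, ok := c.schemas[m.Name]
	if !ok {
		s = inferSchema(m)
		c.schemas[m.Name] = s
	}
	c.batches[m.Name] = append(c.batches[m.Name], m)
	return s
}

func main() {
	c := newSchemaCache()
	now := time.Now()
	c.add(Metric{Name: "cpu", Tags: map[string]string{"host": "a"}, Fields: map[string]interface{}{"usage": 1.5}, Time: now})
	s := c.add(Metric{Name: "cpu", Tags: map[string]string{"host": "b"}, Fields: map[string]interface{}{"usage": 2.0, "extra": int64(3)}, Time: now})
	fmt.Println(len(s), len(c.batches["cpu"])) // prints "3 2"
}
```

In this sketch the second `cpu` metric's extra field does not change the cached schema; how the real output handles metrics whose fields differ from the first one is not shown here.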
Each file must be closed correctly; otherwise it will not be a valid Parquet file, because the Parquet footer holding the file metadata is only written on close. For this reason, I have avoided creating files by template name, since that could leave too many files open at once. Instead, there is a time-based rotation option that closes the existing file and creates a new one.
Check out this blog post for an overview of Parquet files: https://www.influxdata.com/blog/how-good-parquet-wide-tables/
Related issues
fixes: #14786