-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendor Generated Protobuf Code (#3947) #3950
Vendor Generated Protobuf Code (#3947) #3950
Conversation
@avantgardnerio What do you think about this? |
@@ -129,3 +129,5 @@ Cargo.lock | |||
.history | |||
parquet-testing/* | |||
*rat.txt | |||
datafusion/proto/src/generated/pbjson.rs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is precedent in arrow for excluding generated files - https://github.com/apache/arrow-rs/blob/master/dev/release/rat_exclude_files.txt#L20
@tustvold does it make sense to also include a GH action to ensure that PRs that actually touch the source also generate the latest version of the declarations? Something like the following: |
My main concern is about making forward progress for the project, and I think not having our dependencies install My secondary concern is for general project maintainability, and for that I think there's issues either way: checked in pros:
checked in cons:
generated pros:
generated cons:
I must highlight though that I am concerned this does not resolve #3538 as that appears to be working as per my comment there. |
I am ok with either approach, but I think I would prefer to check in the generated sources rather than have build.rs download protoc. It seems like this could be a security risk and also adds complexity. I don't want to try and support a user who hits some issue with protoc not working on their system, I am possibly just trying to avoid extra work here. 😅
We can configure rustfmt to ignore the generated files (https://rust-lang.github.io/rustfmt/?version=v1.5.1&search=#ignore). @tustvold would you be ok with adding that to this PR?
We should have CI prevent that from happening. Would be good to include that in this PR if possible.
This is bound to happen, but the files are in a |
I'd approve this PR if:
|
Can we provide a Dockerfile with protoc for generating the code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like everyone is leaning towards this solution. As stated above, I think the main concern is that dependents don't have to install protoc, and this PR achieves that goal, so I'm approving and closing mine.
Already in place
This is still automatically run as part of build.rs, I will add some docs though
Protoc is only used to generate a descriptor set, i.e. parse the proto files, not the generated Rust code which is performed by prost-build. The output should therefore be relatively stable.
I suspect dependencies by git sha may still need protoc installed - as this is effectively a source dependency I think this is fine
I will add a CI check to here and arrow-rs |
@@ -81,16 +81,6 @@ jobs: | |||
- uses: actions/checkout@v3 | |||
with: | |||
submodules: true | |||
- name: Install protobuf compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be necessary anymore, the version in the system package manager is recent enough (only 2 years out of date) 😆
$ brew install protobuf | ||
``` | ||
|
||
You will want to verify the version installed is `3.12` or greater, which introduced support for explicit [field presence](https://github.com/protocolbuffers/protobuf/blob/v3.12.0/docs/field_presence.md). Older versions may fail to compile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFusion doesn't actually use this feature, for good reason imo, but this is the only major functionality change to the core protoc functionality that may cause compatibility issues for us. More importantly 3.12 is the version currently used in CI
…oc-for-source-checkout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @tustvold and @avantgardnerio
Benchmark runs are scheduled for baseline = fa5cd7f and contender = 11c5255. 11c5255 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
* Vendor generated protobuf code (apache#3947) * RAT * Fix build without json * Review feedback * Doc tweak * Fix Arch install instructions
Which issue does this PR close?
Closes #3538
Closes #3947
Rationale for this change
We don't want to require downstreams to install PROTOC, but we also don't want to allow the generated code to get out of sync. This is the approach we use with arrow-rs, and it appears to work fairly well. Specifically we don't bundle the .proto files in the released crate, and only generate the code if the .proto files exist.
What changes are included in this PR?
Are there any user-facing changes?
datafusion-proto now always has a build dependency on pbjson-build. This is a fairly lightweight dependency and makes the logic significantly easier to follow.