-
Notifications
You must be signed in to change notification settings - Fork 1.8k
add cargo.lock file #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cargo.lock file #982
Conversation
|
My understanding of I did not place Python bindings as a member of the workspace because a workspace by definition shares the So, I think the resolution here is to generate a |
|
@jorgecarleitao yes, I noticed the python binding was not part of the workspace later, so I have generated another lock file inside the python folder as well. The root level lockfile is for ballista, not for datafusion. |
|
Nice idea @houqp ! Some thoughts: One concern I have with using it for a library is that users of Next to that, do we have any users actually using the source release (e.g. with included Cargo.lock file), instead of only using On the other hand, it still might have benefit if someone wants to do an (automated) build for distributing the binaries. But I am not sure if currently that's done somewhere? Also, before we go ahead with this, I think it makes sense to have a plan / document how to update the cargo lock file. Some options I can think of:
|
|
Good points @Dandandan. Having a lock file for datafusion definitely prevents us from catching build errors from latest dependency releases. I just took a look at the cargo project itself, it's interesting that even though they release binaries, they don't check-in a Cargo.lock file. I am going to do more research on this topic.
I am certainly not expecting This is mainly for the python binding and ballista binaries. The problem right now is ballista and datafusion are in the same workspace, so they are sharing the same lock file :( But now that I took a second look at ballista crates, all of them including the scheduler and the executor are expected to be used as upstream dependencies in the client crate as libraries. So we should hold on adding lock file for ballista crates for now until we move its binaries into their own crates. The lack of access to reproducible builds from the same source tree has impact to our development and release process too. For example, random build failures like #961 could have been avoided with a lock file. In that case, our CI broke without any code change. Another potential issue it could address is to make sure the final binary wheels we publish to pypi are built with the exact same set of dependencies that are used in our automated and manual tests. The test run and wheel release build could happen at very different times. In short, I propose we keep the lock file to the python binding for now.
Definitely something good to document in the developer doc. I think on demand update would make the most sense because cargo.lock file would change if a new commit adds a new dependency, the lock file should be updated in the same commit, otherwise the build won't be reproducible anymore. I also found it useful to use lock file diff to look for red flags on dependency bloats. |
|
FWIW we use Using Cargo.lock to avoid situations where upstream library changes break DataFusion CI like #968 sounds appealing, but then we may just be kicking the problems downstream (and farther from the root of the problem) to the users of DataFusion. What if we removed |
|
@jorgecarleitao @Dandandan @alamb I have updated to PR to only add the lock file for the python binding crate. I think it's better for us to skip datafusion-cli and balllista scheduler/executor for now because we are not releasing any official binaries for them. We can come back to move them out of the workspace when we are ready to maintain official binaries for them. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks @houqp !
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, @houqp !
|
Looks good @houqp , one minor comment |
|
🎉 |
* feat: reads using global ctx * Add text to io methods to describe the context they are using --------- Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Which issue does this PR close?
closes #968
Rationale for this change
A Cargo.lock will be useful for projects like datafusion python binding and ballista since they both produce binaries. Right now, our source releases are not reproducible for these two subprojects because we don't ship a lock file to pin dependencies to exact versions. This means if users use our official source release to build binaries, their output could be different than the ones we tested when prepping for the release.
What changes are included in this PR?
Added auto generated lock file.
Are there any user-facing changes?
No