-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-35373][BUILD] Check Maven artifact checksum in build/mvn #32505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e the change less complex
local remote_tarball="$1" | ||
local local_tarball="${_DIR}/$2" | ||
local binary="${_DIR}/$3" | ||
local mirror_host="$1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to break down the arguments further to make the logic work below. As part of that I tried to refactor this method a bit, which was getting hard to understand.
local binary="${_DIR}/$6" | ||
local remote_tarball="${mirror_host}/${url_path}${url_query}" | ||
local local_checksum="${local_tarball}.${checksum_suffix}" | ||
local remote_checksum="https://archive.apache.org/dist/${url_path}.${checksum_suffix}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more point : the checksums are not mirrored (well, not consistently). But we very much want to check only the ASF's version of the checksum. If an attacker modified a mirror's artifact, they could just modify the checksum too of course.
Kubernetes integration test starting |
Kubernetes integration test status failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
In case of the mentioned out-of-sync cases between Apache main and mirrors, we will hit some random Maven jobs failures after this PR. It's rare and hopefully it will be recovered automatically in the subsequent runs eventually.
That much I'm not worried about, as this only concerns Maven distributions, and any version we used would be long since published and mirrored, nothing released days ago. (Yes we already face the problem of offline mirrors, but fallback to archive.apache.org anyway) |
Test build #138384 has finished for PR 32505 at commit
|
I may merge this to master first to see if there are issues, then back-port it if it seems OK. We probably do want this as part of maintenance branches. |
+1 for your decision, @srowen . Ya. Let's merge to master first and monitor it. |
Merged to master. I'll back port in a while if there are no issues. |
### What changes were proposed in this pull request? `./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download. ### Why are the changes needed? This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors. ### Does this PR introduce _any_ user-facing change? Should not affect anything about Spark per se, just the build. ### How was this patch tested? Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum. Closes #32505 from srowen/SPARK-35373. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request? `./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download. ### Why are the changes needed? This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors. ### Does this PR introduce _any_ user-facing change? Should not affect anything about Spark per se, just the build. ### How was this patch tested? Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum. Closes #32505 from srowen/SPARK-35373. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
Backported to 3.1 / 3.0. There were merge conflicts in 2.4 ; maybe not worth backporting there as it's EOL |
Oh, @srowen .
|
It seems that the root cause is |
I made a follow-up. |
### What changes were proposed in this pull request? This PR is a follow-up of #32505 to fix `zinc` installation. ### Why are the changes needed? Currently, branch-3.1/3.0 is broken. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes #32591 from dongjoon-hyun/SPARK-35373. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? This PR is a follow-up of #32505 to fix `zinc` installation. ### Why are the changes needed? Currently, branch-3.1/3.0 is broken. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes #32591 from dongjoon-hyun/SPARK-35373. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 50edf12) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
rm -rf "$local_tarball" | ||
fi | ||
# Checksum may not have been specified; don't check if doesn't exist | ||
if [ -f "${local_checksum}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to add $(command -v shasum)
here to check if shasum
exists or not (then, raise an error with an explicit message)? For example, Amazon Linux2 does not have it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh really? shoot. That seemed to be the one command that was on Linux and OS X. Well, we can fail outright if not present, which would stop the build. Is that better or worse than just silently letting it proceed without verification? I don't know. Would developers generally build on this distro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh really? shoot. That seemed to be the one command that was on Linux and OS X.
Yea, for Amazon Linux2, we need to run sudo yum install perl-Digest-SHA
before running build/mvn
.
Is that better or worse than just silently letting it proceed without verification?
IMO, since this check exists to prevent cyber attacks, it is safer to makebuild/mvn
fail if shasum
does not exists (actually, it seems the other OSs have shasum
by default, so most developers don't care, I think).
Would developers generally build on this distro?
Not sure, but that's the default option in aws ec2. I noticed this issue because my daily build broke after this commit merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I can submit a follow-up to fail if shasum is not present. That might inconvenience some developers, but, this is a security issue, and of course developers are welcome to install mvn / scala directly if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
## What changes were proposed in this pull request? Use ` > /dev/null` to replace `-q` in shasum validation. ### Why are the changes needed? In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version. So we got the `Unknown option: q`. ``` ➜ ~ uname -a Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64 ➜ ~ shasum -v 5.84 ➜ ~ shasum -q Unknown option: q Type shasum -h for help ``` it makes ARM CI failed: [1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `shasum -a 512 -c wrong.sha > /dev/null` return code 1 without print `shasum -a 512 -c right.sha > /dev/null` return code 0 without print e2e test: ``` rm -f build/apache-maven-3.6.3-bin.tar.gz rm -r build/apache-maven-3.6.3-bin mvn -v ``` Closes #32604 from Yikun/patch-5. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
## What changes were proposed in this pull request? Use ` > /dev/null` to replace `-q` in shasum validation. ### Why are the changes needed? In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version. So we got the `Unknown option: q`. ``` ➜ ~ uname -a Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64 ➜ ~ shasum -v 5.84 ➜ ~ shasum -q Unknown option: q Type shasum -h for help ``` it makes ARM CI failed: [1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `shasum -a 512 -c wrong.sha > /dev/null` return code 1 without print `shasum -a 512 -c right.sha > /dev/null` return code 0 without print e2e test: ``` rm -f build/apache-maven-3.6.3-bin.tar.gz rm -r build/apache-maven-3.6.3-bin mvn -v ``` Closes #32604 from Yikun/patch-5. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 38fbc0b) Signed-off-by: Sean Owen <srowen@gmail.com>
## What changes were proposed in this pull request? Use ` > /dev/null` to replace `-q` in shasum validation. ### Why are the changes needed? In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version. So we got the `Unknown option: q`. ``` ➜ ~ uname -a Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64 ➜ ~ shasum -v 5.84 ➜ ~ shasum -q Unknown option: q Type shasum -h for help ``` it makes ARM CI failed: [1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `shasum -a 512 -c wrong.sha > /dev/null` return code 1 without print `shasum -a 512 -c right.sha > /dev/null` return code 0 without print e2e test: ``` rm -f build/apache-maven-3.6.3-bin.tar.gz rm -r build/apache-maven-3.6.3-bin mvn -v ``` Closes #32604 from Yikun/patch-5. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 38fbc0b) Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request? `./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download. ### Why are the changes needed? This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors. ### Does this PR introduce _any_ user-facing change? Should not affect anything about Spark per se, just the build. ### How was this patch tested? Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum. Closes apache#32505 from srowen/SPARK-35373. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request? This PR is a follow-up of apache#32505 to fix `zinc` installation. ### Why are the changes needed? Currently, branch-3.1/3.0 is broken. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes apache#32591 from dongjoon-hyun/SPARK-35373. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? `./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download. ### Why are the changes needed? This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors. ### Does this PR introduce _any_ user-facing change? Should not affect anything about Spark per se, just the build. ### How was this patch tested? Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum. Closes apache#32505 from srowen/SPARK-35373. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request? This PR is a follow-up of apache#32505 to fix `zinc` installation. ### Why are the changes needed? Currently, branch-3.1/3.0 is broken. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the GitHub Action. Closes apache#32591 from dongjoon-hyun/SPARK-35373. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request? Use ` > /dev/null` to replace `-q` in shasum validation. ### Why are the changes needed? In PR apache#32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version. So we got the `Unknown option: q`. ``` ➜ ~ uname -a Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64 ➜ ~ shasum -v 5.84 ➜ ~ shasum -q Unknown option: q Type shasum -h for help ``` it makes ARM CI failed: [1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `shasum -a 512 -c wrong.sha > /dev/null` return code 1 without print `shasum -a 512 -c right.sha > /dev/null` return code 0 without print e2e test: ``` rm -f build/apache-maven-3.6.3-bin.tar.gz rm -r build/apache-maven-3.6.3-bin mvn -v ``` Closes apache#32604 from Yikun/patch-5. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> (cherry picked from commit 38fbc0b) Signed-off-by: Sean Owen <srowen@gmail.com>
What changes were proposed in this pull request?
./build/mvn
now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.Why are the changes needed?
This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.
Does this PR introduce any user-facing change?
Should not affect anything about Spark per se, just the build.
How was this patch tested?
Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.