Skip to content

[SPARK-35373][BUILD] Check Maven artifact checksum in build/mvn #32505

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

srowen
Copy link
Member

@srowen srowen commented May 11, 2021

What changes were proposed in this pull request?

./build/mvn now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.

Why are the changes needed?

This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.

Does this PR introduce any user-facing change?

Should not affect anything about Spark per se, just the build.

How was this patch tested?

Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.

@srowen srowen self-assigned this May 11, 2021
@github-actions github-actions bot added the BUILD label May 11, 2021
local remote_tarball="$1"
local local_tarball="${_DIR}/$2"
local binary="${_DIR}/$3"
local mirror_host="$1"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to break down the arguments further to make the logic work below. As part of that I tried to refactor this method a bit, which was getting hard to understand.

local binary="${_DIR}/$6"
local remote_tarball="${mirror_host}/${url_path}${url_query}"
local local_checksum="${local_tarball}.${checksum_suffix}"
local remote_checksum="https://archive.apache.org/dist/${url_path}.${checksum_suffix}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more point : the checksums are not mirrored (well, not consistently). But we very much want to check only the ASF's version of the checksum. If an attacker modified a mirror's artifact, they could just modify the checksum too of course.

@SparkQA
Copy link

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42907/

@SparkQA
Copy link

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42907/

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

In case of the mentioned out-of-sync cases between Apache main and mirrors, we will hit some random Maven jobs failures after this PR. It's rare and hopefully it will be recovered automatically in the subsequent runs eventually.

@srowen
Copy link
Member Author

srowen commented May 11, 2021

That much I'm not worried about, as this only concerns Maven distributions, and any version we used would be long since published and mirrored, nothing released days ago. (Yes we already face the problem of offline mirrors, but fallback to archive.apache.org anyway)

@SparkQA
Copy link

SparkQA commented May 11, 2021

Test build #138384 has finished for PR 32505 at commit b0db41e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented May 12, 2021

I may merge this to master first to see if there are issues, then back-port it if it seems OK. We probably do want this as part of maintenance branches.

@dongjoon-hyun
Copy link
Member

+1 for your decision, @srowen . Ya. Let's merge to master first and monitor it.

@srowen srowen closed this in 6c5fcac May 13, 2021
@srowen
Copy link
Member Author

srowen commented May 13, 2021

Merged to master. I'll back port in a while if there are no issues.

srowen added a commit that referenced this pull request May 19, 2021
### What changes were proposed in this pull request?

`./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.

### Why are the changes needed?

This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.

### Does this PR introduce _any_ user-facing change?

Should not affect anything about Spark per se, just the build.

### How was this patch tested?

Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.

Closes #32505 from srowen/SPARK-35373.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
srowen added a commit that referenced this pull request May 19, 2021
### What changes were proposed in this pull request?

`./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.

### Why are the changes needed?

This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.

### Does this PR introduce _any_ user-facing change?

Should not affect anything about Spark per se, just the build.

### How was this patch tested?

Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.

Closes #32505 from srowen/SPARK-35373.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen
Copy link
Member Author

srowen commented May 19, 2021

Backported to 3.1 / 3.0. There were merge conflicts in 2.4 ; maybe not worth backporting there as it's EOL

@dongjoon-hyun
Copy link
Member

Oh, @srowen .
This break branch-3.1.

exec: curl --silent --show-error -L https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz/zinc-0.3.15.tgzzinc-0.3.15/bin/zinc
./build/mvn: line 56: /home/runner/work/spark/spark/build/: Is a directory
exec: wget --no-verbose https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz/zinc-0.3.15.tgzzinc-0.3.15/bin/zinc
/home/runner/work/spark/spark/build/: Is a directory
ERROR: Cannot download https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz/zinc-0.3.15.tgzzinc-0.3.15/bin/zinc with cURL or wget; please install manually and try again.
Error: Process completed with exit code 2.

@dongjoon-hyun
Copy link
Member

It seems that the root cause is Zinc is removed at master branch while it exists in old branches.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun added a commit that referenced this pull request May 19, 2021
### What changes were proposed in this pull request?

This PR is a follow-up of #32505 to fix `zinc` installation.

### Why are the changes needed?

Currently, branch-3.1/3.0 is broken.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #32591 from dongjoon-hyun/SPARK-35373.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request May 19, 2021
### What changes were proposed in this pull request?

This PR is a follow-up of #32505 to fix `zinc` installation.

### Why are the changes needed?

Currently, branch-3.1/3.0 is broken.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #32591 from dongjoon-hyun/SPARK-35373.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 50edf12)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
rm -rf "$local_tarball"
fi
# Checksum may not have been specified; don't check if doesn't exist
if [ -f "${local_checksum}" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to add $(command -v shasum) here to check if shasum exists or not (then, raise an error with an explicit message)? For example, Amazon Linux2 does not have it by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh really? shoot. That seemed to be the one command that was on Linux and OS X. Well, we can fail outright if not present, which would stop the build. Is that better or worse than just silently letting it proceed without verification? I don't know. Would developers generally build on this distro?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh really? shoot. That seemed to be the one command that was on Linux and OS X.

Yea, for Amazon Linux2, we need to run sudo yum install perl-Digest-SHA before running build/mvn.

Is that better or worse than just silently letting it proceed without verification?

IMO, since this check exists to prevent cyber attacks, it is safer to makebuild/mvn fail if shasum does not exists (actually, it seems the other OSs have shasum by default, so most developers don't care, I think).

Would developers generally build on this distro?

Not sure, but that's the default option in aws ec2. I noticed this issue because my daily build broke after this commit merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I can submit a follow-up to fail if shasum is not present. That might inconvenience some developers, but, this is a security issue, and of course developers are welcome to install mvn / scala directly if desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

srowen pushed a commit that referenced this pull request May 20, 2021
## What changes were proposed in this pull request?
Use ` > /dev/null` to replace `-q` in shasum validation.

### Why are the changes needed?
In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version.

So we got the `Unknown option: q`.

```
➜  ~ uname -a
Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64
➜  ~ shasum -v
5.84
➜  ~ shasum -q
Unknown option: q
Type shasum -h for help
```

it makes ARM CI failed:
[1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`shasum -a 512 -c wrong.sha  > /dev/null` return code 1 without print
`shasum -a 512 -c right.sha  > /dev/null` return code 0 without print
e2e test:
```
rm -f build/apache-maven-3.6.3-bin.tar.gz
rm -r build/apache-maven-3.6.3-bin
mvn -v
```

Closes #32604 from Yikun/patch-5.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
srowen pushed a commit that referenced this pull request May 20, 2021
## What changes were proposed in this pull request?
Use ` > /dev/null` to replace `-q` in shasum validation.

### Why are the changes needed?
In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version.

So we got the `Unknown option: q`.

```
➜  ~ uname -a
Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64
➜  ~ shasum -v
5.84
➜  ~ shasum -q
Unknown option: q
Type shasum -h for help
```

it makes ARM CI failed:
[1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`shasum -a 512 -c wrong.sha  > /dev/null` return code 1 without print
`shasum -a 512 -c right.sha  > /dev/null` return code 0 without print
e2e test:
```
rm -f build/apache-maven-3.6.3-bin.tar.gz
rm -r build/apache-maven-3.6.3-bin
mvn -v
```

Closes #32604 from Yikun/patch-5.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 38fbc0b)
Signed-off-by: Sean Owen <srowen@gmail.com>
srowen pushed a commit that referenced this pull request May 20, 2021
## What changes were proposed in this pull request?
Use ` > /dev/null` to replace `-q` in shasum validation.

### Why are the changes needed?
In PR #32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version.

So we got the `Unknown option: q`.

```
➜  ~ uname -a
Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64
➜  ~ shasum -v
5.84
➜  ~ shasum -q
Unknown option: q
Type shasum -h for help
```

it makes ARM CI failed:
[1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`shasum -a 512 -c wrong.sha  > /dev/null` return code 1 without print
`shasum -a 512 -c right.sha  > /dev/null` return code 0 without print
e2e test:
```
rm -f build/apache-maven-3.6.3-bin.tar.gz
rm -r build/apache-maven-3.6.3-bin
mvn -v
```

Closes #32604 from Yikun/patch-5.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 38fbc0b)
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen srowen deleted the SPARK-35373 branch June 1, 2021 13:48
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
### What changes were proposed in this pull request?

`./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.

### Why are the changes needed?

This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.

### Does this PR introduce _any_ user-facing change?

Should not affect anything about Spark per se, just the build.

### How was this patch tested?

Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.

Closes apache#32505 from srowen/SPARK-35373.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
### What changes were proposed in this pull request?

This PR is a follow-up of apache#32505 to fix `zinc` installation.

### Why are the changes needed?

Currently, branch-3.1/3.0 is broken.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes apache#32591 from dongjoon-hyun/SPARK-35373.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
### What changes were proposed in this pull request?

`./build/mvn` now downloads the .sha512 checksum of Maven artifacts it downloads, and checks the checksum after download.

### Why are the changes needed?

This ensures the integrity of the Maven artifact during a user's build, which may come from several non-ASF mirrors.

### Does this PR introduce _any_ user-facing change?

Should not affect anything about Spark per se, just the build.

### How was this patch tested?

Manual testing wherein I forced Maven/Scala download, verified checksums are downloaded and checked, and verified it fails on error with a corrupted checksum.

Closes apache#32505 from srowen/SPARK-35373.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
### What changes were proposed in this pull request?

This PR is a follow-up of apache#32505 to fix `zinc` installation.

### Why are the changes needed?

Currently, branch-3.1/3.0 is broken.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes apache#32591 from dongjoon-hyun/SPARK-35373.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
## What changes were proposed in this pull request?
Use ` > /dev/null` to replace `-q` in shasum validation.

### Why are the changes needed?
In PR apache#32505 , added the shasum check on maven. The `shasum -a 512 -q -c xxx.sha` is used to validate checksum, the `-q` args is for "don't print OK for each successfully verified file", but `-q` arg is introduce in shasum 6.x version.

So we got the `Unknown option: q`.

```
➜  ~ uname -a
Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Apr 12 20:57:45 PDT 2021; root:xnu-6153.141.28.1~1/RELEASE_X86_64 x86_64
➜  ~ shasum -v
5.84
➜  ~ shasum -q
Unknown option: q
Type shasum -h for help
```

it makes ARM CI failed:
[1] https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
`shasum -a 512 -c wrong.sha  > /dev/null` return code 1 without print
`shasum -a 512 -c right.sha  > /dev/null` return code 0 without print
e2e test:
```
rm -f build/apache-maven-3.6.3-bin.tar.gz
rm -r build/apache-maven-3.6.3-bin
mvn -v
```

Closes apache#32604 from Yikun/patch-5.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 38fbc0b)
Signed-off-by: Sean Owen <srowen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants