Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack #35946

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Mar 23, 2022

What changes were proposed in this pull request?

This PR proposes to use FileUtil.unTarUsingJava that is a Java implementation for un-tar .tar files. unTarUsingJava is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

Why are the changes needed?

There is a risk for arbitrary shell command injection via Utils.unpack when the filename is controlled by a malicious user. This is due to an issue in Hadoop's unTar, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

Does this PR introduce any user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

How was this patch tested?

Manually tested in local, and existing test cases should cover.

@HyukjinKwon HyukjinKwon force-pushed the SPARK-38631 branch 2 times, most recently from 8cb63a9 to 73a0252 Compare March 23, 2022 04:24
@HyukjinKwon
Copy link
Member Author

This is also being reported via a different channel to Hadoop.

@HyukjinKwon HyukjinKwon marked this pull request as draft March 23, 2022 04:29
@HyukjinKwon HyukjinKwon force-pushed the SPARK-38631 branch 2 times, most recently from be0820b to b994691 Compare March 23, 2022 04:54
FileUtil.unTar(source, dest)
} else if (lowerSrc.endsWith(".tar")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tar alone has the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, after this PR, it will lose the file permissions from tar. Keeping file permissions is a non-trivial work (see SPARK-37677). I will follow how Hadoop handles this, and follow it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I added lines 596-598 to mitigate some of this. Can we update Hadoop to fix this too? I suppose inlining is a bit more code but more robust in this regard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. cc @steveloughran FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw apache/hadoop#4036 is adding perms wiring for unzip...I'm being very cautious in reviewing there

@HyukjinKwon HyukjinKwon marked this pull request as ready for review March 23, 2022 04:56
@github-actions github-actions bot added the CORE label Mar 23, 2022
@HyukjinKwon
Copy link
Member Author

cc @dongjoon-hyun mind taking a look please when you find some time?

@HyukjinKwon
Copy link
Member Author

Thanks @martin-g

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM + 1

FileUtil.unTar(source, dest)
} else if (lowerSrc.endsWith(".tar")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I added lines 596-598 to mitigate some of this. Can we update Hadoop to fix this too? I suppose inlining is a bit more code but more robust in this regard.

core/src/main/scala/org/apache/spark/util/Utils.scala Outdated Show resolved Hide resolved
@dongjoon-hyun
Copy link
Member

Sorry for being late, @HyukjinKwon .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@HyukjinKwon
Copy link
Member Author

Merged to master, branch-3.3, branch-3.2 and branch-3.1.

HyukjinKwon added a commit that referenced this pull request Mar 24, 2022
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes #35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Mar 24, 2022
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes #35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Mar 24, 2022
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes #35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
7mming7 pushed a commit to 7mming7/spark that referenced this pull request Mar 28, 2022
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 271b338)
Mrhs121 pushed a commit to Mrhs121/spark that referenced this pull request Jul 3, 2023
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 pushed a commit to Mrhs121/spark that referenced this pull request Jul 13, 2023
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 added a commit to Kyligence/spark that referenced this pull request Jul 18, 2023
…Utils.unpack (#643)

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 057c051)

Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 pushed a commit to Mrhs121/spark that referenced this pull request Jul 21, 2023
…Utils.unpack

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 057c051)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 added a commit to Kyligence/spark that referenced this pull request Jul 21, 2023
…Utils.unpack (#645)

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 057c051)

Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 added a commit to Mrhs121/spark that referenced this pull request Jul 27, 2023
…Utils.unpack (Kyligence#643)

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 057c051)

Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Mrhs121 added a commit to Kyligence/spark that referenced this pull request Jul 27, 2023
…Utils.unpack (#643) (#648)

### What changes were proposed in this pull request?

This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264.

The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter).

### Why are the changes needed?

There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904

### Does this PR introduce _any_ user-facing change?

Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command.

### How was this patch tested?

Manually tested in local, and existing test cases should cover.

Closes apache#35946 from HyukjinKwon/SPARK-38631.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 057c051)

Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the SPARK-38631 branch January 15, 2024 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants