Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18145.Decompress the ZIP file and retain the original file per… #4036

Merged
merged 18 commits into from
Mar 30, 2022

Conversation

smallzhongfeng
Copy link
Contributor

@smallzhongfeng smallzhongfeng commented Feb 27, 2022

Description of PR

When we use the unzip method of Hadoop common to decompress, the UNIX permission cannot be reserved, so we use the ziparchiveentry of Apache common to preserve the permission information. When the file has UNIX permission, the unzip decompression method will preserve the permission first, and then set the original permission information to the new file.

How was this patch tested?

Unit tests were conducted.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems a reasonable idea, just some details about production code and tests to resolve

@smallzhongfeng
Copy link
Contributor Author

seems a reasonable idea, just some details about production code and tests to resolve

Thank you for your review.@steveloughran

@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@apache apache deleted a comment from hadoop-yetus Mar 9, 2022
@steveloughran
Copy link
Contributor

down to the last through checkstyles -this is good.

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java:694:  /** Assign the original permissions to the file */: First sentence should end with a period. [JavadocStyle]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java:751:                Files.setPosixFilePermissions(file.toPath(), permissionsFromMode(entry.getUnixMode()));: 'if' child has incorrect indentation level 16, expected level should be 14. [Indentation]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java:751:                Files.setPosixFilePermissions(file.toPath(), permissionsFromMode(entry.getUnixMode()));: Line is longer than 100 characters (found 103). [LineLength]

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nearly there, just fix the checkstyles and add assertion error messages on the checks for permisson

@smallzhongfeng
Copy link
Contributor Author

smallzhongfeng commented Mar 11, 2022

Thank you for your review again.@steveloughran

nearly there, just fix the checkstyles and add assertion error messages on the checks for permisson

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, no changes to production code, just test tuning.

I am worried about security here because unzip (especially java commons) has had cves issued in the past, especially about expanding files into absolute paths. is anything dangerous being introduced here, e.g. can files retain root ownership and retain the suid flag, or is this only about rwx permissions? i think it is, just want everyone to be rigorous here.

@apache apache deleted a comment from hadoop-yetus Mar 22, 2022
@smallzhongfeng
Copy link
Contributor Author

smallzhongfeng commented Mar 24, 2022

Yes, if I set the suid flag, the file will be executed with the same permissions as the owner of the executable file. I think it's dangerous. So, in the code, any file with a suid flag will end up with only executable permission +x, this is only about rwx permissions.

I am worried about security here because unzip (especially java commons) has had cves issued in the past, especially about expanding files into absolute paths. is anything dangerous being introduced here, e.g. can files retain root ownership and retain the suid flag, or is this only about rwx permissions? i think it is, just want everyone to be rigorous here.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying the Suid issue. Yes, that would be bad.

Rather than trying to build enumeration values from string concatenation, how about we have the addPermissions method take the specific permissions to be assigned.

  static void addPermissions(
      Set<PosixFilePermission> permissions,
      int mode.
      PosixFilePermission r,
      PosixFilePermission w,
      PosixFilePermission x) {
    if ((mode & 1L) == 1L) {
      permissions.add(x);
    }
    if ((mode & 2L) == 2L) {
      permissions.add(w);
    }
    if ((mode & 4L) == 4L) {
      permissions.add(r);
    }
  }

then this is invoked for the three different groups, with different permissions passed in.

addPermissions(permissions, mode, OTHERS_READ, OTHERS_READ, OTHERS_EXECUTE)

Yes, it is more verbose. But it should be more efficient and the compiler can do more type safety checks.

This can all be tested too. How about some unit tests which take some mode strings and convert to permissions. This is easier to do once addPermissions has been changed, as the tests needed for full coverage of it are just four possible values of mode: 0, 1, 2, 4 and an assert to verify that only the expected permission is done.

then the permissionsFromMode() method can be tested passing in permissions for users group and other only, e.g.

  • 0700, 0070 and 0007 maps the full set for that mode.
  • 0111 maps to execute for all
  • 0222 is write for all
  • 0444 is read for all

that should verify that the mapping is good

at which point we can be fairly happy that things are good (I am not worry)

(i am sorry i am adding so much homework to a small patch, but this is a key piece of code which gets used a lot and so i want to make sure i don't get support calls related to the change...)

@smallzhongfeng
Copy link
Contributor Author

smallzhongfeng commented Mar 28, 2022

Thank you for your review. @steveloughran
1.the design of this method is more redundant, but more clear, and I will improve it.
2.the test case, I try to use UserPrincipalLookupService class to update the current user groups and others, to verify the current file permissions for user groups and others as a result, but for ordinary users, is impossible to setOwner operation, this leads to the current user can't change, to this, Do you have any good suggestions?

then the permissionsFromMode() method can be tested passing in permissions for users group and other only, e.g.

@apache apache deleted a comment from hadoop-yetus Mar 29, 2022
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good, one little detail in the test.

+1 pending that change

@steveloughran steveloughran merged commit 08e6d0c into apache:trunk Mar 30, 2022
asfgit pushed a commit that referenced this pull request Mar 30, 2022
…heir original permissions (#4036)

Contributed by jingxiong zhong

Change-Id: Ie252e609798719dc658364f9bae48b34dc72a79c
@steveloughran
Copy link
Contributor

+1.

merged to trunk and cherrypicked into branch-3.3.

thanks!

sorry for being so rigorous here but we have to worry about security in the code which expands and then executes artifacts; unzip is a source of CVEs across many applications

@smallzhongfeng
Copy link
Contributor Author

Thanks for your guidance.

It is necessary to be strict with the code. I look forward to cooperating with you next time.

@apache apache deleted a comment from hadoop-yetus Mar 30, 2022
@apache apache deleted a comment from hadoop-yetus Mar 30, 2022
@apache apache deleted a comment from hadoop-yetus Mar 30, 2022
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…heir original permissions (apache#4036)


Contributed by jingxiong zhong
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Mar 28, 2023
### What changes were proposed in this pull request?

Just remove comment.

### Why are the changes needed?

After apache/hadoop#4036, unzip could unzip could keep file permissions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No need, has been added in hadoop-client side.

Closes #40572 from smallzhongfeng/SPARK-37677.

Authored-by: jokercurry <982458633@qq.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants