HADOOP-17851. Support user specified content encoding for S3A #3498

Merged

Conversation

steveloughran
Contributor

#3312 by @holdenk rebased & with some final wrap-up tweaks.

  • "fs.s3a.object.content-encoding" declares the content encoding
  • it's not set for directories.
  • it's documented

I changed the key name so it lines up with adding the rest of the standard headers as/when we choose; fs.s3a.object.* would cover the set.

Tested: S3 London, no S3Guard. One test failure, the usual intermittent ITestPartialRenamesDeletes.testRenameDirFailsInDelete.
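
For anyone who wants to try the new option, here is a minimal sketch (not code from this PR) of writing a gzip-compressed object through S3A with the encoding set; the bucket, path, and GZIP payload are illustrative assumptions.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ContentEncodingWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Option added by this PR: the value is sent as the Content-Encoding header
    // on objects written through this filesystem instance.
    conf.set("fs.s3a.object.content.encoding", "gzip");

    // Hypothetical bucket and path, for illustration only.
    Path dest = new Path("s3a://example-bucket/logs/events.json.gz");
    FileSystem fs = dest.getFileSystem(conf);
    try (OutputStream raw = fs.create(dest, true);
         GZIPOutputStream gz = new GZIPOutputStream(raw)) {
      // Compress the payload ourselves; the option only sets the header,
      // it does not transform the data.
      gz.write("{\"event\":\"example\"}\n".getBytes(StandardCharsets.UTF_8));
    }
    fs.close();
  }
}
```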

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [X] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

holdenk and others added 12 commits September 28, 2021 19:03
…cts they are putting. This is useful for people loading the data into other tools in the AWS ecosystem which don't use file extensions to infer compression type (e.g. serving compressed files from S3 or importing into RDS)
…etup/skip is now different

Change-Id: I0bec0926625667ca1c8be8fa11e8ff1a11e0dbe8
* change option to fs.s3a.object.content.encoding to line up for
  support of the other values
* docs
* dir markers do not have an encoding set
* tests to verify that

Change-Id: I4b042d38c9e94e7d4ffa8ee6afdc7a6c26a4c489
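
The commit above notes that directory markers get no encoding and adds tests to verify that. A rough sketch of that kind of check, assuming a hypothetical bucket and using the AWS SDK v1 client directly rather than the project's actual test code:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirMarkerEncodingCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.object.content.encoding", "gzip");

    // Hypothetical bucket/path; mkdirs() leaves a zero-byte marker object "data/dir/".
    Path dir = new Path("s3a://example-bucket/data/dir");
    FileSystem fs = dir.getFileSystem(conf);
    fs.mkdirs(dir);
    fs.close();

    // HEAD the marker with the AWS SDK v1 client and assert no encoding was applied.
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    String encoding = s3.getObjectMetadata("example-bucket", "data/dir/").getContentEncoding();
    if (encoding != null) {
      throw new AssertionError("directory marker unexpectedly has Content-Encoding " + encoding);
    }
  }
}
```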
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:--------|
| +0 🆗 | reexec | 0m 54s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s |  | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s |  | codespell was not available. |
| +0 🆗 | markdownlint | 0m 0s |  | markdownlint was not available. |
| +1 💚 | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s |  | The patch appears to include 3 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 34m 5s |  | trunk passed |
| +1 💚 | compile | 0m 44s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 0m 36s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | checkstyle | 0m 27s |  | trunk passed |
| +1 💚 | mvnsite | 0m 42s |  | trunk passed |
| +1 💚 | javadoc | 0m 21s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 0m 31s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 1m 9s |  | trunk passed |
| +1 💚 | shadedclient | 22m 12s |  | branch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 34s |  | the patch passed |
| +1 💚 | compile | 0m 36s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javac | 0m 36s |  | the patch passed |
| +1 💚 | compile | 0m 30s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | javac | 0m 30s |  | the patch passed |
| +1 💚 | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 19s |  | the patch passed |
| +1 💚 | mvnsite | 0m 35s |  | the patch passed |
| +1 💚 | javadoc | 0m 14s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 0m 23s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 1m 10s |  | the patch passed |
| +1 💚 | shadedclient | 21m 57s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 💚 | unit | 2m 47s |  | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 30s |  | The patch does not generate ASF License warnings. |
|  |  | 92m 3s |  |  |
| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3498/1/artifact/out/Dockerfile |
| GITHUB PR | #3498 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint |
| uname | Linux c4c99f55b520 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f457367 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3498/1/testReport/ |
| Max. process+thread count | 518 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3498/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran (Contributor, Author) left a comment

+1.
I'm happy with the final changes :)

@steveloughran steveloughran merged commit 2fda61f into apache:trunk Sep 29, 2021
@steveloughran (Contributor, Author)

Merged into trunk. To get this into 3.3 we will need the CSE backport, which is blocked on #3466, and we will probably also need #3260, just for completeness and to maintain a consistent ordering.

@holdenk

holdenk commented Sep 29, 2021

Yay, thanks @steveloughran and everyone for your review time and help; I appreciate it :)

asfgit pushed a commit that referenced this pull request Mar 23, 2022
The option fs.s3a.object.content.encoding declares the content encoding to be set on files when they are written; this is served up in the "Content-Encoding" HTTP header when reading objects back in.

This is useful for people loading the data into other tools in the AWS ecosystem which don't use file extensions to infer compression type (e.g. serving compressed files from S3 or importing into RDS)

Contributed by: Holden Karau

Change-Id: Ice0da75b516370f51f79e45f391d46c5c7aa4ce4
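
As a hedged follow-up to the write sketch earlier: one way to confirm that the Content-Encoding header is served back when reading the object is a HEAD request through the AWS SDK v1 client. The bucket and key below are the same illustrative assumptions as before, not values from this PR.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ContentEncodingReadBack {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // HEAD the object written earlier and read its Content-Encoding header.
    String encoding = s3.getObjectMetadata("example-bucket", "logs/events.json.gz")
        .getContentEncoding();
    // Expected to print "gzip" for an object written with the option set.
    System.out.println("Content-Encoding on read-back: " + encoding);
  }
}
```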