-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18820. Cut AWS v1 support #5872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18820. Cut AWS v1 support #5872
Conversation
* downgrade v1 sdk to "provided" * only allowed in one package in production code * use isolated classes and reflection games to allow rest of module to work without it when instantiating credential providers. * tests updated as appropriate * remapping of standard aws v1 credential providers to v2 classnames prior to instantiation. Change-Id: I0ab248e71e696b638420c09afc2b141420aaf596
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, +1. couple of minor comments (unused variables & javadocs)
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/adapter/AwsV1BindingSupport.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/adapter/AwsV1BindingSupport.java
Outdated
Show resolved
Hide resolved
* lots of improvements in the binding * autocloseable/closeable passdown as far as wrapped v1 * more diagnostics on provider classes, including useful toString() values * more tests * all token tests happy Change-Id: I4e6151ba6c35e280203dc6a705b6ed1fba7de551
* split v2 and v1 credential test suites * rename AwsCredentialListProvider to CredentialProviderListFactory as it was too close to AWSCredentialProviderList. * yetus feedback * review feedback * more use of InstantiationException over more generic IOEs. Side issue: wondering if we should cut all the S3xLoginHelper stuff? been a long time since s3n existed and we've had to tell people to stop it. Change-Id: I405dc1793b39d9ee0a2f47c0600c2a3040154d75
IDE had warned of use of non-final resolveCredentials() call in superclass, so cut it ...which broke ITestAssumeRole badly ...so reverted, but marked class final +InstantiationIOException is now a subclass of PathIOE; takes URI of filesystem for more details. +fix ITestS3ABucketExistence by clearing endpoint and region bucket settings. Change-Id: I37edc23f76d7be95cd055dcd65d0fdb17e6992f2
current dependency setup
sdk core stuff is being involved, which isn't needed at all. I'm pulling in the changes of #5739 to align dependencies, then make sure more are stripped. This gives
@ahmarsuhail what do we do about the crt library? is it in bundle.jar? is it something to tag as provided and document? |
…libraries * explicit jackson declaration in hadoop-aws * cut jackson, eventstream and ion from aws SDK bundle dependencies * sdk-core excludes everything, as only the core interfaces/classes are needed to compile against * aws-crt is test scope. not sure about that. Change-Id: I47f6e10d42c8067df8255eca69799469d7252480
hi, needs a bit more review/upvote. with this patch, we still support v1 providers if people set up the classpath, but we map the standard ones to v2 equivalents. |
* HADOOP-18812 list aws SDK v2 libraries in LICENSE-binary * all imports of aws-java-sdk-core excluded * javadoc and some other checkstyle issues reported in builds addressed (HADOOP-18819) Change-Id: I8043b157f18c325fa9599bdc4272f655307c0b6a
the current pom changes means that s3 select operations fail
given that the feature has never worked properly through MR/spark jobs and nobody has ever filed a bug report against the feature, I'm not sure it gets used at all. we should be ruthless here; cut it? |
Eventstream 1.0.1 JAR is used in tests but not in runtime; this is documented. * S3 Select documents move feature from "experimental" to "deprecated" * detail on why it doesn't work on big files Change-Id: I5b34526c97736bba621e897d1cf5b966a6f3af8b
+ update troubleshooting to new package names Change-Id: Ifbba1bf7188518aff9f58e911534ff600b958ad6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big patch. Reviewed the prod code. Looks good to me overall. Will check test code later
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
Outdated
Show resolved
Hide resolved
Change-Id: Ie4c14a041595e17cc0e64ff65c2a8db6b60c2316
Change-Id: I0e4a27ad7b32667c97d914fc8bd6ee52e755bf4c
💔 -1 overall
This message was automatically generated. |
Change-Id: I042b5036c4ad0505ea3c1d7a5f96ad35171104a2
created https://issues.apache.org/jira/browse/YARN-11546 for the TestTimelineAuthFilterForV2 failure |
* use intercept() and InstantiationIOException where appropriate. * change method names to distinguish them better. * add test to verify that the static create() method is called. Change-Id: I88e6c4eed6153861b6bd83300e6c484eb4dda08e
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Spotbugs is running off a version of the findbugs.xml which is not the one in the last commit. not sure how/why, but locally a call to All test failures are unrelated. I don't know what is up with the spotbugs run, but this patch is ready except for style/eof, which i will now address |
Change-Id: If7ca8bb9c89794e079e0de4c9de46da8ac565032
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Couple of minor comments
@@ -998,7 +999,7 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException { | |||
* @param parameters parameter object | |||
* @throws IOException on any IO problem | |||
*/ | |||
private synchronized void createS3AsyncClient(S3ClientFactory clientFactory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we moved these out into their separate methods to fix spotbugs issues..cause they were not getting fixed by the addition to findbugs-exclude.xml. but if that's working now, we can revert these changes and intialise the s3Async client the same way as the sync one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't know WTF is up with spotbugs. it works on the command line for me.
/** | ||
* Warns on an AWS V1 credential provider being referenced directly. | ||
* Notes an AWS V1 credential provider being referenced directly. | ||
* @param name name of the credential provider | ||
*/ | ||
public static void v1ProviderReferenced(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently this isn't used anywhere. CredentialProviderListFactory already has LOG_REMAPPED_ENTRY which will log if you're using a V1 provider I think. if we want the debug..we can call this from CredentialProviderListFactory.buildAWSProviderList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll look at this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cut it.
💔 -1 overall
This message was automatically generated. |
Cut unused v1ProviderReferenced() method audit doc update - auditing is enabled again - new fs.s3a.audit.execution.interceptors option - how to log httpclient Change-Id: Ib0c5b86407e059d53c469151cf3d89ede65e9116
latest patch removed the redundant method and updates the auditing docs to cover the migration to the new option, plus any other updates needed. Note: storediag updated to print the new audit option and those files on the classpath to list what to automatically load too. @ahmarsuhail if people really wanted to plug in handling here, maybe we should say "if the class implements Configurable we call setConf on it". that way people can add dynamic configuration through hadoop/spark jobs. |
v1 conf load uses getTrimmed() and logs the list of values; v2 ExecutionInterceptor will, if they implement Configurable, get the auditor Configuration Change-Id: Ic0d49cbceb4a0c9fed9013c9a605f9b240801d73
OK, I added the setConf() feature. Means thing is potentially useful. V1 handlers get an error, as seen in the test
Note, AFAIK nobody has ever added a custom handler. |
tested s3 london at scale; got the usual huge file multipart failure. turn off for now? also saw a transitive failure of ITestS3APrefetchingInputStream; seen that before. unrelated and fixed in trunk by #5149 |
@mukund-thakur @mehakmeet you happy with this? we are, and once it's in I'm ready to rebase and merge to trunk |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Change-Id: Ib6125d34fa51a4af26f679339598241a6fb42fc8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1 , pending yetus.
nice unit test TestV1CredentialsProvider
💔 -1 overall
This message was automatically generated. |
This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran Change-Id: I231fade61b20a206a018d18f5e867f5d3a1fa879
This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran
This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran
This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran
This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran This patch (unlike the feature branch) does not have an enforcer rule to reject use of com.amazonaws classes; yetus was rejecting it. Change-Id: I854fc5d8707016a87a1321e46c1f0ad726ceff3a
This is an aggregate patch of the changes from feature-HADOOP-18073-s3a-sdk-upgrade and moves the S3A connector to to using the V2 AWS SDK This is a major change: See aws_sdk_v2_changelog.md for details. A new shaded v2 SDK JAR "bundle.jar" needs to be distributed with the connector to interact with S3 stores All code which was using the V1 SDK classes with the S3AFileSystem will need upgrading. Contributed by Ahmar Suhail HADOOP-18820. Cut AWS v1 support (apache#5872) This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran HADOOP-18853. Upgrade AWS SDK version to 2.20.28 (apache#5960) Upgrades the AWS sdk v2 version to 2.20.28 This * adds multipart COPY/rename in the java async client * removes the aws-crt JAR dependency Contributed by Ahmar Suhail HADOOP-18818. Merge aws v2 upgrade feature branch into trunk Contains HADOOP-18863. AWS SDK V2 - AuditFailureExceptions aren't being translated properly Change-Id: I96b26cc1ee535c519248ca6541fb157017dcc7e4
This is an aggregate patch of the changes from feature-HADOOP-18073-s3a-sdk-upgrade and moves the S3A connector to to using the V2 AWS SDK This is a major change: See aws_sdk_v2_changelog.md for details. A new shaded v2 SDK JAR "bundle.jar" needs to be distributed with the connector to interact with S3 stores All code which was using the V1 SDK classes with the S3AFileSystem will need upgrading. Contributed by Ahmar Suhail HADOOP-18820. Cut AWS v1 support (apache#5872) This removes the AWS V1 SDK as a hadoop-aws runtime dependency. It is still used at compile time so as to build a wrapper class V1ToV2AwsCredentialProviderAdapter which allows v1 credential provider to be used for authentication. All well known credential providers have their classname remapped from v1 to v2 classes prior to instantiation; this wrapper is not needed for them. There is no support for migrating other SDK plugin points (signing, handlers) Access to the v2 S3Client class used by an S3A FileSystem instance is now via a new interface org.apache.hadoop.fs.s3a.S3AInternals; other low-level operations (getObjectMetadata(Path)) have moved. Contributed by Steve Loughran HADOOP-18853. Upgrade AWS SDK version to 2.20.28 (apache#5960) Upgrades the AWS sdk v2 version to 2.20.28 This * adds multipart COPY/rename in the java async client * removes the aws-crt JAR dependency Contributed by Ahmar Suhail HADOOP-18818. Merge aws v2 upgrade feature branch into trunk Contains HADOOP-18863. AWS SDK V2 - AuditFailureExceptions aren't being translated properly Change-Id: I96b26cc1ee535c519248ca6541fb157017dcc7e4
Part of HADOOP-18073 Upgrade AWS SDK to v2
How was this patch tested?
Integration tests in progress
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?