-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-18908. Improve S3A region handling. #6106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18908. Improve S3A region handling. #6106
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Noticed a couple of issues with testing: For some reason getBucketLocation() fails for public buckets (landsat-pds), and also setting US_EAST_1 as the default region doesn't work, so had to use US_EAST_2. Looking into it with SDK team. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented.
- its critical that any fs.s3a.endpoint.region setting takes priority over everytihing else.
- and we should see if vpce resolution works this time.
i think i'd like the region to be set via the client builder, just for cleanliness and for ease of other implementations (I have just implemened one for hboss!)
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
Show resolved
Hide resolved
|
||
if(!endpoint.endsWith(CENTRAL_ENDPOINT)) { | ||
LOG.debug("Endpoint {} is not the default; parsing", endpoint); | ||
return AwsHostNameUtils.parseSigningRegion(endpoint, S3_SERVICE_NAME).orElse(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weve had so much trouble in the past here with vpce, govcloud, cn that we need a set of unit tests to verify this code does what we actually want -and therea are no regressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out this AwsHostNameUtils.parseSigningRegion
is not meant to be used for non standard endpoints .. which why a vpce endpoint gets resolved to region "vpce". Can we still merge this PR and create a follow up PR to handle non standard endpoints? I working with the SDK team to understand how best to do this - via the SDK or in S3A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. getting pressure to merge this as without it then its a major regression. also, we need a "what is a region" section, or maybe "all about signing, endpoints and regions"
* If endpoint is the central one, use US_EAST_1. | ||
* | ||
* @param endpoint the configure endpoint. | ||
* @return the S3 region. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc to say returns null if not resolvved
} else { | ||
// no region. | ||
// allow this if people really want it; it is OK to rely on this | ||
// when deployed in EC2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is ok in ec2, then this will be very noisy in ec2 deployments?
builder.region(region); | ||
} else { | ||
builder.crossRegionAccessEnabled(true); | ||
builder.region(getS3Region(conf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a region is set in the config, it must take priority over anything the AWS SDK determines from the endpoint. without that we can never correct for sdk issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, what about: env vars and sysprops? good or bad?
Updated as per review comments, and added tests ITestS3AEndpointRegion for china, govloud and VPCE. The VPCE test is failing, looking into it. Made some changes to core-site.xml as they were setting s3.amazonaws.com as the endpoint for public buckets, instead set the region of the public buckets. This changes behaviour slightly to what the behaviour was in SDK V1:
We could enable cross region even if a region is set here as well, which will mean that even if you've configured an incorrect region things will work. But I think failing is better in this case. Tested latest changes in |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
50de85b
to
2124c83
Compare
rebased and tested with fs.s3a.endpoint.region as Two failures [ERROR] ITestS3AEndpointRegion.testWithVPCE:215->lambda$testWithVPCE$7:215 [Incorrect region set] expected:<"[us-west-2]"> but was:<"[vpce]"> VPCE test fails because SDK is not parsing the vpce endpoint correctly (i think). |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; merging. do create that followup though
|
||
if(!endpoint.endsWith(CENTRAL_ENDPOINT)) { | ||
LOG.debug("Endpoint {} is not the default; parsing", endpoint); | ||
return AwsHostNameUtils.parseSigningRegion(endpoint, S3_SERVICE_NAME).orElse(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. getting pressure to merge this as without it then its a major regression. also, we need a "what is a region" section, or maybe "all about signing, endpoints and regions"
going to test this myself to see what happens when i don't set the region |
ok, so playing with it, the sole change i want to make is for that log of region setting to include the origin too. I'll make a pr with that change appended. |
🎊 +1 overall
This message was automatically generated. |
closing in favour of #6187 |
Description of PR
Removes the region probe to S3 and reinstates the region resolution logic that was there before the SDK V2 upgrade.
parse region from endpoint (or endpoint is null), look at the the region configuration.
How was this patch tested?
Testing in progress.