HADOOP-17771. S3AFS creation fails "Unable to find a region via the region provider chain." #3133
Contributed by Steve Loughran. Change-Id: I94284178d27a48947e7c0942a7c8565379de7e9b
Testing in progress, in a setup where
Test run; rerunning with dynamo & scale.
This is a great catch! Can't imagine the confusion when debugging issues like this.
Code looks good.
Tested, with a transient read buffer underflow failure and one more S3Guard write than expected. Both of those surface when there are too many records.
One thought here: would you ever want the s3a connector to fall back to that bundled region lookup sequence? I'm wondering in particular if it makes a difference in routing/billing on EC2 deployments.
As of Hadoop 3.3.1, if region=null and endpoint=null, the EC2 metadata is used to provide the region info (this is new). This could mean connections are slower to set up, with a risk of remote data transfer and billing (though the redirections should fix that, right?). And if the rules for a deployment prevent out-of-region network traffic, will this break? I think we have hit problems related to this in the past.
Put differently: is anything special happening with the default "null" endpoint and EC2 metadata region name provision which we need to know about and support? If so, we could allow the region to be set to "" or maybe "ec2" and have that revert to the resolve chain.
more to do
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
+1, with your comments.
Add
* Ability to fall back to the region chain if you set fs.s3a.endpoint.region to ""
* Test that this happens by setting the system property aws.region to a value and verifying it is picked up.
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
This is going to surface in the wild for people doing remote IO; should add this in the Hadoop JIRA text too.
Change-Id: I9a05d7b6ae9da98b44ceeff94582ffaed96980d3
* Fix checkstyle warnings.
* Review, move and enhance troubleshooting.
* Noticed a mention of the ~/.aws stuff in index.md; made clear it was per-host.
Change-Id: If2c8be1b0a85144242f551253586f34abb7fa26d
The latest version does let you switch to the region resolution process if you really want to; this also lets me test it by setting sysprops and verifying that the region is picked up that way. Also, the SDK exceptions are being converted to IOEs. Tested against s3 london; I just realised that I'd set the fs.s3a.endpoint property, though, so I'll have to rerun without any endpoint or region set for the test bucket.
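A minimal sketch of that sysprop-based test pattern, with the exception translation folded in. The AWS SDK is not on the classpath here, so `resolveRegion` is a hypothetical stand-in that only consults the `aws.region` system property; the class and method names are invented for illustration and are not the names used in the patch.

```java
import java.io.IOException;

/** Illustrative only: a stand-in for region resolution, not the S3A or AWS SDK code. */
final class SyspropRegionSketch {

  /** System property named in the commit message above. */
  static final String AWS_REGION_SYSPROP = "aws.region";

  /**
   * Hypothetical stand-in for the SDK chain: looks at the system property only.
   * A runtime failure is rethrown as an IOException, mirroring the
   * "SDK exceptions are converted to IOEs" behaviour described in the comment.
   */
  static String resolveRegion() throws IOException {
    try {
      String region = System.getProperty(AWS_REGION_SYSPROP);
      if (region == null || region.isEmpty()) {
        throw new IllegalStateException(
            "Unable to find a region via the region provider chain.");
      }
      return region;
    } catch (RuntimeException e) {
      // keep the original exception as the cause so the stack trace survives
      throw new IOException(e.getMessage(), e);
    }
  }

  /** Sets the property, resolves, then restores whatever was there before. */
  static String resolveWithSysprop(String value) throws IOException {
    String previous = System.getProperty(AWS_REGION_SYSPROP);
    System.setProperty(AWS_REGION_SYSPROP, value);
    try {
      return resolveRegion();
    } finally {
      if (previous == null) {
        System.clearProperty(AWS_REGION_SYSPROP);
      } else {
        System.setProperty(AWS_REGION_SYSPROP, previous);
      }
    }
  }
}
```

Restoring the property in a `finally` block keeps the setting from leaking into other tests that run in the same JVM.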
…egion provider chain."
* Fix checkstyle warnings.
* Log at warning once for default chain.
* New stack trace in the docs.
Change-Id: I64c210e576e9df4f42bc1083f2f50ccbab6b65b2
Latest patch warns the user on fallback, using the LogExactlyOnce class to stop it being over-noisy if someone really, really wants to use this "feature". The latest stack trace is also in, as well as the hadoop-3.3.1 one. I've also added the workaround info to the JIRA description, as it'll probably be the first entry Google will find for this.
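For readers unfamiliar with the log-once idea, here is a rough sketch of how such a guard can work. This is not Hadoop's LogExactlyOnce class; `WarnOnceSketch` and its `Consumer<String>` sink are assumptions made so the example runs without a logging framework.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Illustrative stand-in for a log-once helper; not the Hadoop implementation. */
final class WarnOnceSketch {

  /** Flips to true the first time a warning is emitted. */
  private final AtomicBoolean logged = new AtomicBoolean(false);

  /** Where the message goes; a real helper would wrap a logger instead. */
  private final Consumer<String> sink;

  WarnOnceSketch(Consumer<String> sink) {
    this.sink = sink;
  }

  /**
   * Emit the message only on the first call; later calls are dropped.
   * compareAndSet makes the check-and-flip atomic, so concurrent callers
   * cannot both win.
   *
   * @return true if this call emitted the message
   */
  boolean warn(String message) {
    if (logged.compareAndSet(false, true)) {
      sink.accept(message);
      return true;
    }
    return false;
  }
}
```

With one instance held in a static field, every filesystem created in the process shares the guard, so the fallback warning would show up once per JVM rather than once per instance.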
Tests in progress: s3 london, endpoint and region unset.
🎊 +1 overall
This message was automatically generated.
…egion provider chain."
* New stack trace in the docs.
Change-Id: I4503fefc8b5af0f2033262c78839c309bf984a5a
…on auditing. (Not quite for this PR, but it integrates well) Change-Id: If113760e8324973c613db2743f3c9c8bbed9cc17
Change-Id: I151cd02c14525101c75fb033e48ab9711d13314a
LGTM +1
🎊 +1 overall
This message was automatically generated.
Test run in progress. I have commented out the region field in ~/.aws/config.
After taking a glance. Not debugged in detail.
The STS stuff has always needed it; look in testing.md and elsewhere for references to the option. Maybe the SDK provided the fallback if the default endpoint was used. That is not a regression, and all the docs discuss it. I don't want to fix that here, as it's clearly not an issue in actual deployments; this patch can focus on the regression. Set up your region for testing and it will go away.
@mukund-thakur see HADOOP-16565 for the STS behaviour; no regressions there & that patch provides some diagnostics.
@steveloughran Thanks. Tests run fine after setting the STS region and endpoints. So we are good.
thanks, merging!
merged into trunk; building and testing for 3.3
…egion provider chain." (#3133)
This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint is set in fs.s3a.endpoint, S3A filesystem creation may fail on non-EC2 deployments, depending on the local host environment setup.
* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md
Contributed by Steve Loughran.
Change-Id: Ife482cff513307cd52d59eec56beac0a33e031f5
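The two region rules in this commit message can be read as a small decision function. The sketch below is one interpretation, not the code in DefaultS3ClientFactory: the class and method names are hypothetical, and the branches for an explicit non-empty region and for "endpoint set, region unset" are assumptions, since the commit message does not spell them out.

```java
import java.util.Optional;

/** Illustrative reading of the region rules; not the actual S3A client factory. */
final class RegionChoiceSketch {

  /** Region applied when neither endpoint nor region is configured. */
  static final String CENTRAL_REGION = "us-east-1";

  /**
   * @param endpoint value of fs.s3a.endpoint (null or "" when unset)
   * @param region value of fs.s3a.endpoint.region (null when unset,
   *               "" to ask for the SDK region resolution chain)
   * @return the region to set explicitly, or empty when the choice is
   *         left to the SDK
   */
  static Optional<String> chooseRegion(String endpoint, String region) {
    if (region != null && !region.isEmpty()) {
      // Assumption: an explicit region is used as given.
      return Optional.of(region);
    }
    if (region != null) {
      // region == "": explicit opt-in to the SDK region resolution chain.
      return Optional.empty();
    }
    if (endpoint == null || endpoint.isEmpty()) {
      // Nothing configured at all: default to the central region.
      return Optional.of(CENTRAL_REGION);
    }
    // Assumption: with an endpoint set and no region, nothing is forced here.
    return Optional.empty();
  }
}
```

Under this reading, `chooseRegion(null, null)` yields `us-east-1`, while `chooseRegion(null, "")` yields an empty result, which is the case where the fallback warning would apply.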
…on via the region provider chain." (apache#3133)
This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint is set in fs.s3a.endpoint, S3A filesystem creation may fail on non-EC2 deployments, depending on the local host environment setup.
* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md
* Uses the pre-Auditing LogExactlyOnce import, so doesn't depend on that patch.
Contributed by Steve Loughran.
This is a critical follow-on patch to CDPD-26441. HADOOP-17705. S3A to add Config to set AWS region (apache#3020). Both patches must be included.
Change-Id: Icca928e1752423d68591508c360ff6434997fb64