Skip to content

HADOOP-18908. Improve S3A region handling. #6187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Oct 13, 2023

Description of PR

#6106 with one extra commit

Removes the region probe to S3 and reinstates the region resolution logic that was there before the SDK V2 upgrade.

  • Region probe is no longer needed as SDKV2 now supports cross region support.
  • If endpoint is configured and it is not the central (s3.amazonaws.com) one, parse the endpoint for the region. If unable to
    parse region from endpoint (or endpoint is null), look at the the region configuration.
  • If region configured, set it in the client. If it is null, use us-east-2 and set crossRegionAccessEnabled=true
  • If config is empty string, use SDK region resolution chain.

How was this patch tested?

test in progress with s3 london, -Dparallel-tests -DtestsThreadCount=10
expecting vpce test failure.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Copy link
Contributor Author

3 failures; one from this, one that seems caused by #6141 and a random timeout

[ERROR] Failures: 
[ERROR]   ITestS3AEndpointRegion.testWithVPCE:215->lambda$testWithVPCE$7:215 [Incorrect region set] expected:<"[us-west-2]"> but was:<"[vpce]">
[ERROR]   ITestS3GuardTool>AbstractS3GuardToolTestBase.testToolsNoBucket:222 Expected a org.apache.hadoop.fs.s3a.UnknownStoreException to be thrown, but got the result: : 0
[ERROR] Errors: 
[ERROR]   ITestS3APrefetchingLruEviction.testSeeksWithLruEviction:162 » TestTimedOut tes...
[INFO] 

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 21m 10s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 11s Maven dependency ordering for branch
+1 💚 mvninstall 37m 21s trunk passed
-1 ❌ compile 13m 56s /branch-compile-root-jdkUbuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.txt root in trunk failed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.
+1 💚 compile 19m 3s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 4m 57s trunk passed
+1 💚 mvnsite 2m 44s trunk passed
+1 💚 javadoc 1m 49s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 4m 15s trunk passed
+1 💚 shadedclient 40m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 1m 42s the patch passed
+1 💚 compile 19m 40s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 19m 40s the patch passed
+1 💚 compile 17m 27s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 17m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 35s /results-checkstyle-root.txt root: The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11)
+1 💚 mvnsite 2m 28s the patch passed
+1 💚 javadoc 1m 40s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 4m 22s the patch passed
+1 💚 shadedclient 40m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 25s hadoop-common in the patch passed.
+1 💚 unit 3m 19s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 9s The patch does not generate ASF License warnings.
290m 17s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/1/artifact/out/Dockerfile
GITHUB PR #6187
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux a19aa253b2ad 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 91462c3
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/1/testReport/
Max. process+thread count 1946 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor Author

proposed: the vpce test is skipped until HADOOP-18938; will see what the others do.
also need to handle the failure of

Change-Id: I71b15fbdef658ea915a7a93fd17a40e9d78428c6
* ignore failing endpoint test pending followup JIRA.
* S3guard bucket-info command to fall back to listXattrs(/)
  if getBucketInfo fails, so as to force a bucket lookup even
  if the region can be inferred from the endpoint.

Change-Id: I1fbcc5371cb13f1d34ec371fd4c29f106aa4a062
@steveloughran
Copy link
Contributor Author

  • compile failure was yarn (transient?)
  • checkstyle on builder method header > 100 bytes which needs to be that long
  • turned off failing endpoint vpce test
  • fixed bucket-info to do a HEAD / if getBucketLocation fails for any reason

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 4 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 2s Maven dependency ordering for branch
+1 💚 mvninstall 37m 22s trunk passed
+1 💚 compile 18m 32s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 compile 16m 49s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 4m 40s trunk passed
+1 💚 mvnsite 2m 31s trunk passed
+1 💚 javadoc 1m 44s trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 3m 47s trunk passed
+1 💚 shadedclient 38m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 30s the patch passed
+1 💚 compile 17m 38s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javac 17m 38s the patch passed
+1 💚 compile 16m 44s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 16m 44s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 35s /results-checkstyle-root.txt root: The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11)
+1 💚 mvnsite 2m 27s the patch passed
+1 💚 javadoc 1m 51s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 39s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 4m 34s the patch passed
+1 💚 shadedclient 41m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 25s hadoop-common in the patch passed.
+1 💚 unit 2m 59s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 58s The patch does not generate ASF License warnings.
265m 48s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/2/artifact/out/Dockerfile
GITHUB PR #6187
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 802fa17fa507 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 02165f1
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/2/testReport/
Max. process+thread count 1246 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6187/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran steveloughran merged commit e0563fe into apache:trunk Oct 17, 2023
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Nov 27, 2023
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, 
   the region is determined by by parsing the URL 
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   The default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Dec 5, 2023
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, 
   the region is determined by by parsing the URL 
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   The default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
ahmarsuhail pushed a commit to ahmarsuhail/hadoop that referenced this pull request Dec 5, 2023
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, 
   the region is determined by by parsing the URL 
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   The default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, 
   the region is determined by by parsing the URL 
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   The default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants