HADOOP-17771. S3AFS creation fails "Unable to find a region via the region provider chain." #3133

Merged: 7 commits into apache:trunk on Jun 24, 2021

Conversation

@steveloughran (Contributor)

Contributed by Steve Loughran.

Change-Id: I94284178d27a48947e7c0942a7c8565379de7e9b

@steveloughran (Contributor, Author)

Testing in progress, in a setup where

  • the failure mode is replicated
  • fs.s3a.endpoint is not set globally or for the test bucket.

@steveloughran (Contributor, Author)

cloudstore regions

This verifies that the local system is set up to fail without the patch.

> bin/hadoop jar $CLOUDSTORE regions

Determining AWS region for SDK clients
======================================


Determining region using AwsEnvVarOverrideRegionProvider
========================================================

Use environment variable AWS_REGION
2021-06-22 13:12:20,509 [main] INFO  extra.Regions (StoreDurationInfo.java:<init>(53)) - Starting: AwsEnvVarOverrideRegionProvider.getRegion()
2021-06-22 13:12:20,512 [main] INFO  extra.Regions (StoreDurationInfo.java:close(100)) - AwsEnvVarOverrideRegionProvider.getRegion(): duration 0:00:004
region is not known

Determining region using AwsSystemPropertyRegionProvider
========================================================

System property aws.region
2021-06-22 13:12:20,513 [main] INFO  extra.Regions (StoreDurationInfo.java:<init>(53)) - Starting: AwsSystemPropertyRegionProvider.getRegion()
2021-06-22 13:12:20,513 [main] INFO  extra.Regions (StoreDurationInfo.java:close(100)) - AwsSystemPropertyRegionProvider.getRegion(): duration 0:00:000
region is not known

Determining region using AwsProfileRegionProvider
=================================================

Region info in ~/.aws/config
2021-06-22 13:12:20,535 [main] INFO  extra.Regions (StoreDurationInfo.java:<init>(53)) - Starting: AwsProfileRegionProvider.getRegion()
2021-06-22 13:12:20,550 [main] INFO  extra.Regions (StoreDurationInfo.java:close(100)) - AwsProfileRegionProvider.getRegion(): duration 0:00:015
region is not known

Determining region using InstanceMetadataRegionProvider
=======================================================

EC2 metadata; will only work in AWS infrastructure
2021-06-22 13:12:20,551 [main] INFO  extra.Regions (StoreDurationInfo.java:<init>(53)) - Starting: InstanceMetadataRegionProvider.getRegion()
2021-06-22 13:12:20,552 [main] INFO  extra.Regions (StoreDurationInfo.java:close(100)) - InstanceMetadataRegionProvider.getRegion(): duration 0:00:001
WARNING: Provider raised an exception com.amazonaws.AmazonClientException: AWS_EC2_METADATA_DISABLED is set to true, not loading region from EC2 Instance Metadata service
region is not known

Region was NOT FOUND
====================

WARNING: AWS region was not determined through SDK region chain
WARNING: This may not work
2021-06-22 13:12:20,554 [main] INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 50: 
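
For context, the four probes above are the providers in the AWS SDK v1 default region chain. A minimal standalone sketch of the same walk (assuming only that the v1 SDK is on the classpath; this is not part of the patch):

 import com.amazonaws.AmazonClientException;
 import com.amazonaws.regions.AwsEnvVarOverrideRegionProvider;
 import com.amazonaws.regions.AwsProfileRegionProvider;
 import com.amazonaws.regions.AwsRegionProvider;
 import com.amazonaws.regions.AwsSystemPropertyRegionProvider;
 import com.amazonaws.regions.InstanceMetadataRegionProvider;

 public class RegionChainProbe {
   public static void main(String[] args) {
     // Same order as the cloudstore output: env var, system property,
     // ~/.aws/config profile, then EC2 instance metadata.
     AwsRegionProvider[] providers = {
         new AwsEnvVarOverrideRegionProvider(),
         new AwsSystemPropertyRegionProvider(),
         new AwsProfileRegionProvider(),
         new InstanceMetadataRegionProvider()
     };
     for (AwsRegionProvider provider : providers) {
       String name = provider.getClass().getSimpleName();
       try {
         String region = provider.getRegion();
         System.out.println(name + ": "
             + (region == null ? "region is not known" : region));
       } catch (AmazonClientException e) {
         // e.g. InstanceMetadataRegionProvider with AWS_EC2_METADATA_DISABLED=true
         System.out.println(name + ": WARNING " + e);
       }
     }
   }
 }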

hadoop fs -ls without patch

 bin/hadoop fs -ls s3a://stevel-usw2/
 ...
 ls: initializing  on s3a://stevel-usw2/: com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
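
A hedged workaround for anyone hitting this on a release build, per the troubleshooting notes this patch adds: pin the region (or an endpoint) for the bucket explicitly. The value below is an example for a us-west-2 bucket, not a universal setting:

 <property>
   <name>fs.s3a.endpoint.region</name>
   <value>us-west-2</value>
 </property>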
 

hadoop fs -ls with patch

(with IOStatistics logged at info, and auditor logging enabled)

2021-06-22 13:16:51,907 [s3a-transfer-stevel-usw2-unbounded-pool2-t1] DEBUG impl.LoggingAuditor (LoggingAuditor.java:beforeExecution(327)) - [16] 9f36cb47-8dea-4e46-8e5a-e4b5ed3c0304-00000007 Executing op_list_status with {object_list_request '' size=5000, mutating=false}; https://audit.example.org/hadoop/1/op_list_status/9f36cb47-8dea-4e46-8e5a-e4b5ed3c0304-00000007/?op=op_list_status&pr=stevel&ps=3878272c-5b04-487e-be1f-0562bc7e4b21&cm=FsShell&id=9f36cb47-8dea-4e46-8e5a-e4b5ed3c0304-00000007&t0=1&fs=9f36cb47-8dea-4e46-8e5a-e4b5ed3c0304&t1=16&ts=1624364211897
2021-06-22 13:16:53,469 [shutdown-hook-0] INFO  statistics.IOStatisticsLogging (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: counters=((audit_request_execution=1)
(audit_span_creation=4)
(object_list_request=1)
(op_get_file_status=1)
(op_glob_status=1)
(op_list_status=1)
(store_io_request=2));

gauges=();

minimums=((object_list_request.min=1550)
(op_get_file_status.min=2)
(op_glob_status.min=8)
(op_list_status.min=1563));

maximums=((object_list_request.max=1550)
(op_get_file_status.max=2)
(op_glob_status.max=8)
(op_list_status.max=1563));

means=((object_list_request.mean=(samples=1, sum=1550, mean=1550.0000))
(op_get_file_status.mean=(samples=1, sum=2, mean=2.0000))
(op_glob_status.mean=(samples=1, sum=8, mean=8.0000))

The listStatus call went through.

Storediag with patch

And a trimmed storediag; note how the bucket HEAD request returns the region in the 403 response header x-amz-bucket-region: us-west-2. This is how the SDK determines which real endpoint to use.

Store Diagnostics for stevel (auth:SIMPLE) on stevel-mbp15-13176.local/192.168.86.23
====================================================================================


Diagnostics for filesystem s3a://stevel-usw2/
=============================================

S3A FileSystem Connector
ASF Filesystem Connector to Amazon S3 Storage and compatible stores
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html

Hadoop information
==================

  Hadoop 3.4.0-SNAPSHOT
  Compiled by stevel on 2021-05-26T16:24Z
  Compiled with protoc 3.7.1
  From source with checksum a6a33d523c6f35e2a28e9a9ecf703a27

Determining OS version
======================

Darwin stevel-mbp15-13176.local 20.5.0 Darwin Kernel Version 20.5.0: Sat May  8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64

Selected System Properties
==========================

aws.accessKeyId = (unset)
aws.secretKey = (unset)
aws.sessionToken = (unset)
aws.region = (unset)
com.amazonaws.regions.RegionUtils.fileOverride = (unset)
com.amazonaws.regions.RegionUtils.disableRemote = (unset)
com.amazonaws.sdk.disableCertChecking = (unset)
com.amazonaws.sdk.ec2MetadataServiceEndpointOverride = (unset)
com.amazonaws.sdk.enableDefaultMetrics = (unset)
com.amazonaws.sdk.enableInRegionOptimizedMode = (unset)
com.amazonaws.sdk.enableThrottledRetry = (unset)
com.amazonaws.services.s3.disableImplicitGlobalClients = (unset)
com.amazonaws.services.s3.enableV4 = (unset)
com.amazonaws.services.s3.enforceV4 = (unset)

Environment Variables
=====================

AWS_ACCESS_KEY_ID = (unset)
AWS_ACCESS_KEY = (unset)
AWS_SECRET_KEY = (unset)
AWS_SECRET_ACCESS_KEY = (unset)
AWS_SESSION_TOKEN = (unset)
AWS_REGION = (unset)
AWS_S3_US_EAST_1_REGIONAL_ENDPOINT = (unset)
AWS_CBOR_DISABLE = (unset)
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI = (unset)
AWS_CONTAINER_CREDENTIALS_FULL_URI = (unset)
AWS_CONTAINER_AUTHORIZATION_TOKEN = (unset)
AWS_EC2_METADATA_DISABLED = "true"
AWS_EC2_METADATA_SERVICE_ENDPOINT = (unset)
AWS_MAX_ATTEMPTS = (unset)
AWS_RETRY_MODE = (unset)
HADOOP_CONF_DIR = "/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT/etc/hadoop"
HADOOP_CREDSTORE_PASSWORD = (unset)
HADOOP_HEAPSIZE = (unset)
HADOOP_HEAPSIZE_MIN = (unset)
HADOOP_HOME = "/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT"
HADOOP_LOG_DIR = (unset)
HADOOP_OPTIONAL_TOOLS = "hadoop-azure,hadoop-aws,hadoop-openstack"
HADOOP_OPTS = "-Djava.net.preferIPv4Stack=true  -Dyarn.log.dir=/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT/logs -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/stevel/Projects/Releases/hadoop-3.4.0-SNAPSHOT -Dhadoop.id.str=stevel -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender"
HADOOP_SHELL_SCRIPT_DEBUG = (unset)
HADOOP_TOKEN = (unset)
HADOOP_TOKEN_FILE_LOCATION = (unset)
HADOOP_TOOLS_HOME = (unset)
HADOOP_TOOLS_OPTIONS = (unset)
HDP_VERSION = (unset)
LOCAL_DIRS = (unset)
PYSPARK_DRIVER_PYTHON = (unset)
SPARK_HOME = (unset)
SPARK_CONF_DIR = (unset)
SPARK_SCALA_VERSION = (unset)
YARN_CONF_DIR = (unset)

Security
========

Security Enabled: false
Keytab login: false
Ticket login: false
Current user: stevel (auth:SIMPLE)
Token count: 0

Hadoop Options
==============

fs.defaultFS = "file:///" [core-default.xml]
fs.default.name = "file:///"
fs.trash.classname = (unset)
fs.trash.interval = "0" [core-default.xml]
fs.trash.checkpoint.interval = "0" [core-default.xml]
hadoop.tmp.dir = "/tmp/hadoop-stevel" [core-default.xml]
hdp.version = (unset)
yarn.resourcemanager.address = "0.0.0.0:8032" [yarn-default.xml]
yarn.resourcemanager.principal = (unset)
yarn.resourcemanager.webapp.address = "0.0.0.0:8088" [yarn-default.xml]
yarn.resourcemanager.webapp.https.address = "0.0.0.0:8090" [yarn-default.xml]
mapreduce.input.fileinputformat.list-status.num-threads = "1" [mapred-default.xml]
mapreduce.jobtracker.kerberos.principal = (unset)
mapreduce.job.hdfs-servers.token-renewal.exclude = (unset)
mapreduce.application.framework.path = (unset)
fs.iostatistics.logging.level = "info" [core-site.xml]

Security Options
================

dfs.data.transfer.protection = (unset)
hadoop.http.authentication.simple.anonymous.allowed = "true" [core-default.xml]
hadoop.http.authentication.type = "simple" [core-default.xml]
hadoop.kerberos.min.seconds.before.relogin = "60" [core-default.xml]
hadoop.kerberos.keytab.login.autorenewal.enabled = "false" [core-default.xml]
hadoop.security.authentication = "simple" [core-default.xml]
hadoop.security.authorization = "false" [core-default.xml]
hadoop.security.credential.provider.path = (unset)
hadoop.security.credstore.java-keystore-provider.password-file = (unset)
hadoop.security.credential.clear-text-fallback = "true" [core-default.xml]
hadoop.security.key.provider.path = (unset)
hadoop.security.crypto.jceks.key.serialfilter = (unset)
hadoop.rpc.protection = "authentication" [core-default.xml]
hadoop.tokens = (unset)
hadoop.token.files = (unset)

Selected Configuration Options
==============================

fs.s3a.session.token = (unset)
fs.s3a.server-side-encryption-algorithm = (unset)
fs.s3a.server-side-encryption.key = (unset)
fs.s3a.aws.credentials.provider = "
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  " [core-default.xml]
fs.s3a.endpoint = (unset)
fs.s3a.endpoint.region = (unset)
fs.s3a.signing-algorithm = (unset)
fs.s3a.acl.default = (unset)
fs.s3a.attempts.maximum = "20" [core-default.xml]
fs.s3a.authoritative.path = "/" [fs.s3a.bucket.stevel-usw2.authoritative.path via [core-site.xml]]
fs.s3a.block.size = "32M" [core-default.xml]
fs.s3a.buffer.dir = "/tmp/hadoop-stevel/s3a" [core-default.xml]
fs.s3a.bulk.delete.page.size = (unset)
fs.s3a.change.detection.source = "etag" [core-default.xml]
fs.s3a.change.detection.mode = "server" [core-default.xml]
fs.s3a.change.detection.version.required = "true" [core-default.xml]
fs.s3a.connection.ssl.enabled = "true" [core-default.xml]
fs.s3a.connection.maximum = "48" [core-default.xml]
fs.s3a.connection.establish.timeout = "5000" [core-site.xml]
fs.s3a.connection.request.timeout = "0" [core-default.xml]
fs.s3a.connection.timeout = "5000" [core-site.xml]
fs.s3a.custom.signers = (unset)
fs.s3a.directory.marker.retention = (unset)
fs.s3a.downgrade.syncable.exceptions = (unset)
fs.s3a.etag.checksum.enabled = "false" [core-default.xml]
fs.s3a.experimental.input.fadvise = (unset)
fs.s3a.experimental.aws.s3.throttling = (unset)
fs.s3a.experimental.optimized.directory.operations = (unset)
fs.s3a.fast.buffer.size = (unset)
fs.s3a.fast.upload.buffer = "bytebuffer" [core-site.xml]
fs.s3a.fast.upload.active.blocks = "4" [core-default.xml]
fs.s3a.impl.disable.cache = (unset)
fs.s3a.list.version = "2" [core-default.xml]
fs.s3a.max.total.tasks = "32" [core-default.xml]
fs.s3a.multipart.size = "8388608" [core-site.xml]
fs.s3a.paging.maximum = "5000" [core-default.xml]
fs.s3a.multiobjectdelete.enable = "true" [core-default.xml]
fs.s3a.multipart.purge = "false" [core-site.xml]
fs.s3a.multipart.purge.age = "3600000" [core-site.xml]
fs.s3a.paging.maximum = "5000" [core-default.xml]
fs.s3a.path.style.access = "false" [core-site.xml]
fs.s3a.proxy.host = (unset)
fs.s3a.proxy.port = (unset)
fs.s3a.proxy.username = (unset)
fs.s3a.proxy.password = (unset)
fs.s3a.proxy.domain = (unset)
fs.s3a.proxy.workstation = (unset)
fs.s3a.readahead.range = "524288" [core-site.xml]
fs.s3a.retry.limit = "7" [core-default.xml]
fs.s3a.retry.interval = "500ms" [core-default.xml]
fs.s3a.retry.throttle.limit = "20" [core-default.xml]
fs.s3a.retry.throttle.interval = "100ms" [core-default.xml]
fs.s3a.ssl.channel.mode = "default" [fs.s3a.bucket.stevel-usw2.ssl.channel.mode via [core-site.xml]]
fs.s3a.s3.client.factory.impl = (unset)
fs.s3a.threads.max = "80" [core-site.xml]
fs.s3a.threads.keepalivetime = "60" [core-default.xml]
fs.s3a.user.agent.prefix = (unset)
fs.s3a.metadatastore.impl = "org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore" [core-default.xml]
fs.s3a.metadatastore.authoritative = "false" [core-default.xml]
fs.s3a.metadatastore.authoritative.dir.ttl = (unset)
fs.s3a.metadatastore.fail.on.write.error = "true" [core-default.xml]
fs.s3a.metadatastore.metadata.ttl = "15m" [core-default.xml]
fs.s3a.s3guard.consistency.retry.interval = "2s" [core-default.xml]
fs.s3a.s3guard.consistency.retry.limit = "7" [core-default.xml]
fs.s3a.s3guard.ddb.table = (unset)
fs.s3a.s3guard.ddb.region = "eu-west-2" [core-site.xml]
fs.s3a.s3guard.ddb.background.sleep = "25ms" [core-default.xml]
fs.s3a.s3guard.ddb.max.retries = "9" [core-default.xml]
fs.s3a.s3guard.ddb.table.capacity.read = "0" [core-default.xml]
fs.s3a.s3guard.ddb.table.capacity.write = "0" [core-default.xml]
fs.s3a.s3guard.ddb.table.create = "false" [core-default.xml]
fs.s3a.s3guard.ddb.throttle.retry.interval = "100ms" [core-default.xml]
fs.s3a.s3guard.local.max_records = (unset)
fs.s3a.s3guard.local.ttl = (unset)
fs.s3a.committer.name = "magic" [core-site.xml]
fs.s3a.committer.magic.enabled = "true" [core-default.xml]
fs.s3a.committer.staging.abort.pending.uploads = (unset)
fs.s3a.committer.staging.conflict-mode = "append" [core-default.xml]
fs.s3a.committer.staging.tmp.path = "tmp/staging" [core-default.xml]
fs.s3a.committer.threads = "8" [core-default.xml]
fs.s3a.committer.staging.unique-filenames = "false" [core-site.xml]
mapreduce.outputcommitter.factory.scheme.s3a = "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory" [mapred-default.xml]
mapreduce.fileoutputcommitter.marksuccessfuljobs = (unset)
fs.s3a.delegation.token.binding = (unset)
fs.s3a.delegation.token.secondary.bindings = (unset)
fs.s3a.audit.referrer.enabled = (unset)
fs.s3a.audit.referrer.filter = (unset)
fs.s3a.audit.reject.out.of.span.operations = "true" [core-site.xml]
fs.s3a.audit.request.handlers = (unset)
fs.s3a.audit.service.classname = (unset)




Endpoints
=========

Attempting to list and connect to public service endpoints,
without any authentication credentials.
This is just testing the reachability of the URLs.
If the request fails with any network error, it is likely
to be a configuration problem with address, proxy, etc.

If it is some authentication error, then don't worry so much
- look for the results of the filesystem operations.

Endpoint: https://stevel-usw2.s3.amazonaws.com/
===============================================

Canonical hostname s3-us-west-2-w.amazonaws.com
  IP address 52.218.253.131
Proxy: none

Connecting to https://stevel-usw2.s3.amazonaws.com/

Response: 403 : Forbidden
HTTP response 403 from https://stevel-usw2.s3.amazonaws.com/: Forbidden
Using proxy: false
Transfer-Encoding: chunked
null: HTTP/1.1 403 Forbidden
Server: AmazonS3
x-amz-request-id: FFH2R0SM4MM8SZR2
x-amz-id-2: nRlKBuQRLD+7AOPpbsRzvASmnZUa4iZSLuqNQbKfyCjnTb3A53jNIIemlR2/ZsZQaveGyJ69p7Y=
Date: Tue, 22 Jun 2021 12:18:43 GMT
x-amz-bucket-region: us-west-2
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FFH2R0SM4MM8SZR2</RequestId><HostId>nRlKBuQRLD+7AOPpbsRzvASmnZUa4iZSLuqNQbKfyCjnTb3A53jNIIemlR2/ZsZQaveGyJ69p7Y=</HostId></Error>

Test filesystem s3a://stevel-usw2/
==================================

Trying some list and read operations
2021-06-22 13:18:45,526 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:<init>(53)) - Starting: Creating filesystem s3a://stevel-usw2/
2021-06-22 13:18:47,017 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:close(100)) - Creating filesystem s3a://stevel-usw2/: duration 0:01:492
S3AFileSystem{uri=s3a://stevel-usw2, workingDir=s3a://stevel-usw2/user/stevel, inputPolicy=normal, partSize=8388608, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=524288, blockSize=33554432, multiPartThreshold=134217728, serverSideEncryptionAlgorithm='NONE', blockFactory=ByteBufferBlockFactory{buffersOutstanding=0}, auditManager=Service ActiveAuditManagerS3A in state ActiveAuditManagerS3A: STARTED, auditor=LoggingAuditor{ID='1cdd7da8-c418-40c5-ae44-b0398d1b98c5', headerEnabled=true, rejectOutOfSpan=true}}, metastore=NullMetadataStore, authoritativeStore=false, authoritativePath=[s3a://stevel-usw2/], useListV1=false, magicCommitter=true, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=192, available=192, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@1b5bc39d[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], credentials=AWSCredentialProviderList[refcount= 1: [TemporaryAWSCredentialsProvider, SimpleAWSCredentialsProvider, EnvironmentVariableCredentialsProvider, org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider@655a5d9c], delegation tokens=disabled, DirectoryMarkerRetention{policy='delete'}, instrumentation {S3AInstrumentation{}}}
Implementation class class org.apache.hadoop.fs.s3a.S3AFileSystem
2021-06-22 13:18:47,019 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:<init>(53)) - Starting: GetFileStatus s3a://stevel-usw2/
root entry S3AFileStatus{path=s3a://stevel-usw2/; isDirectory=true; modification_time=0; access_time=0; owner=stevel; group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=UNKNOWN eTag=null versionId=null
2021-06-22 13:18:47,026 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:close(100)) - GetFileStatus s3a://stevel-usw2/: duration 0:00:007
2021-06-22 13:18:47,026 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:<init>(53)) - Starting: First 25 entries of listStatus(s3a://stevel-usw2/)
2021-06-22 13:18:47,040 [s3a-transfer-stevel-usw2-unbounded-pool2-t1] DEBUG impl.LoggingAuditor (LoggingAuditor.java:beforeExecution(327)) - [22] 1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000006 Executing op_list_status with {object_list_request '' size=5000, mutating=false}; https://audit.example.org/hadoop/1/op_list_status/1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000006/?op=op_list_status&pr=stevel&ps=1ac8a7d7-8609-4698-811a-bf617bbe594b&cm=StoreDiag&id=1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000006&t0=1&fs=1cdd7da8-c418-40c5-ae44-b0398d1b98c5&t1=22&ts=1624364327026
s3a://stevel-usw2/ : scanned 0 entries
2021-06-22 13:18:48,830 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:close(100)) - First 25 entries of listStatus(s3a://stevel-usw2/): duration 0:01:804
2021-06-22 13:18:48,830 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:<init>(53)) - Starting: First 25 entries of listFiles(s3a://stevel-usw2/)
2021-06-22 13:18:48,835 [s3a-transfer-stevel-usw2-unbounded-pool2-t2] DEBUG impl.LoggingAuditor (LoggingAuditor.java:beforeExecution(327)) - [23] 1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000007 Executing op_list_files with {object_list_request '' size=5000, mutating=false}; https://audit.example.org/hadoop/1/op_list_files/1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000007/?op=op_list_files&pr=stevel&ps=1ac8a7d7-8609-4698-811a-bf617bbe594b&cm=StoreDiag&id=1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000007&t0=1&fs=1cdd7da8-c418-40c5-ae44-b0398d1b98c5&t1=23&ts=1624364328832
Files listing provided by: FunctionRemoteIterator{FileStatusListingIterator[Object listing iterator against s3a://stevel-usw2/; listing count 1; isTruncated=false; counters=((object_list_request=1) (object_list_request.failures=0) (object_continue_list_request.failures=0) (object_continue_list_request=0));
gauges=();
minimums=((object_list_request.failures.min=-1) (object_continue_list_request.min=-1) (object_list_request.min=185) (object_continue_list_request.failures.min=-1));
maximums=((object_list_request.failures.max=-1) (object_continue_list_request.failures.max=-1) (object_continue_list_request.max=-1) (object_list_request.max=185));
means=((object_list_request.mean=(samples=1, sum=185, mean=185.0000)) (object_list_request.failures.mean=(samples=0, sum=0, mean=0.0000)) (object_continue_list_request.mean=(samples=0, sum=0, mean=0.0000)) (object_continue_list_request.failures.mean=(samples=0, sum=0, mean=0.0000)));
]}
2021-06-22 13:18:49,029 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:close(100)) - First 25 entries of listFiles(s3a://stevel-usw2/): duration 0:00:199

Security and Delegation Tokens
==============================

Security is disabled
Filesystem s3a://stevel-usw2 is not configured to issue delegation tokens (at least while security is disabled)
2021-06-22 13:18:49,030 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:<init>(53)) - Starting: probe for a directory which does not yet exist s3a://stevel-usw2/dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9
2021-06-22 13:18:49,034 [main] DEBUG impl.LoggingAuditor (LoggingAuditor.java:beforeExecution(327)) - [1] 1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008 Executing op_get_file_status with {action_http_head_request 'dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9' size=0, mutating=false}; https://audit.example.org/hadoop/1/op_get_file_status/1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008/?op=op_get_file_status&p1=dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9&pr=stevel&ps=1ac8a7d7-8609-4698-811a-bf617bbe594b&cm=StoreDiag&id=1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008&t0=1&fs=1cdd7da8-c418-40c5-ae44-b0398d1b98c5&t1=1&ts=1624364329030
2021-06-22 13:18:49,277 [main] DEBUG impl.LoggingAuditor (LoggingAuditor.java:beforeExecution(327)) - [1] 1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008 Executing op_get_file_status with {object_list_request 'dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9/' size=2, mutating=false}; https://audit.example.org/hadoop/1/op_get_file_status/1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008/?op=op_get_file_status&p1=dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9&pr=stevel&ps=1ac8a7d7-8609-4698-811a-bf617bbe594b&cm=StoreDiag&id=1cdd7da8-c418-40c5-ae44-b0398d1b98c5-00000008&t0=1&fs=1cdd7da8-c418-40c5-ae44-b0398d1b98c5&t1=1&ts=1624364329030
2021-06-22 13:18:49,465 [main] INFO  diag.StoreDiag (StoreDurationInfo.java:close(100)) - probe for a directory which does not yet exist s3a://stevel-usw2/dir-d2dc8c76-a357-418a-8bc3-6d45d256c7d9: duration 0:00:436
Tests are read-only
JVM: memory=127129952
2021-06-22 13:18:49,474 [shutdown-hook-0] INFO  statistics.IOStatisticsLogging (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: counters=((action_http_head_request=1)
(audit_request_execution=4)
(audit_span_creation=5)
(object_list_request=3)
(object_metadata_request=1)
(op_get_file_status=2)
(op_get_file_status.failures=1)
(op_list_files=1)
(op_list_status=1)
(store_io_request=5));

gauges=();

minimums=((action_http_head_request.min=241)
(object_list_request.min=185)
(op_get_file_status.failures.min=435)
(op_get_file_status.min=3)
(op_list_files.min=189)
(op_list_status.min=1801));

maximums=((action_http_head_request.max=241)
(object_list_request.max=1787)
(op_get_file_status.failures.max=435)
(op_get_file_status.max=3)
(op_list_files.max=189)
(op_list_status.max=1801));

means=((action_http_head_request.mean=(samples=1, sum=241, mean=241.0000))
(object_list_request.mean=(samples=3, sum=2157, mean=719.0000))
(op_get_file_status.failures.mean=(samples=1, sum=435, mean=435.0000))
(op_get_file_status.mean=(samples=1, sum=3, mean=3.0000))
(op_list_files.mean=(samples=1, sum=189, mean=189.0000))
(op_list_status.mean=(samples=1, sum=1801, mean=1801.0000)))

@steveloughran added the fs/s3 label (changes related to hadoop-aws; submitter must declare test endpoint) on Jun 22, 2021
@steveloughran (Contributor, Author)

  • No new tests here. The only way to test would be to destroy ~/.aws/config and skip if the env vars/sysprops resolved or we are running in EC2. If we could set the resolve chain, I'd have patched that to always return null...

@steveloughran (Contributor, Author)

Test run with -Dparallel-tests -DtestsThreadCount=7 -Dmarkers=keep: no failures!

Rerunning with dynamo & scale.

@bogthe (Contributor) left a comment:


This is a great catch! Can't imagine the confusion when debugging issues like this.

Code looks good.

@steveloughran (Contributor, Author)

Tested with -Dparallel-tests -DtestsThreadCount=7 -Dmarkers=keep -Dscale -Ds3guard -Ddynamo.

A transient read buffer underflow failure, and one more S3Guard write than expected. Both of those surface when there are too many records:

[ERROR] Failures:
[ERROR]   ITestS3AContractUnbuffer>AbstractContractUnbufferTest.testUnbufferOnClosedFile:83->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<515>
[ERROR]   ITestCommitOperations.testBulkCommitFiles:723->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Number of records written after commit #2; first commit had 4; first commit ancestors CommitContext{operationState=AncestorState{operation=Commitid=44; dest=s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=6; paths={s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0007 s3a://stevel-london/fork-0007/test s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME}}}; second commit ancestors: CommitContext{operationState=AncestorState{operation=Commitid=44; dest=s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out; size=8; paths={s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out/file1 s3a://stevel-london/fork-0007 s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir s3a://stevel-london/fork-0007/test s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME/testBulkCommitFiles/out/subdir/file2 s3a://stevel-london/fork-0007/test/DELAY_LISTING_ME}}}: s3guard_metadatastore_record_writes expected:<2> but was:<3>

@steveloughran (Contributor, Author)

One thought here: would you ever want the s3a connector to fall back to that bundled region lookup sequence?

I'm wondering in particular if it makes a difference in routing/billing on EC2 deployments.

As of Hadoop 3.3.1, if region=null and endpoint=null, the EC2 metadata is used to provide the region info (this is new).
With this patch, if endpoint = null we switch to saying region = us-east-1.
Will that do bad things for signing/routing HTTP connections in an EC2 deployment in different regions?
For example, if I am running in AWS Ireland, will this cause requests to go to us-east-1, even if they then end up redirected back to eu-west-1?

This could mean connections are slower to set up, a risk of remote data transfer and billing (though the redirections should fix that, right?), and if the rules for a deployment prevent out-of-region network traffic, this will break. I think we have hit problems related to this in the past.

Put differently: is anything special happening with the default "null" endpoint and EC2 metadata region name provision which we need to know about and support? If so, we could allow the region to be set to "" or maybe "ec2" and have that revert to the resolve chain.
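
For reference, the decision this discussion converged on (as confirmed by the final commit message) can be sketched like this; the method and helper names are hypothetical, not the actual S3A source:

 // Hypothetical sketch of the endpoint/region decision; illustrative only.
 static String chooseRegion(String endpoint, String configuredRegion) {
   if (configuredRegion != null && !configuredRegion.isEmpty()) {
     return configuredRegion;           // explicit fs.s3a.endpoint.region wins
   }
   if ("".equals(configuredRegion)) {
     return null;                       // "": defer to the SDK resolution chain
   }
   if (endpoint == null || endpoint.isEmpty()) {
     return "us-east-1";                // no endpoint, no region: fixed default
   }
   return regionFromEndpoint(endpoint); // hypothetical helper: derive from endpoint
 }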

@steveloughran (Contributor, Author)

  • Maybe I could test this by setting the sysprop aws.region to something invalid. If the region resolution is going through the chain, then this would get picked up ahead of ~/.aws/config or EC2 and so, being invalid, fail somehow. And if we weren't using that chain, all would be good. A sketch of that idea follows.
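
A sketch of that check as a standalone probe (the bucket name and bogus region are illustrative; this is not the test the patch actually adds):

 import java.net.URI;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;

 public class RegionChainFallbackProbe {
   public static void main(String[] args) throws Exception {
     // Plant an invalid region where the SDK chain would find it first.
     System.setProperty("aws.region", "no-such-region-1");
     try {
       Configuration conf = new Configuration();
       conf.unset("fs.s3a.endpoint");
       conf.set("fs.s3a.endpoint.region", "");  // "": fall back to the SDK chain
       // If the chain is consulted, the bogus region should surface as a
       // failure here or on the first request; if a region was pinned
       // instead, creation succeeds.
       FileSystem fs = FileSystem.newInstance(new URI("s3a://example-bucket/"), conf);
       fs.close();
     } finally {
       System.clearProperty("aws.region");
     }
   }
 }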

@steveloughran (Contributor, Author) left a comment:


more to do

@mehakmeet (Contributor) left a comment:


+1, with your comments.

Add
* Ability to fall back to the region chain if you set fs.s3a.endpoint.region to ""
* Test that this happens by setting the system property aws.region to a value and
  verifying it is picked up.
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+

This is going to surface in the wild for people doing remote IO; should add this in the
Hadoop JIRA text too.

Change-Id: I9a05d7b6ae9da98b44ceeff94582ffaed96980d3
* Fix checkstyle warnings.
* Review, move and enhance troubleshooting.
* Noticed a mention of the ~/.aws stuff in index.md; made clear it was per-host.

Change-Id: If2c8be1b0a85144242f551253586f34abb7fa26d
@steveloughran changed the title from "HADOOP-17771. S3AFS creation fails without region set in ~/.aws/config." to "S3AFS creation fails "Unable to find a region via the region provider chain."" on Jun 23, 2021
@steveloughran (Contributor, Author)

Latest version does let you switch to the region resolution process if you really want to; this actually lets me do a test by setting sysprops to verify that the region is picked up that way.

Also the SDK exceptions are being converted to IOEs.

Tested S3 London with -Dparallel-tests -DtestsThreadCount=7 -Dmarkers=delete -Dscale; all good.

I just realised that I'd set the fs.s3a.endpoint property though; I'll have to rerun without any endpoint or region set for the test bucket.
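
For reference, the switch described above is the empty-string setting of the region option (behaviour confirmed in the final commit message):

 <property>
   <name>fs.s3a.endpoint.region</name>
   <value></value>
 </property>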

…egion provider chain."

* Fix checkstyle warnings.
* Log at warning once for default chain.
* New stack trace in the docs.

Change-Id: I64c210e576e9df4f42bc1083f2f50ccbab6b65b2
@steveloughran (Contributor, Author)

Latest patch warns the user on fallback, using the LogExactlyOnce class to stop it being over-noisy if someone really, really wants to use this "feature".

Also the latest stack trace is in, as well as the Hadoop 3.3.1 one. I've also added the workaround info to the JIRA description, as it'll probably be the first entry Google will find for this.
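
A minimal sketch of that warn-once pattern; the LogExactlyOnce import path has moved between Hadoop branches, so treat the package and message text as assumptions:

 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.apache.hadoop.fs.store.LogExactlyOnce;  // package varies by branch

 class RegionFallbackWarning {
   private static final Logger LOG =
       LoggerFactory.getLogger(RegionFallbackWarning.class);
   // Emits its warning once per process, however many filesystems are created.
   private static final LogExactlyOnce WARN_OF_DEFAULT_REGION_CHAIN =
       new LogExactlyOnce(LOG);

   static void warnOnFallback() {
     WARN_OF_DEFAULT_REGION_CHAIN.warn(
         "Falling back to the AWS SDK region resolution chain"
             + " because fs.s3a.endpoint.region was set to the empty string");
   }
 }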

@steveloughran (Contributor, Author)

Tests in progress: S3 London, endpoint and region unset, -Dparallel-tests -DtestsThreadCount=7 -Dmarkers=keep.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 40s trunk passed
+1 💚 compile 0m 47s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 41s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 27s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 14m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 37s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 14m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 4s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
72m 39s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/4/artifact/out/Dockerfile
GITHUB PR #3133
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint
uname Linux 79ac4e35adbe 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 34525f1
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/4/testReport/
Max. process+thread count 546 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

…egion provider chain."

* New stack trace in the docs.

Change-Id: I4503fefc8b5af0f2033262c78839c309bf984a5a
…on auditing.

(Not quite for this PR, but it integrates well)

Change-Id: If113760e8324973c613db2743f3c9c8bbed9cc17
@steveloughran changed the title from "S3AFS creation fails "Unable to find a region via the region provider chain."" to "HADOOP-17771. S3AFS creation fails "Unable to find a region via the region provider chain."" on Jun 24, 2021
Change-Id: I151cd02c14525101c75fb033e48ab9711d13314a
@mukund-thakur (Contributor) left a comment:


LGTM +1

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 29m 43s trunk passed
+1 💚 compile 0m 46s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 46s trunk passed
+1 💚 javadoc 0m 28s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 12s trunk passed
+1 💚 shadedclient 14m 13s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 35s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 32s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 14m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 5s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
71m 54s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/7/artifact/out/Dockerfile
GITHUB PR #3133
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell markdownlint
uname Linux 7a0c55ffbf3a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 993eea4
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/7/testReport/
Max. process+thread count 656 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3133/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@mukund-thakur (Contributor)

Test run in progress. I have commented out the region field in ~/.aws/config.
I see failures like:
[ERROR] testCreateWithRenewer(org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationTokens)  Time elapsed: 0.827 s <<< ERROR!
com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
    at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462)
    at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:424)
    at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
    at org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding.maybeInitSTS(SessionTokenBinding.java:312)
    at org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding.prepareSTSClient(SessionTokenBinding.java:341)
    at org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding.createTokenIdentifier(SessionTokenBinding.java:362)
    at org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding.createTokenIdentifier(SessionTokenBinding.java:63)
    at org.apache.hadoop.fs.s3a.auth.delegation.AbstractDelegationTokenBinding.createDelegationToken(AbstractDelegationTokenBinding.java:142)
    at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lambda$createDelegationToken$0(S3ADelegationTokens.java:435)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
    at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.createDelegationToken(S3ADelegationTokens.java:433)
    at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationTokens.testCreateWithRenewer(ITestSessionDelegationTokens.java:230)

@mukund-thakur (Contributor)

After taking a glance (not debugged in detail):
@steveloughran I think the STS tests still require the region to be set in ~/.aws/config, as the stack trace is different.

@steveloughran (Contributor, Author)

The STS stuff has always needed it; look in testing.md and elsewhere for references to the option. Maybe the SDK provided the fallback when the default endpoint was used. That is not a regression, and all the docs discuss it.

I don't want to fix that here, as it's clearly not an issue in actual deployments; this patch can focus on the regression.

Set up your region for testing and it will go away:

 <property>
   <name>fs.s3a.assumed.role.sts.endpoint</name>
   <value>${sts.london.endpoint}</value>
 </property>
 <property>
   <name>fs.s3a.assumed.role.sts.endpoint.region</name>
   <value>${sts.london.region}</value>
 </property>

@steveloughran (Contributor, Author)

@mukund-thakur see HADOOP-16565 for the STS behaviour; no regressions there & that patch provides some diagnostics.

@mukund-thakur (Contributor)

@steveloughran Thanks. Tests run fine after setting the STS region and endpoints, so we are good.

@steveloughran (Contributor, Author)

thanks, merging!

@steveloughran steveloughran merged commit 5b7f68a into apache:trunk Jun 24, 2021
@steveloughran (Contributor, Author)

Merged into trunk; building and testing for 3.3.

asfgit pushed a commit that referenced this pull request Jun 24, 2021
…egion provider chain." (#3133)

This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint
is set in fs.s3a.endpoint, S3A filesystem creation may fail on
non-EC2 deployments, depending on the local host environment setup.

* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region
  is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client
  falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md

Contributed by Steve Loughran.

Change-Id: Ife482cff513307cd52d59eec56beac0a33e031f5
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
…egion provider chain." (apache#3133)


This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint
is set in fs.s3a.endpoint, S3A filesystem creation may fail on
non-EC2 deployments, depending on the local host environment setup.

* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region
  is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client
  falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md

Contributed by Steve Loughran.
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…on via the region provider chain." (apache#3133)

This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint
is set in fs.s3a.endpoint, S3A filesystem creation may fail on
non-EC2 deployments, depending on the local host environment setup.

* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region
  is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client
  falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md
* uses pre-Auditing LogExactlyOnce import, so doesn't depend on that patch.

Contributed by Steve Loughran.

This is a critical follow on patch to
CDPD-26441. HADOOP-17705. S3A to add Config to set AWS region (apache#3020)

Both patches must be included

Change-Id: Icca928e1752423d68591508c360ff6434997fb64