
HADOOP-16823. Manage S3 Throttling exclusively in S3A client. #1814


Closed


steveloughran
Contributor

Currently, AWS S3 throttling is initially handled in the AWS SDK, only reaching the S3A client code after the SDK has given up.

This means we don't always directly observe when throttling is taking place.

Proposed:

  • disable throttling retries in the AWS client library
  • add a quantile for the S3 throttle events, as DDB has
  • isolate counters of S3 and DDB throttle events to classify issues better

Because we are taking over the AWS retries, we will need to expand the initial delay between retries and the number of retries we should support before giving up.

Also: should we log throttling events? It could be useful, but there is a risk of overloading the logs, especially if many threads in the same process are triggering the problem.

Change-Id: I386928cd478a6a9fbb91f15b9185a1ea91878680
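
For illustration, a minimal sketch of the first bullet, assuming the AWS SDK v1 `ClientConfiguration` that S3A passes when building its S3 client (the exact wiring in the client factory may differ):

```java
import com.amazonaws.ClientConfiguration;

// Sketch only: stop the SDK retrying throttle responses itself, so that
// 503 "SlowDown" errors surface to the S3A retry and metrics layer.
ClientConfiguration awsConf = new ClientConfiguration();
awsConf.setUseThrottleRetries(false);
```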

@steveloughran steveloughran added enhancement fs/s3 changes related to hadoop-aws; submitter must declare test endpoint labels Jan 21, 2020
@steveloughran
Contributor Author

Not yet tested against anything.

@steveloughran
Contributor Author

Might make sense to add throttling as part of the fault injection in the unreliable S3 client; we could then test the counters, and it would aid production testing.
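
A hedged sketch of what that injection could look like; the method and the `throttleProbability` field are illustrative, not the actual fault-injection hooks:

```java
import java.util.concurrent.ThreadLocalRandom;
import com.amazonaws.AmazonServiceException;

// Illustrative only: throw a 503 throttle response with some probability,
// so tests can drive the new S3A throttle counters.
private void maybeThrottle(String operation) {
  if (ThreadLocalRandom.current().nextFloat() < throttleProbability) {
    AmazonServiceException ex = new AmazonServiceException(
        "injected throttle on " + operation);
    ex.setStatusCode(503);
    ex.setErrorCode("SlowDown");
    throw ex;
  }
}
```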

@steveloughran steveloughran added this to the hadoop-3.3.0 milestone Jan 22, 2020
@steveloughran
Contributor Author

Checkstyle:

./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java:233:  STORE_IO_THROTTLE_RATE("store_io_throttle_rate", "Rate of S3 request throttling"),: Line is longer than 80 characters (found 84). [LineLength]
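
The fix is simply to wrap the declaration:

```java
STORE_IO_THROTTLE_RATE("store_io_throttle_rate",
    "Rate of S3 request throttling"),
```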

Proposed: log throttling events at debug.
Fix checkstyle.

Change-Id: I19f3848b298a8656ee5f986a2ba1cde50a106814
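
A sketch of the proposed logging; the logger and call site are illustrative:

```java
// Debug rather than info/warn: many threads can hit throttling at once,
// so anything noisier risks flooding the logs.
LOG.debug("Request throttled on {}", operation);
```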
@bgaborg bgaborg self-requested a review January 29, 2020 13:57
@steveloughran steveloughran force-pushed the s3/HADOOP-16823-throttling branch from bfecd39 to 22708cd on January 29, 2020 16:21
@steveloughran
Contributor Author

Did manage to overload one of the DDB scale tests here; those throttling values are clearly too low to recover from a big mismatch of DDB capacity and load. We are going to need some bigger values:

[ERROR] test_060_list(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 151.579 s  <<< ERROR!
com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: TA7TTFE2BIFUHAH175T0GTEU6VVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4279)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4246)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeScan(AmazonDynamoDBClient.java:3040)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.scan(AmazonDynamoDBClient.java:3006)
	at com.amazonaws.services.dynamodbv2.document.internal.ScanCollection.firstPage(ScanCollection.java:53)
	at com.amazonaws.services.dynamodbv2.document.internal.PageIterator.next(PageIterator.java:45)
	at com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport.nextResource(IteratorSupport.java:87)
	at com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport.hasNext(IteratorSupport.java:55)
	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTableAccess$DDBPathMetadataIterator.hasNext(S3GuardTableAccess.java:195)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.teardown(ITestDynamoDBMetadataStoreScale.java:188)
	at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
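
One way to widen that window is through the existing S3Guard DDB retry options; the key names are from hadoop-aws, the values below are purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;

// Illustrative values: widen the retry window so clients can ride out a
// big mismatch between provisioned DDB capacity and offered load.
Configuration conf = new Configuration();
conf.set("fs.s3a.s3guard.ddb.throttle.retry.interval", "200ms");
conf.setInt("fs.s3a.s3guard.ddb.max.retries", 20);
```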

@steveloughran
Contributor Author

...oh, and note that the failure was in the SCAN operation. I'm not sure we have enough wrapping there.

@steveloughran
Contributor Author

With retries on the scans in the test teardown (and in the dump/purge DDB entry points), now getting a failure in queryVersionMarker():


[ERROR] test_070_putDirMarker(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 308.882 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: getVersionMarkerItem on ../VERSION: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: EGKO5AOIAOMKVQE8CLR1IGP8GJVV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: EGKO5AOIAOMKVQE8CLR1IGP8GJVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException(S3AUtils.java:424)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:209)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:112)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:315)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:407)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:311)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:286)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableManager.queryVersionMarker(DynamoDBMetadataStoreTableManager.java:662)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableManager.getVersionMarkerItem(DynamoDBMetadataStoreTableManager.java:618)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableManager.initTable(DynamoDBMetadataStoreTableManager.java:199)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:529)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:152)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: EGKO5AOIAOMKVQE8CLR1IGP8GJVV4KQNSO5AEMVJF66Q9ASUAAJG)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4279)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4246)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:2054)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:2020)
	at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.doLoadItem(GetItemImpl.java:77)
	at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItemOutcome(GetItemImpl.java:46)
	at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItem(GetItemImpl.java:88)
	at com.amazonaws.services.dynamodbv2.document.Table.getItem(Table.java:597)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableManager.lambda$queryVersionMarker$2(DynamoDBMetadataStoreTableManager.java:664)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
	... 24 more

What is funny is that other tests are failing because they aren't detecting throttling:


[ERROR] test_100_forgetMetadata(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 2.855 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write events = 0; batch throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.fs.s3a.s3guard.ThrottleTracker.assertThrottlingDetected(ThrottleTracker.java:97)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:552)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_100_forgetMetadata(ITestDynamoDBMetadataStoreScale.java:452)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Turning off throttling in the AWS client causes problems for the DDB
metastore, including showing where tests were making non-retrying operations
against the table.

Mostly addressed, though ITestDynamoDBMetadataStoreScale is still petulant.
Either it takes too long to finish or it doesn't throttle. Oh, and
lag means that while a test may fail because throttling wasn't raised,
the next IO may fail.

Change-Id: I37bbcb67023f4cb3ebdcba978602be58099ad306
@apache apache deleted a comment from hadoop-yetus Jan 29, 2020
@steveloughran
Contributor Author

My "little" fix to turn off retries in the AWS client causes issues in the DDB clients where there's a significant mismatch between prepaid IO and load; ITestDynamoDBMetadataStoreScale is the example of this.

Looking at the AWS metrics, part of the fun is the way bursty traffic is handled: you may get your capacity at the time of the initial load, but get blocked afterwards. That is, the throttling may not happen under load, but on the next low-load API call.

Also, S3GuardTableAccess isn't retrying, and some code in the tests and in the dump/purge table entry points goes on to fail when throttling happens while iterating through scans. Fix: you can ask a DDB metastore to wrap your scan with one bound to its retry logic and metrics, plus use of this where appropriate.

ITestDynamoDBMetadataStoreScale is really slow; either the changes make it worse, or it's always been really slow and we haven't noticed because it was running during the (slow) parallel test runs. Proposed: we review it, look at what we want to show, and then see if we can make things fail faster.

The latest patch makes the SDK throttling disablement exclusive to S3, fixes up the DDB clients to retry better, and tries to make a better case for the ITestDynamoDBMetadataStoreScale suite.

I think I'm going to tune those tests to always downgrade if no throttling is detected.
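
A sketch of that downgrade with a JUnit assumption; the `isThrottlingDetected()` probe on the tracker is hypothetical:

```java
import org.junit.Assume;

// Skip rather than fail when the store never throttled during the test.
Assume.assumeTrue("No throttling detected; skipping test",
    tracker.isThrottlingDetected());
```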

[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale
[ERROR] Tests run: 11, Failures: 5, Errors: 1, Skipped: 0, Time elapsed: 190.404 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale
[ERROR] test_030_BatchedWrite(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 10.259 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_030_BatchedWrite(ITestDynamoDBMetadataStoreScale.java:285)

[ERROR] test_040_get(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 4.15 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_040_get(ITestDynamoDBMetadataStoreScale.java:341)

[ERROR] test_050_getVersionMarkerItem(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 3.311 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_050_getVersionMarkerItem(ITestDynamoDBMetadataStoreScale.java:356)

[ERROR] test_070_putDirMarker(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 2.486 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: getVersionMarkerItem on ../VERSION: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:153)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:163)
Caused by: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:153)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:163)

[ERROR] test_090_delete(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 2.804 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_090_delete(ITestDynamoDBMetadataStoreScale.java:462)

[ERROR] test_100_forgetMetadata(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale)  Time elapsed: 2.278 s  <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_100_forgetMetadata(ITestDynamoDBMetadataStoreScale.java:478)

For the setup failure (here in test_070_putDirMarker): not sure. We either skip the test or retry.

It's always surfacing in test_070; test_060 tests list scale. Looking at that code, I think the retry logic is too coarse: it retries the entire list, when we may want to retry just the hasNext()/next() calls. That is: push it down. This will avoid so much load on any retry.
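
A sketch of that push-down, assuming the existing org.apache.hadoop.fs.s3a.Invoker.retry() contract; the `path` argument and `results` list are illustrative:

```java
// Retry each step of the scan rather than the whole listing, so one
// throttled page doesn't force the entire list operation to restart.
while (invoker.retry("scan", path, true, it::hasNext)) {
  results.add(invoker.retry("scan", path, true, it::next));
}
```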

* Split out where/how we retry listChildren.
* Try to speed up the DDB scale tests.

(though the latest change there triggers an NPE...)

For anyone curious why the tests take so long: it's probably the
setup of the per-test-case FS instance, because that has full retry,
and once one test has throttled, that spin/wait goes on until DDB
is letting the client at it.

Which is a PITA, but it does at least mean that "usually" each test case
starts in a recovered state. Do we care? Should we just run them back to
back and be happy overloading things? I think so.

Change-Id: Ib35d450449fffaa2379d62ca12180eaa70c38584
@steveloughran
Contributor Author

Quick followup: noticed there are no javadocs here. I know not all the existing constants have them, but they should.

Not worth fixing now, but for future patches, can you add a javadoc with the {@value} reference, so that anyone looking at the javadocs can see what the constant does and what its value is? Thanks.
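
For example (the constant shown is illustrative of the pattern, not necessarily the final name or key):

```java
/**
 * Should throttling retries be handled inside the AWS SDK rather than
 * by the S3A client? Value: {@value}.
 */
public static final String EXPERIMENTAL_AWS_INTERNAL_THROTTLING =
    "fs.s3a.experimental.aws.s3.throttling";
```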

@apache apache deleted a comment from hadoop-yetus Jan 31, 2020
@apache apache deleted a comment from hadoop-yetus Jan 31, 2020
@apache apache deleted a comment from hadoop-yetus Jan 31, 2020
@steveloughran
Contributor Author

Oops, accidentally closed this. Will resubmit.
