HADOOP-16823. Large DeleteObject requests are their own Thundering Herd #1826
Conversation
Force-pushed from 810bad4 to 74553b0
Force-pushed from 464d789 to dacc401
BTW, the latest patch adds an experimental "optimize directory markers" switch which tells the S3A client to be less strict about creating and deleting directory markers. I'm explicitly vague about what that means, but currently in file creation it only ever looks one level up to delete markers. There's trouble there if applications don't call mkdir() on a path before creating the file, but otherwise it avoids the tombstones and scale problems on deep trees.
Currently AWS S3 throttling is initially handled in the AWS SDK, only reaching the S3 client code after the SDK has given up. This means we don't always directly observe when throttling is taking place. Proposed:
* disable throttling retries in the AWS client library
* add a quantile for the S3 throttle events, as DDB has
* isolate counters of S3 and DDB throttle events to classify issues better
Because we are taking over the AWS retries, we will need to expand the initial delay in retries and the number of retries we should support before giving up. Also: should we log throttling events? It could be useful, but there is a risk of overloading the logs, especially if many threads in the same process were triggering the problem. Proposed: log at debug.
Change-Id: I386928cd478a6a9fbb91f15b9185a1ea91878680
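As a rough sketch of what "disable throttling retries in the AWS client library" could look like, assuming the v1 AWS SDK's ClientConfiguration; the class name and the configuration key quoted here follow the naming discussed later in this review and are not the final constants:

```java
import com.amazonaws.ClientConfiguration;
import org.apache.hadoop.conf.Configuration;

public class SdkThrottleConfigSketch {

  /**
   * Build the AWS client configuration, optionally turning off the SDK's own
   * throttle retries so that 503 "slow down" responses surface to the
   * S3A-level retry logic (and its statistics) instead.
   */
  public static ClientConfiguration createAwsConf(Configuration hadoopConf) {
    ClientConfiguration awsConf = new ClientConfiguration();
    // Key name as proposed in this review thread; treated here as an assumption.
    boolean sdkThrottling = hadoopConf.getBoolean(
        "fs.s3a.experimental.aws.s3.throttling", true);
    awsConf.setUseThrottleRetries(sdkThrottling);
    return awsConf;
  }
}
```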
fix checkstyle Change-Id: I19f3848b298a8656ee5f986a2ba1cde50a106814
Turning off throttling in the AWS client causes problems for the DDBMetastore, including showing where tests were making non-retrying operations against the table. Mostly addressed, though ITestDynamoDBMetadataStoreScale is still petulant: either it takes too long to finish or it doesn't throttle. Also, lag means that while a test may fail because throttling wasn't raised, the next IO may fail instead. Change-Id: I37bbcb67023f4cb3ebdcba978602be58099ad306
* Split out where/how we retry listChildren.
* Trying to speed up the DDB scale tests (though the latest change there triggers an NPE...).
For anyone curious why the tests take so long: it's probably the setup of the per-test-case FS instance, because that has full retry, and once one test has throttled, that spin/wait goes on until DDB is letting the client at it. Which is a PITA, but it does at least mean that "usually" each test case is in a recovered state. Do we care? Should we just run them back to back and be happy overloading things? I think so.
Change-Id: Ib35d450449fffaa2379d62ca12180eaa70c38584
- Moving RetryingCollection to top level.
- DDBMS.listChildren() unwraps IOEs raised in the iterator.
- The throttling scale test is happier if throttling doesn't surface in a test run, as it may mean the problem will surface later.
Change-Id: Ibf55e6ab257269b55230eedacae6a17586d91211
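A minimal sketch of the IOE tunnelling described above; the real RetryingCollection and DDBMS code differ, but the pattern is that iterator steps cannot declare IOException, so it is wrapped as an UncheckedIOException inside the iterator and unwrapped again at the listChildren() call site:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;

final class IoeTunnellingSketch {

  /** Stand-in for a per-element fetch that can fail, e.g. a DynamoDB page read. */
  interface ThrowingFetch<T> {
    T fetch() throws IOException;
  }

  /** Inside the iterator: wrap the checked IOException so it crosses the Iterator API. */
  static <T> T tunnel(ThrowingFetch<T> fetch) {
    try {
      return fetch.fetch();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Caller side: unwrap the tunnelled IOException, as listChildren() does. */
  static <T> T unwrap(Iterator<T> it) throws IOException {
    try {
      return it.next();
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }
}
```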
Implies that, from the (remote) test machine, multiple smaller bulk deletes are handled better than a few large ones from a retry perspective. Also implies we should make sure that the backoff strategy we use in our code doesn't back off over-aggressively. Proposed: retryUpToMaximumCountWithProportionalSleep rather than exponential backoff for throttling. Oh, and the throttle detection code doesn't seem to be updating counters here...
Change-Id: I163d128aa5ad5c203ade66bd4f049d3357d6a9d4
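For comparison, a sketch of the two retry policies mentioned above using the standard org.apache.hadoop.io.retry.RetryPolicies factory methods; the limits and sleep times are illustrative, not the values chosen in the patch:

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class ThrottleRetryPolicySketch {

  // Proportional sleep: attempt n waits roughly n * 500ms, so backoff grows
  // gently and a large bulk delete is not stalled for minutes by a single 503.
  static final RetryPolicy PROPORTIONAL =
      RetryPolicies.retryUpToMaximumCountWithProportionalSleep(
          10, 500, TimeUnit.MILLISECONDS);

  // Exponential backoff: the sleep roughly doubles per attempt, which can back
  // off over-aggressively when many threads are throttled at once.
  static final RetryPolicy EXPONENTIAL =
      RetryPolicies.exponentialBackoffRetry(10, 500, TimeUnit.MILLISECONDS);
}
```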
- Trying to set up an explicit callback on retries in bulk deletes. Change-Id: I456680bbbedf3f135508ae3960e83eb1baefbfc6
* Pulling out retry logic into its own class; only using this in delete().
* Tests are parameterized; the configured page size is coming in as 0, no idea why. To debug.
Change-Id: I7e45b897c0a8d09167e6e6148a8f3930f31ec5b0
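A hypothetical sketch of the "explicit callback on retries" idea; the names here (RetryCallback, BulkDeleteRetrySketch) are illustrative, not the classes added by this patch. The point is that each retried page of a bulk delete gets a hook where throttle statistics can be updated and the event logged at debug:

```java
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BulkDeleteRetrySketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(BulkDeleteRetrySketch.class);

  /** Callback invoked whenever one page of a bulk delete is retried. */
  interface RetryCallback {
    void onRetry(IOException cause, int attempt, int pageSize);
  }

  static RetryCallback statisticsCallback() {
    return (cause, attempt, pageSize) -> {
      // A real implementation would bump the S3 throttle counters here.
      LOG.debug("Retrying bulk delete page of {} keys (attempt {})",
          pageSize, attempt, cause);
    };
  }
}
```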
This adds an option "fs.s3a.experimental.optimized.directory.operations" which says "optimize directory IO" without being explicit about what it does. In this release it only looks for and deletes the parent dir entry when creating a file (we still do so for mkdir though). This is dangerous as it goes against the FS spec for create (it goes with createNonRecursive though :). If you create a file two levels under an empty dir, that empty dir marker stays there. But consider: if you create a file two levels under a file, S3A is happy, and nobody has noticed.
Also:
* directory cleanup/parent marker recreation is done async
* page size set to 250; seems to balance out better in the load tests
* HADOOP-16613: dir marker contentType = application/x-directory
Change-Id: Id88a198f61beb3719aa4202d26f3634e5e9cc194
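A sketch of the paging idea behind the "page size" tuning above: rather than issuing one DeleteObjects request with thousands of keys, split the key list into fixed-size pages so a throttled retry only repeats one page. The page size constant is simply the value quoted in the commit message, not a claim about the final default or option name:

```java
import java.util.ArrayList;
import java.util.List;

public class DeletePagingSketch {
  static final int PAGE_SIZE = 250;

  /** Split a full key listing into pages of at most PAGE_SIZE keys. */
  static List<List<String>> pages(List<String> keys) {
    List<List<String>> pages = new ArrayList<>();
    for (int i = 0; i < keys.size(); i += PAGE_SIZE) {
      pages.add(keys.subList(i, Math.min(i + PAGE_SIZE, keys.size())));
    }
    return pages;
  }
}
```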
Change-Id: I7bb9a4a9cc0b5e1ee7a54be7c5f463621ca66bc1
Change-Id: I4df1d47c0865604bbb17083b08cd5a3bc4e1d9f4
These tests could fail even before this patch went in, but it was while I was working on this that some of the problems were happening often enough that I could track them down. Key: testDeleteTable() was not deleting a test table, it was deleting whichever table the bucket was bonded to, potentially interfering with every other test. The stack traces were actually appearing in the next test which was run, testAncestorOverwriteConflict(), which would spin for 30-60s in setup before failing. Change-Id: I5e942d3854a5e1e496405c5be620768d2f81a83a
Force-pushed from 6b0a67c to 10f459e
Experimental directory marker optimization feature removed. It was really broken and its very presence would only encourage people to turn it on, or at least start demanding it was production ready within a short period of time... and be very disappointed when that couldn't happen.
LGTM +1, the only note I added is that I would add the EXPERIMENTAL_AWS_INTERNAL_THROTTLING
default value constant.
   */
  @InterfaceStability.Unstable
  public static final String EXPERIMENTAL_AWS_INTERNAL_THROTTLING =
      "fs.s3a.experimental.aws.internal.throttling";
Where is the default value for this defined?
its true, but yes
Updating the value, also changing the name to "fs.s3a.experimental.aws.s3.throttling" to make clear it's S3 only.
(Also, this is not a bug IMHO, more like an improvement.)
The new load test was not picking up throttle events in the retry handler. This is because the failure we are seeing is actually the XML parser error we've seen before when an open connection is broken: the AWS SDK client's XML parser simply sees and reports a failure of XML parsing, rather than a change in the network or remote system. We have assumed until now this was a sign of network issues. The fact that it is happening consistently when performing bulk delete operations makes me suspect that it is actually the S3 front end rejecting the caller. We are retrying on it, but treating it as a symptom of throttling and so updating the relevant counters. Change-Id: I5b6907ddd7d3eaec65d12064b10c89d953d85e46
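A sketch of that classification, with hypothetical helper names: treat an SDK client exception whose message looks like the broken-connection XML parser failure as a probable throttle/rejection event, so it is retried and counted rather than reported as a network error. The matched string is an assumption based on the failure mode described above and in HADOOP-13811, not a guaranteed SDK message:

```java
import com.amazonaws.SdkClientException;

public class XmlParseFailureSketch {

  /** Does this look like the XML parse failure seen when a connection breaks? */
  static boolean looksLikeBrokenConnection(SdkClientException e) {
    String msg = e.getMessage();
    return msg != null && msg.contains("Failed to parse XML document");
  }

  /** Treat suspected front-end rejections as throttle events; rethrow the rest. */
  static void classify(SdkClientException e, Runnable onThrottleSuspected) {
    if (looksLikeBrokenConnection(e)) {
      // Seen consistently during bulk deletes: suspect the S3 front end is
      // rejecting the caller, so update throttle statistics and retry.
      onThrottleSuspected.run();
    } else {
      throw e;
    }
  }
}
```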
Change-Id: I55a7d6d77accdf7393e147db2866300495d11f5b
Gabor, thanks for the review. Yeah, you are right: improvement. Before I merge, do you want to look at how XML parser errors are being treated as retry failures, as that is what I'm seeing during the load tests (i.e. not 503/slow down)? https://issues.apache.org/jira/browse/HADOOP-13811 shows the history there (and yes, my test bucket is versioned for 24h).
(Retested against S3 Ireland; got the failure in testListingDeleteauth=true, which is from my auth mode patch against versioned buckets - will do a quick followup for that patch.) -Dparallel-tests -DtestsThreadCount=8 -Ds3guard -Ddynamo -Dauth
style
Did checkstyle changes and a diff with trunk to (a) reduce the diff and (b) see what I needed to improve with javadocs; mainly the RetryingCollection. I got a failure on a -Dscale auth run
Now, I've been playing with older branch-2 versions recently and could blame that - but "bulk" and "delete" describe exactly what I was working on in this patch. It wasn't that, but while working on these tests, with better renames, I managed to create a deadlock in the new code.
Surfaced in ITestS3ADeleteManyFiles during parallel file creation. Actions:
Makes me think we should do more parallel IO tests within the same process.
* Remove the async stuff from the end of rename().
* Keep dir marker delete operations in finishedWrite() async, but use the unbounded thread pool.
* Cleanup + enhancement of ITestS3ADeleteManyFiles so that it tests src and dest paths more rigorously, and sets a page size of 50 for better coverage of the paged rename sequence.
Change-Id: I334d70cc52c73bd926ccd1414e11a0ba740d9b89
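A sketch of the async parent-marker cleanup pattern kept in finishedWrite(): the deletion of now-redundant directory markers is submitted to an executor so the write path does not block on it. The executor and method names here are placeholders, not the S3A internals:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncMarkerCleanupSketch {
  // Stand-in for the "unbounded thread pool" mentioned in the commit message;
  // a bounded pool risks the deadlock seen under parallel file creation.
  private final ExecutorService unboundedPool = Executors.newCachedThreadPool();

  /** Schedule deletion of parent directory markers without blocking the writer. */
  CompletableFuture<Void> deleteParentMarkersAsync(String parentKey) {
    return CompletableFuture.runAsync(
        () -> deleteMarkers(parentKey), unboundedPool);
  }

  private void deleteMarkers(String parentKey) {
    // Placeholder for the actual DeleteObjects call on the marker keys.
  }
}
```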
Change-Id: I5fe8caab3b490904ef50522ca1dc0c7888fc79dc
Tests were running without errors against ireland.
+1
ooh, thanks for this!
Change-Id: I833700b25f4c8cfb16a89f843d441edfbf440e59
merged
Currently AWS S3 throttling is initially handled in the AWS SDK, only reaching the S3 client code after it has given up.
This means we don't always directly observe when throttling is taking place.
(Reinstatement of #1814, which was accidentally closed.)