HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt #6938
Conversation
force-pushed bf5d5ec to a93afe6
force-pushed a93afe6 to 9b942c7
I believe I have a way to test this by injecting 500 exceptions with a custom execution interceptor added to the audit chain |
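For illustration only: a minimal sketch of that fault-injection approach, using the AWS SDK v2 ExecutionInterceptor extension point. The class name is hypothetical and this is not the interceptor actually added in the PR; it just shows how a successful response can be rewritten to a 500 so the retry path is exercised.

```java
import software.amazon.awssdk.core.interceptor.Context;
import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
import software.amazon.awssdk.http.SdkHttpResponse;

// Hypothetical fault injector: rewrites successful responses to HTTP 500
// so the SDK/S3A retry and recovery path can be tested.
public class InjectHttp500Interceptor implements ExecutionInterceptor {

  @Override
  public SdkHttpResponse modifyHttpResponse(Context.ModifyHttpResponse context,
      ExecutionAttributes executionAttributes) {
    SdkHttpResponse response = context.httpResponse();
    // only rewrite responses which would otherwise succeed
    if (response.isSuccessful()) {
      return response.toBuilder()
          .statusCode(500)
          .build();
    }
    return response;
  }
}
```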
force-pushed 76deb75 to 65fd797
tested s3 london with prefetch failures, timing related (12 too big...)
force-pushed 2790d56 to b95ee7c
Tested: s3 london with -Dparallel-tests -DtestsThreadCount=8 -Dscale. This is ready to be reviewed. @mukund-thakur, @HarshitGupta11 and @shameersss1 could you all look at this?
force-pushed b187942 to 1fb04e9
 */
public final class ByteBufferInputStream extends InputStream {
  private static final Logger LOG =
      LoggerFactory.getLogger(DataBlocks.class);
Shouldn't this be ByteBufferInputStream.class?
} catch (ExecutionException ee) {
  // there is no way of recovering so abort
  // cancel all partUploads
Aren't we cancelling all the uploads here?
Looking at this; may need some more review to be confident we are doing the abort here properly.
...done a lot more work on aborting.
 */
public class AWSStatus500Exception extends AWSServiceIOException {
  public AWSStatus500Exception(String operation,
      AwsServiceException cause) {
    super(operation, cause);
  }

  @Override
  public boolean retryable() {
Will this make all 500s retriable? I mean, if S3 throws an exception like 500 Internal Server Error, do we need to retry from the S3A client as well?
See the latest change... I've made it an option.
      // there is specific handling for some 5XX codes (501, 503);
      // this is for everything else
-     policyMap.put(AWSStatus500Exception.class, fail);
+     policyMap.put(AWSStatus500Exception.class, retryAwsClientExceptions);
Do we need to selectively retry the 500 exception? Say, only when the cause is "Your socket connection..."?
See the full comment below. Along with that, I really don't like matching on error strings; way too brittle for production code. Even in tests I like to share the text across production and test classes as constants.
(Yes, I know about org.apache.hadoop.fs.s3a.impl.ErrorTranslation... doesn't mean I like it.)
The changes look mostly good to me. I have left some minor comments and questions.
Production code looks good overall. Have to look at the tests.
  @Override
  protected ByteBufferInputStream createNewStream() {
    // set the buffer up from reading from the beginning
    blockBuffer.limit(initialPosition);
Wondering why setting the limit is important?
we want to start reading from the initial position every time the stream is opened.
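As an aside, a simplified sketch (not the PR's UploadContentProviders code) of why restartability matters to the SDK: a ContentStreamProvider must hand back a brand-new stream over the same bytes on every call, always starting at the block's initial position, so a failed part upload can be replayed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import software.amazon.awssdk.http.ContentStreamProvider;

// Illustrative replayable provider: every newStream() call returns a fresh
// view of the same byte range, so the SDK can retry the upload body without
// the previous read position leaking into the replay. No buffer copies.
public class ReplayableByteArrayProvider implements ContentStreamProvider {

  private final byte[] data;
  private final int offset;
  private final int length;

  public ReplayableByteArrayProvider(byte[] data, int offset, int length) {
    this.data = data;
    this.offset = offset;
    this.length = length;
  }

  @Override
  public InputStream newStream() {
    return new ByteArrayInputStream(data, offset, length);
  }
}
```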
Retrying _should_ make it go away.

The 500 error is considered retryable by the AWS SDK, which will have already
tried it `fs.s3a.attempts.maximum` times before reaching the S3A client -which
nit: retried?
@shameersss1 We have massively cut back on the number of retries which take place in the V2 SDK compared to V1, even though we have discussed in the past turning it off completely and handling it all ourselves. However, that would break things the transfer manager does in separate threads.

The thing is, I do not know how often we see 500 errors against AWS S3 stores (rather than third party ones with unrecoverable issues), and now that we have seen them I don't know what the right policy should be. The only documentation on what to do seems more focused on 503s, and doesn't provide any hints about why a 500 could happen or what to do other than "keep trying, maybe it'll go away": https://repost.aws/knowledge-center/http-5xx-errors-s3 . I do suspect it is very rare; otherwise the AWS team might have noticed their lack of resilience here, and we would've found it during our own testing. Any 500 error at any point other than multipart uploads probably gets recovered from nicely, so there could've been background noise of these which we have never noticed before. S3A FS stats will now track these, which may be informative.

I don't want to introduce another configuration switch if possible, because that adds more documentation, testing, maintenance, et cetera. One thing I was considering: should we treat this exactly the same as a throttling exception, which has its own configuration settings for retries? Anyway, if you could talk to your colleagues and make some suggestions based on real knowledge of what can happen, that would be really nice.

Note that we are treating 500 as idempotent, the way we do with all the other failures, even though from a distributed computing purism perspective it is not in fact true.

Not looked at the other comments yet; will do later.

Based on a code walk-through with Mukund, Harshit and Saikat, I've realised we should make absolutely sure that the stream providing a subset of a file fails immediately if the read() goes past the allocated space. With tests, obviously.
@shameersss1 here is what I propose: add a boolean config option fs.s3a.retry.all.http.errors to retry on all "maybe unrecoverable" HTTP errors; default is false. I did think about a comma-separated list ("500, 4xx, 510") but decided it was overcomplex.
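For reference, the option as eventually merged (see the later commit messages) is fs.s3a.retry.http.5xx.errors, default true. A minimal sketch of turning the S3A-level 5xx retries off, e.g. for a third-party store whose 500 responses are known to be unrecoverable:

```java
import org.apache.hadoop.conf.Configuration;

public final class DisableS3a5xxRetries {

  private DisableS3a5xxRetries() {
  }

  // Build a configuration which leaves 503 throttling handling unchanged
  // but stops the S3A client retrying other 5xx responses itself.
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.retry.http.5xx.errors", false);
    return conf;
  }
}
```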
FYI, just reviewing the block output stream and simplifying it... there is no need to make the single-block PUT async, and that code gets simpler once there's no need to worry about interruptions and cancelling. Also going up the execution chain to make sure CancellationException is processed in the output stream.
Performing the PUT in the same thread as S3ABlockOutputStream.close() simplifies it a lot (no cancel/interrupt).

+ close() maps CancellationException to InterruptedIOException
+ updated javadocs to emphasise CancellationException can be raised
+ tweaked some problematic javadocs

Change-Id: I266697cd722fcfab0f9a98450d84abcdd38cb883
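A sketch of the exception mapping this commit describes; the helper below is hypothetical, not the S3ABlockOutputStream code, and only illustrates converting a CancellationException from cancelled uploads into the InterruptedIOException that close() callers see.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CancellationException;

public final class CancellationMapping {

  private CancellationMapping() {
  }

  // Run the completion logic; if pending uploads were cancelled, surface
  // that to the caller as an InterruptedIOException rather than an
  // unchecked CancellationException.
  public static void completeMappingCancellation(Runnable completeUploads)
      throws IOException {
    try {
      completeUploads.run();
    } catch (CancellationException e) {
      InterruptedIOException iioe =
          new InterruptedIOException("Upload cancelled: " + e);
      iioe.initCause(e);
      throw iioe;
    }
  }
}
```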
Change-Id: I42eabb4e9348c6c2c88a284a33b6230947914695
+ had to do a merge commit as all PRs whose dependencies on h-thirdparty are 1.3.0-SNAPSHOT will break
🎊 +1 overall
🎊 +1 overall
This is funny: placing the simple PUT in the same thread as close() breaks one of the tests. Why so? That test looks for at least one thread called s3a-transfer, then asserts that after the thread timeout that count goes to zero. It is meant to assert that after renames the pool is drained, but we've made two changes this year to reduce the number of threads.

As a result: no threads to assert on. I'm fixing it by shrinking the size of multipart uploads to their minimum; this seems to work, though if problems surface in future we should look at the test and decide whether or not it is obsolete, or whether we could redesign the tests to include more parallelized operations (tree renames?).
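A sketch of the kind of test tuning described here, assuming the standard fs.s3a.multipart.size and fs.s3a.multipart.threshold options and a hypothetical helper class; shrinking both to the 5 MB minimum means even a modest write goes through the multipart path and exercises the transfer threads.

```java
import org.apache.hadoop.conf.Configuration;

public final class SmallMultipartTestConfig {

  private SmallMultipartTestConfig() {
  }

  // Derive a test configuration which forces multipart uploads at the
  // smallest part size S3 permits (5 MB).
  public static Configuration create(Configuration base) {
    Configuration conf = new Configuration(base);
    conf.set("fs.s3a.multipart.size", "5M");
    conf.set("fs.s3a.multipart.threshold", "5M");
    return conf;
  }
}
```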
PUT changes caused a latent regression in this test to surface.

Also: more tuning of S3ABlockOutputStream, including comments, logging and @RetryPolicy tags.

Change-Id: I7c9920a3bb835d6993b5e1f84faf1286e2194fd0
* General review of S3ABlockOutputStream, inc javadocs and Retries attributes
* S3ADataBlocks also has its methods' exceptions reviewed and javadocs updated
* Default value of fs.s3a.retry.http.5xx.errors is true
* Troubleshooting and third-party docs updated

Change-Id: I764943cdc0c867875be807ee6f4bd27600aae275
Today's changes: the javadocs are mainly @throws clauses, as I went down them all to see what could be thrown and why, just to reassure myself there are no other causes of cancellation. The change of ...
Test-wise, something transient: this says the call wasn't in a span, but it is, unless the span source is null; and the span source is set to the FS in setup(), and the FS sets its audit span in initialize() to a real or stub audit manager. So I have no idea how this can be reached. Seen something like this before, so I consider it completely unrelated.
💔 -1 overall
Mockito test failed because Mockito tests are so brittle.

Change-Id: Ia5e8a4fdb74b08a04af58f3b5c868392758cf9f3
Change-Id: I74994e4c41205db836df18ff53777219467a98cd
💔 -1 overall
Change-Id: Ib993974ce16df24a2da47223c3cc2e35336176a1
Changing the default retry policy failed a test.

* implemented variants of the test case for policies with/without 500 retry.
* little bit of cleanup on an old test suite.

Change-Id: Iefce2e9c7623f94644f1ad3f07bb581d4e707765
tested s3 london
🎊 +1 overall
PR all green; got Ahmar's approval. Merging to trunk and 3.4, but not 3.4.1.
HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt (apache#6938)

This is a major change which handles 400 error responses when uploading large files from memory heap/buffer (or staging committer) and the remote S3 store returns a 500 response from an upload of a block in a multipart upload.

The SDK's own streaming code seems unable to fully replay the upload; it attempts to but then blocks, and the S3 store returns a 400 response:

    "Your socket connection to the server was not read from or written to
    within the timeout period. Idle connections will be closed.
    (Service: S3, Status Code: 400...)"

There is an option to control whether or not the S3A client itself attempts to retry on a 50x error other than 503 throttling events (which are independently processed as before).

Option: fs.s3a.retry.http.5xx.errors
Default: true

500 errors are very rare from standard AWS S3, which has a five nines SLA. It may be more common against S3 Express which has lower guarantees. Third party stores have unknown guarantees, and the exception may indicate a bad server configuration. Consider setting fs.s3a.retry.http.5xx.errors to false when working with such stores.

Significant code changes:

There is now a custom set of implementations of software.amazon.awssdk.http.ContentStreamProvider in the class org.apache.hadoop.fs.s3a.impl.UploadContentProviders. These:

* Restart on failures
* Do not copy buffers/byte buffers into new private byte arrays, so avoid exacerbating memory problems.

There are new IOStatistics for specific HTTP error codes; these are collected even when all recovery is performed within the SDK.

S3ABlockOutputStream has major changes, including handling of Thread.interrupt() on the main thread, which now triggers and briefly awaits cancellation of any ongoing uploads. If the writing thread is interrupted in close(), it is mapped to an InterruptedIOException. Applications like Hive and Spark must catch these after cancelling a worker thread.

Contributed by Steve Loughran
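To illustrate the caller-side consequence spelled out in the commit message (the worker class below is hypothetical, not Hive or Spark code): an application which interrupts a writing thread should expect close() to raise an InterruptedIOException and treat it as a cancelled write.

```java
import java.io.InterruptedIOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical worker: writes a payload to S3A and treats an interrupt
// during write/close as a cancellation rather than a hard failure.
public class CancellableWriter implements Runnable {

  private final FileSystem fs;
  private final Path dest;
  private final byte[] payload;

  public CancellableWriter(FileSystem fs, Path dest, byte[] payload) {
    this.fs = fs;
    this.dest = dest;
    this.payload = payload;
  }

  @Override
  public void run() {
    try (OutputStream out = fs.create(dest, true)) {
      out.write(payload);
    } catch (InterruptedIOException cancelled) {
      // uploads were cancelled; the destination file was not created.
      Thread.currentThread().interrupt();
    } catch (Exception e) {
      throw new RuntimeException("write to " + dest + " failed", e);
    }
  }
}
```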
The same commit message accompanies the follow-up commits referencing this PR: (#6938) (#7044) and (#6938) (#7044) (#7094).
Description of PR

HADOOP-19221.

Adds a custom set of content providers in UploadContentProviders which can restart on failure.
org.apache.hadoop.fs.store.ByteBufferInputStream has been pulled out of org.apache.hadoop.fs.store.DataBlocks to assist.

How was this patch tested?

Fault injection through an AWS SDK extension point which changes the status from 200 to 400 after the targeted operation completes. This puts the SDK into retry/recovery mode.
Some new unit tests.