
HADOOP-17812. NPE in S3AInputStream read() after failure to reconnect to store #3222


Merged
merged 3 commits into from Jul 30, 2021

Conversation

wbo4958
Contributor

@wbo4958 wbo4958 commented Jul 22, 2021

When an IOException happens during "wrappedStream.read", onReadFailure -> reopen
will be called, and reopen will try to re-open "wrappedStream". But if an
exception happens while getting the S3Object, "wrappedStream" will be null, and
the "retry" may then re-execute the read block and cause the NPE.

See the issue https://issues.apache.org/jira/browse/HADOOP-17812
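A minimal, self-contained sketch of this failure mode (illustrative names and a crude stand-in for the retry wrapper; not the actual S3AInputStream code):

```java
import java.io.IOException;

// Sketch of the bug: a retry wrapper re-executes the read block after a
// failed reopen() has left wrappedStream null, so the second attempt throws
// NullPointerException instead of a retryable IOException.
public class NpeSketch {
  interface Stream { int read() throws IOException; }

  // First read fails with a plain IOException.
  static Stream wrappedStream = () -> { throw new IOException("read failed"); };

  // Models reopen(): it fails while fetching the S3Object, after having
  // discarded the old stream, so the field is left null.
  static void reopen() throws IOException {
    wrappedStream = null;
    throw new IOException("simulated failure fetching S3Object");
  }

  // The retried block, as in read(): dereferences wrappedStream.
  static int readOnce() throws IOException {
    try {
      return wrappedStream.read();
    } catch (IOException e) {
      try { reopen(); } catch (IOException ignored) { /* swallowed by handler */ }
      throw e;
    }
  }

  // Crude stand-in for the Invoker.retry() wrapper: re-executes the block.
  public static Class<? extends Throwable> failureClass() {
    for (int attempt = 0; attempt < 2; attempt++) {
      try {
        readOnce();
      } catch (Throwable t) {
        if (attempt == 1) {
          return t.getClass();  // second attempt: NullPointerException
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(failureClass().getSimpleName()); // prints NullPointerException
  }
}
```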

@wbo4958
Contributor Author

wbo4958 commented Jul 22, 2021

@steveloughran Could you help take a look at this PR? Thx a lot

Contributor

@steveloughran steveloughran left a comment


First, thank you for not only finding this bug but also providing a PR. Always appreciated.

It took me a while to work out what's gone wrong here. Presumably the connection
to your S3 store was down so long that the retrying reopen() code failed; this
got as far as the wrapping retry() in read(), which then NPEs.

Although this PR moves the failure from an NPE to an IOE, it doesn't do anything
about attempting to recover from the failure, which the retry() code is meant to do.

I think the input stream code needs to simply try to reconnect when
wrappedStream is null. Either this succeeds (and the subsequent read works) or it fails,
an IOE is raised, and the S3ARetryPolicy decides what to do:

if (wrappedStream == null) {
  onReadFailure(e, 1, false);
} 
read(...)

other points

Testing

See Testing S3A policy

You have to declare the S3 endpoint you ran the entire hadoop- suite against. If you have an internal endpoint, that's fine; we just need that test coverage.
Yetus/Jenkins can't do it.

We would also like a test for the new problem; TestS3AInputStreamRetry is where it should go.

MultiFileCloudParquetPartitionReader

Can you point me at MultiFileCloudParquetPartitionReader? I would really like to see it.

If you are targeting Hadoop 3.3.1+, there are likely opportunities for speedups and statistics collection/reporting.

I've been playing a bit myself there to see what could be done.

@steveloughran steveloughran changed the title HADOOP-17812. Fix s3a NPE when reading HADOOP-17812. NPE in S3AInputStream read() after failure to reconnect to store Jul 22, 2021
@wbo4958
Contributor Author

wbo4958 commented Jul 23, 2021

Thx @steveloughran for the review. The code suggestion you gave is the same as what the PR does:

try {
  // When an exception happens before wrappedStream is re-set in "reopen"
  // (called by onReadFailure), wrappedStream will be null. The **retry** may
  // re-execute this block and cause an NPE if we don't check wrappedStream.
  if (wrappedStream == null) {
    throw new IOException("Null IO stream for reading " + uri);
  }
  b = wrappedStream.read();
} catch (EOFException e) {
  return -1;
} catch (SocketTimeoutException e) {
  onReadFailure(e, 1, true);
  throw e;
} catch (IOException e) {
  onReadFailure(e, 1, false);
  throw e;
}

When wrappedStream is null, the IOException is thrown, and the catch block then calls onReadFailure to retry.

@steveloughran
Contributor

steveloughran commented Jul 23, 2021

Update 2021-07-23 20:49: actually, I may be wrong here. We will need to write the test to see what the outcome is. It would still be best to preserve whatever did fail, though.

when wrappedStream is null, the IOException is thrown, then the catch block will call onReadFailure to retry.

Yes, but the exception raised is an IOE, not the underlying cause, so the retry logic won't examine the failure; it will simply give up.

If you look at the S3ARetryPolicy you can see that no attempt is made to retry a generic IOE. Therefore there will be precisely one retry attempt (the exception handler), and if that doesn't fix it (e.g. the server has not yet recovered): failure.
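The distinction can be illustrated with a toy classifier (loosely modeled on the idea of S3ARetryPolicy; the Action enum, classify method, and exception choices are all illustrative, not the real Hadoop API):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Sketch: connectivity exceptions are mapped to retry-with-backoff, while an
// unrecognized generic IOException fails at once. This is why throwing a plain
// IOException for the null stream loses the ability to recover.
public class RetryPolicySketch {
  enum Action { RETRY, FAIL }

  static Action classify(IOException e) {
    if (e instanceof SocketTimeoutException) {
      return Action.RETRY;  // connectivity failure: retry with backoff
    }
    return Action.FAIL;     // generic IOE, e.g. "Null IO stream": give up
  }

  public static void main(String[] args) {
    System.out.println(classify(new SocketTimeoutException("connect timed out"))); // RETRY
    System.out.println(classify(new IOException("Null IO stream for reading")));   // FAIL
  }
}
```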

for the code suggestion you gave is the same as the PR

I am proposing that on entry to the method, the full attempt to reconnect is made if the stream is null.

In the test which this PR is going to need, the issue will become apparent if the simulated failure is a sequence of

  1. succeed, returning a stream
  2. throw SocketTimeoutException on the first read()
  3. throw ConnectTimeoutException three times
  4. then return a stream whose read() returns a character

With ConnectTimeoutException being raised on the reconnect, the retry will try to connect with backoff, jitter, and a configurable limit. Throwing a simple IOE will fail on the first retry.

(The test case should also set up a retry policy with a retry interval of 0 ms so it doesn't trigger any delays.)
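That simulated sequence could be sketched with a toy harness like the following (illustrative names only; the real test would live in TestS3AInputStreamRetry and exercise the actual S3AInputStream machinery, and java.net.ConnectException stands in here for Hadoop's ConnectTimeoutException):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Sketch of the proposed simulated failure sequence:
//   1. succeed, returning a stream that times out on its first read()
//   2-4. throw connect timeouts on reopen
//   5. return a stream whose read() returns a character
public class FailureSequenceSketch {
  interface Stream { int read() throws IOException; }

  static int opens = 0;

  static Stream open() throws IOException {
    opens++;
    if (opens == 1) {
      return () -> { throw new SocketTimeoutException("read timed out"); };
    }
    if (opens <= 4) {
      throw new ConnectException("connect timed out");
    }
    return () -> 'a';  // finally, a working stream
  }

  // Retry loop with an effective 0 ms interval, as the test setup suggests:
  // reconnect whenever the stream is null, close it on failure, retry.
  public static int readWithRetries(int limit) throws IOException {
    IOException last = null;
    Stream s = null;
    for (int attempt = 0; attempt < limit; attempt++) {
      try {
        if (s == null) {
          s = open();      // reconnect when the stream is null
        }
        return s.read();
      } catch (IOException e) {
        last = e;
        s = null;          // drop the broken stream; next attempt reopens
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    System.out.println((char) readWithRetries(10)); // prints a
  }
}
```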

+@majdyz

@wbo4958
Contributor Author

wbo4958 commented Jul 26, 2021

Hi @steveloughran

I modified the unit tests to cover the NPE described in the JIRA. I also ran the integration tests, and some of them failed; I don't know if those failures are expected. I posted the test results at https://issues.apache.org/jira/browse/HADOOP-17812?focusedCommentId=17386993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17386993

I still think throwing an exception when wrappedStream equals null may be better. If we use

try {
  if (wrappedStream == null) {
    onReadFailure(null, 1, false);
  }
  b = wrappedStream.read();
} catch (EOFException e) {
  return -1;
} catch (SocketTimeoutException e) {
  onReadFailure(e, 1, true);
  throw e;
} catch (IOException e) {
  onReadFailure(e, 1, false);
  throw e;
}

If the onReadFailure called on detecting wrappedStream == null fails, then onReadFailure will be called again in the catch (IOException e) block. My intention is to let the retry handle that.

@apache apache deleted a comment from hadoop-yetus Jul 27, 2021
@apache apache deleted a comment from hadoop-yetus Jul 27, 2021
@steveloughran
Contributor

If the onReadFailure called on detecting wrappedStream == null fails, then onReadFailure will be called again in the catch (IOException e) block. My intention is to let the retry handle that.

I'm thinking:

  1. pull the if (wrappedStream == null) onReadFailure() out of the try/catch, so that the double check doesn't happen
  2. change the exception handling to only close the wrapped stream, not try to reopen:
if (wrappedStream == null) {
  // trigger a re-open.
  reopen("failure recovery", getPos(), 1, false);
}
try {
  b = wrappedStream.read();
} catch (EOFException e) {
  return -1;
} catch (SocketTimeoutException e) {
  onReadFailure(e, 1, true);
  throw e;
} catch (IOException e) {
  onReadFailure(e, 1, false);
  throw e;
}

Critically, onReadFailure will stop trying to reopen the stream; instead it releases (or, on socket exceptions, force-aborts) the stream. This removes throws IOE from its signature.

private void onReadFailure(IOException ioe, int length, boolean forceAbort) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Got exception while trying to read from stream {}, " +
        "client: {} object: {}, trying to recover: ",
        uri, client, object, ioe);
  } else {
    LOG.info("Got exception while trying to read from stream {}, " +
        "client: {} object: {}, trying to recover: " + ioe,
        uri, client, object);
  }
  streamStatistics.readException();
  closeStream("failure recovery", contentRangeFinish, forceAbort);  // HERE
}

so:

  1. there's no attempt to reopen the stream (so it cannot raise an IOE any more). The exception raised in the initial read() failure is the one raised to the S3ARetryPolicy.
  2. the new reopen (note, there's one hidden in lazySeek) does the reconnect; if it fails, the exception is thrown to the retry policy on the second attempt.

Result:

  • no risk of a reopen() exception overriding the initial failure
  • the moved reopen() call has gone from being a special case only encountered after a double IOE to something exercised on every single failure, so it gets better coverage
  • the initial read failure will encounter the initial brief retry delay before trying to reconnect. I don't see that slowing down the operation at all, as it was going to happen anyway

@wbo4958
Contributor Author

wbo4958 commented Jul 27, 2021

@steveloughran That's really a smart suggestion. I've changed it accordingly. Thx for the review.

BTW, I will upload the integration test results in JIRA later.

Thx again. Have a good day!

@steveloughran
Contributor

BTW, I will upload the integration test results in JIRA later.

You don't need the full results; just add a comment here about where you tested (e.g. #3240 (comment)).

Regarding those failures: your AWS credentials are for some private store, aren't they? So all tests referencing landsat data are failing because you are not authed with AWS.

The testing.md doc shows how you can switch to a different large file in your own store, after which it turns off a set of tests which won't be valid.

Contributor

@steveloughran steveloughran left a comment


Patch LGTM. I suspect the original reconnect logic dates from before all the Invoker.retry() stuff went in; we never really looked at it in full detail to see whether it needed attention. You know how it is: if things seem to work, leave them alone...

add your test setup in a comment and I'll do the merge

@wbo4958
Copy link
Contributor Author

wbo4958 commented Jul 29, 2021

@apache apache deleted a comment from hadoop-yetus Jul 30, 2021
@apache apache deleted a comment from hadoop-yetus Jul 30, 2021
@steveloughran steveloughran merged commit 266b1bd into apache:trunk Jul 30, 2021
@steveloughran
Contributor

+1, merged to trunk. Thank you for this

Bobby, can you cherry-pick the test from trunk, apply it to branch-3.3, and retest? I just need to know that it works. There'll be no extra review unless there are merge problems.

There would be major merge pain going back to 3.2; I'd like to avoid that. If you do want to go that way, it would be a separate patch, and we'd skip the test changes because that code is so radically different.

wbo4958 added a commit to wbo4958/hadoop that referenced this pull request Jul 31, 2021
… to store (apache#3222)


This improves error handling after multiple failures reading data,
when the read fails and attempts to reconnect() also fail.

Contributed by Bobby Wang.
asfgit pushed a commit that referenced this pull request Aug 2, 2021
… to store (#3222)

This improves error handling after multiple failures reading data,
when the read fails and attempts to reconnect() also fail.

Contributed by Bobby Wang.

Change-Id: If17dee395ad6b9b7c738021bad20d0a13eb4011e
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023