Skip to content

Attempt to close kudu scanner only one time #17462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented May 11, 2023

Description

Sometimes happens that we got

Caused by: java.lang.NullPointerException
at org.apache.kudu.shaded.com.google.protobuf.ByteString$LiteralByteString.<init>(ByteString.java:1315)
at org.apache.kudu.shaded.com.google.protobuf.ByteString.wrap(ByteString.java:391)
at org.apache.kudu.shaded.com.google.protobuf.UnsafeByteOperations.unsafeWrap(UnsafeByteOperations.java:75)
at org.apache.kudu.client.AsyncKuduScanner$ScanRequest.createRequestPB(AsyncKuduScanner.java:1137)
at org.apache.kudu.client.RpcProxy.rpcToMessage(RpcProxy.java:199)
at org.apache.kudu.client.RpcProxy.sendRpc(RpcProxy.java:153)
at org.apache.kudu.client.AsyncKuduClient.closeScanner(AsyncKuduClient.java:1252)
at org.apache.kudu.client.AsyncKuduScanner.close(AsyncKuduScanner.java:747)
... 17 more

During scanner close,
as we see scanner itself not thread safe

private boolean closed = false;

public Deferred<RowResultIterator> close() {
  if (closed) {
    return Deferred.fromResult(null);
  }
  return client.closeScanner(this).addCallback(closedCallback()); // TODO errBack ?
}

My assumption is, that if we close RecordPageSource-> KuduRecordSet more then one time from different threads, we can get this issue.

However I am not sure is it possible from trino side.
So this PR is more to raise discussion can we hit such situation?

Ok so this looks like valid explanation.

  • we read from one thread, reach end of keys scanner is closed automatically
  • then we do first time close of RecordPageSource, which close already closed scanner but from other thread, just because it's not guarded internally we fail

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 11, 2023
@hashhar hashhar requested review from Praveen2112 and hashhar May 15, 2023 09:48
@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/kudu-close-scanner branch from c0dafac to 0a65747 Compare May 15, 2023 13:38
catch (KuduException e) {
throw new RuntimeException(e);
else {
throw new TrinoException(GENERIC_INTERNAL_ERROR, "Cursor already closed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw vs log all previous close calls (i.e. keep track of all callers and log them if close is called 2nd or more times)?

Asking because the throw stack trace won't tell us which "other" thread closed the cursor - it'll only tell us that we closed it twice and who closed it the 2nd time.

I'm assuming this is for a debug build we can use to find the actual reason why this happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I am not sure, that such situation is possible. And if it's really the case - is it something exceptional or not.
I am even thinking about maybe we should not throw exception and fail the query, but just log it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think just logging it (and the previous stack-trace of who called) should be sufficient to find why it's happening.

So maybe we can log + not fail the query until we find the cause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But from Java docs for KuduScanner#close it says

   * Closes this scanner (don't forget to call this when you're done with it!).
   * <p>
   * Closing a scanner already closed has no effect.

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see their close method looks like this :

public Deferred<RowResultIterator> close() {
  if (closed) {
    return Deferred.fromResult(null);
  }
  return client.closeScanner(this).addCallback(closedCallback()); // TODO errBack ?
}

So second close is guarded by variable closed.

But looks like this variable is not thread safe:

private boolean closed = false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 1.15.0 and it contains the same code that you pointed, thanks !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't use that AysncKuduScanner directly rather we use KuduScanner which claims it provides a Synchronous version of {@link AsyncKuduScanner}. Offers the same API but with blocking methods. - but I'm not sure if it provides a synchronous version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides synchronous version, but it doesn't mention it thread safe.
Need to check

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see AsyncKuduScanner (which is not thread safe by documentation) returns all operation as Deferred, as far as I see it's kind of analogous of Future (https://github.com/OpenTSDB/async/tree/master).

KuduScanner scanner wraps almost all methods of AsyncKuduScanner and do join on Deferred's from AsyncKuduScanner. (looks like it's analogous of Future.get())

However I am not 100% sure, that synchronisation guarantees are as strong as Future.get.

First of all, it has some strange description:

Contract and Invariants

- Only one thread at a time is going to execute the callback chain.
- Each action taken by a callback happens-before  the next callback is invoked. In other words, if a callback chain manipulates a variable (and no one else manipulates it), no synchronization is required  

I dived a little bit in the code but cannot guarantee that I understood it fully, but noticed some strange things:

  • like executing method itself is under synchronised block and adding callbacks to callback list also, but executing callback itself is not (but for example we are setting closed flag inside callback not inside method itself)

So as bottom line I am not sure that closed variable in AsyncKuduScanner via KuduScanner is guarded properly.
And I think this PR is worth to try.

Copy link
Contributor Author

@vlad-lyutenko vlad-lyutenko May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KuduScanner

public RowResultIterator close() throws KuduException {
    return KuduClient.joinAndHandleException(asyncScanner.close());
  }

and this code call next one:

AsyncKuduScanner

public Deferred<RowResultIterator> close() {
    if (closed) {
      return Deferred.fromResult(null);
    }
    return client.closeScanner(this).addCallback(closedCallback()); // TODO errBack ?
  }

So as we see here - access to variable closed is not guarded in anyway, so we can start execute incorrect brunch no matter what is inside - joinAndHandleException

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/kudu-close-scanner branch from 0a65747 to 46cf077 Compare May 16, 2023 12:51
KuduScanner#close calls AsyncKuduScanner#close, and AsyncKuduScanner
uses a boolean field to track whether the scanner was closed or not but
the field is neither synchronized on nor volatile leading to possible
race-conditions. To mitigate that we guard the close here with a
volatile boolean.

See https://github.com/apache/kudu/blob/64619147e7932518bb88b3f3f719d9c27cfdae68/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java#L744
@hashhar hashhar force-pushed the vlad-lyutenko/kudu-close-scanner branch from 46cf077 to 2515b3b Compare May 19, 2023 11:37
@hashhar
Copy link
Member

hashhar commented May 19, 2023

Added a code comment + some details in commit message.

@hashhar hashhar merged commit 551e710 into trinodb:master May 19, 2023
@github-actions github-actions bot added this to the 419 milestone May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants