-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Attempt to close kudu scanner only one time #17462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to close kudu scanner only one time #17462
Conversation
c0dafac
to
0a65747
Compare
catch (KuduException e) { | ||
throw new RuntimeException(e); | ||
else { | ||
throw new TrinoException(GENERIC_INTERNAL_ERROR, "Cursor already closed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw vs log all previous close
calls (i.e. keep track of all callers and log them if close is called 2nd or more times)?
Asking because the throw stack trace won't tell us which "other" thread closed the cursor - it'll only tell us that we closed it twice and who closed it the 2nd time.
I'm assuming this is for a debug build we can use to find the actual reason why this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I am not sure, that such situation is possible. And if it's really the case - is it something exceptional or not.
I am even thinking about maybe we should not throw exception and fail the query, but just log it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think just logging it (and the previous stack-trace of who called) should be sufficient to find why it's happening.
So maybe we can log + not fail the query until we find the cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But from Java docs for KuduScanner#close it says
* Closes this scanner (don't forget to call this when you're done with it!).
* <p>
* Closing a scanner already closed has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I see their close method looks like this :
public Deferred<RowResultIterator> close() {
if (closed) {
return Deferred.fromResult(null);
}
return client.closeScanner(this).addCallback(closedCallback()); // TODO errBack ?
}
So second close is guarded by variable closed.
But looks like this variable is not thread safe:
private boolean closed = false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 1.15.0 and it contains the same code that you pointed, thanks !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't use that AysncKuduScanner
directly rather we use KuduScanner
which claims it provides a Synchronous version of {@link AsyncKuduScanner}. Offers the same API but with blocking methods.
- but I'm not sure if it provides a synchronous version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It provides synchronous version, but it doesn't mention it thread safe.
Need to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I see AsyncKuduScanner (which is not thread safe by documentation) returns all operation as Deferred, as far as I see it's kind of analogous of Future (https://github.com/OpenTSDB/async/tree/master).
KuduScanner scanner wraps almost all methods of AsyncKuduScanner and do join on Deferred's from AsyncKuduScanner. (looks like it's analogous of Future.get())
However I am not 100% sure, that synchronisation guarantees are as strong as Future.get.
First of all, it has some strange description:
Contract and Invariants
- Only one thread at a time is going to execute the callback chain.
- Each action taken by a callback happens-before the next callback is invoked. In other words, if a callback chain manipulates a variable (and no one else manipulates it), no synchronization is required
I dived a little bit in the code but cannot guarantee that I understood it fully, but noticed some strange things:
- like executing method itself is under synchronised block and adding callbacks to callback list also, but executing callback itself is not (but for example we are setting closed flag inside callback not inside method itself)
So as bottom line I am not sure that closed variable in AsyncKuduScanner via KuduScanner is guarded properly.
And I think this PR is worth to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KuduScanner
public RowResultIterator close() throws KuduException {
return KuduClient.joinAndHandleException(asyncScanner.close());
}
and this code call next one:
AsyncKuduScanner
public Deferred<RowResultIterator> close() {
if (closed) {
return Deferred.fromResult(null);
}
return client.closeScanner(this).addCallback(closedCallback()); // TODO errBack ?
}
So as we see here - access to variable closed is not guarded in anyway, so we can start execute incorrect brunch no matter what is inside - joinAndHandleException
0a65747
to
46cf077
Compare
KuduScanner#close calls AsyncKuduScanner#close, and AsyncKuduScanner uses a boolean field to track whether the scanner was closed or not but the field is neither synchronized on nor volatile leading to possible race-conditions. To mitigate that we guard the close here with a volatile boolean. See https://github.com/apache/kudu/blob/64619147e7932518bb88b3f3f719d9c27cfdae68/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java#L744
46cf077
to
2515b3b
Compare
Added a code comment + some details in commit message. |
Description
Sometimes happens that we got
During scanner close,
as we see scanner itself not thread safe
My assumption is, that if we close RecordPageSource-> KuduRecordSet more then one time from different threads, we can get this issue.However I am not sure is it possible from trino side.So this PR is more to raise discussion can we hit such situation?
Ok so this looks like valid explanation.
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: