Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: delay retriable stream master listener close until all sub streams are closed #9754

Merged
merged 3 commits into from
Jan 12, 2023

Conversation

YifeiZhuang
Copy link
Contributor

@YifeiZhuang YifeiZhuang commented Dec 13, 2022

This is follow up fix of #9626

This helps to prevent retryable stream from using calloptions.executor when it shouldn't. It is done by delaying notifying upper layer (through masterListener).

@YifeiZhuang YifeiZhuang requested a review from ejona86 December 13, 2022 23:48
@YifeiZhuang
Copy link
Contributor Author

A very good test case (w/ debug message):

    Dec 15, 2022 5:58:51 PM io.grpc.internal.RetriableStream createSubstream
    INFO: created sub stream, inflight count  1
    Dec 15, 2022 5:58:52 PM io.grpc.internal.RetriableStream$1CommitTask run
    INFO: cancelling drained : {delegate=io.grpc.netty.NettyClientStream@414be180}
    Dec 15, 2022 5:58:52 PM io.grpc.internal.RetriableStream safeCloseMasterListener
    INFO: save close reason status:Status{code=DEADLINE_EXCEEDED, description=deadline exceeded after 0.959140709s. [closed=[], open=[[remote_addr=/0:0:0:0:0:0:0:0:49769]]], cause=null}, progress PROCESSED
io.grpc.testing.integration.AutoWindowSizingOnTest > deadlineExceeded FAILED
    org.junit.runners.model.TestTimedOutException: test timed out after 15 seconds
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:748)
        at io.grpc.stub.ClientCalls$BlockingResponseStream.waitForNext(ClientCalls.java:633)
        at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:657)
        at io.grpc.stub.ClientCalls$BlockingResponseStream.next(ClientCalls.java:676)
        at io.grpc.testing.integration.AbstractInteropTest.deadlineExceeded(AbstractInteropTest.java:1167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)

timeline:
Original stream is created and drained. inFlightStreamCount = 1
RetriableStream is cancelled and noop stream is committed. The commit cancel the original stream. (==> suppose that netty stream will call back on sub stream listener onClose but not happen.)
safeCloseMasterListener is called and listener not closed because inFlightStreamCount is not Integer.MIN_VALUE.

@YifeiZhuang YifeiZhuang marked this pull request as draft December 19, 2022 22:17
@YifeiZhuang
Copy link
Contributor Author

YifeiZhuang commented Jan 12, 2023

RetriableStream is cancelled and noop stream is committed. The commit cancel the original stream. (==> suppose that netty stream will call back on sub stream listener onClose but not happen.)

It turns out the sub listener closed() was called. It looks the race condition that although safeCloseMasterListener happens earlier then subListener.close(), the status not yet saved. So master listener can not be closed, (assert null failed prevents master listener to be called).

@YifeiZhuang YifeiZhuang marked this pull request as ready for review January 12, 2023 18:50
@ejona86
Copy link
Member

ejona86 commented Jan 12, 2023

That race fix looks right to me.

@YifeiZhuang YifeiZhuang merged commit 2b9bd6c into grpc:master Jan 12, 2023
@YifeiZhuang YifeiZhuang deleted the retry-stream branch January 12, 2023 22:51
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants