Description
What version of gRPC-Java are you using?
1.54.1
What is your environment?
JDK 1.8, WebLogic 12
What did you expect to see?
No deadlocked threads
What did you see instead?
A deadlock between two threads: one is trying to cancel a call while the other is trying to create a new stream, and each holds the lock the other one needs.
The same deadlock also occurs between two threads that are both trying to cancel.
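For anyone who wants to confirm this kind of state from inside the JVM rather than from a thread dump, here is a minimal sketch using the standard `java.lang.management.ThreadMXBean` API. The class name `DeadlockProbe` is made up for illustration and is not part of gRPC-Java or of the application described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of gRPC-Java.
public class DeadlockProbe {

    /** Returns one description per deadlocked thread, or an empty list if none are found. */
    public static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and ownable synchronizers; returns null if no deadlock.
        long[] ids = mx.findDeadlockedThreads();
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            result.add(info.getThreadName()
                    + " is waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deadlocks = findDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No deadlocked threads");
        } else {
            for (String line : deadlocks) {
                System.out.println(line);
            }
        }
    }
}
```

Calling `findDeadlocks()` periodically (for example from a scheduled task) would make it possible to log the lock owners as soon as the deadlock appears, instead of waiting for the server to hang.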
Steps to reproduce the bug
I'm working on an application that handles ~10m requests per day over a network that is unfortunately less than optimal, which causes fairly frequent retries.
I can't reproduce the issue, and I don't know exactly what is causing it. The application ran with non-hedging retry for a few months without any deadlock. We recently switched to hedging retry to compensate for the network delays. Since then, servers start deadlocking after a few hours or days (see the attached stack traces of both threads).
I'm not sure whether hedging is the cause, but one code path in the stack traces,
io.grpc.internal.RetriableStream$1CommitTask.run(RetriableStream.java:194)
mentions that it is only used for hedging.
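For context, hedging in gRPC is enabled through a `hedgingPolicy` entry in the service config. The sketch below only shows the general shape of such a config as a Java map. The class name `HedgingConfigSketch`, the service name `example.EchoService`, and all values are placeholders, not the configuration actually used in this application.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example; every name and value here is a placeholder.
public class HedgingConfigSketch {

    /** Builds a service config map with a hedging policy for one (made-up) service. */
    public static Map<String, Object> hedgingServiceConfig() {
        Map<String, Object> name = new HashMap<>();
        name.put("service", "example.EchoService"); // placeholder service name

        Map<String, Object> hedgingPolicy = new HashMap<>();
        // gRPC-Java expects numbers in a service config map to be Doubles.
        hedgingPolicy.put("maxAttempts", 3.0);
        hedgingPolicy.put("hedgingDelay", "0.5s");
        hedgingPolicy.put("nonFatalStatusCodes", Collections.singletonList("UNAVAILABLE"));

        Map<String, Object> methodConfig = new HashMap<>();
        methodConfig.put("name", Collections.singletonList(name));
        methodConfig.put("hedgingPolicy", hedgingPolicy);

        Map<String, Object> serviceConfig = new HashMap<>();
        serviceConfig.put("methodConfig", Collections.singletonList(methodConfig));
        return serviceConfig;
    }

    public static void main(String[] args) {
        System.out.println(hedgingServiceConfig());
    }
}
```

A map like this would typically be passed to `ManagedChannelBuilder.defaultServiceConfig(...)` together with `enableRetry()`. A method config carries either a `retryPolicy` or a `hedgingPolicy`, so switching from one to the other changes which code paths in `RetriableStream` are exercised.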