[TRACKING ISSUE] Fix ClientInvocation Retry

The tracking issue for the Java side PR.

See https://github.com/hazelcast/hazelcast/pull/18309 for details.

---

Bug Description:
There seems to be multiple problems which is caused by single problem
we allow multiple threads to retry/change state of single invocation.
Example:
A connection is closed. CleanResourcesTask checks periodically and
handles invocation with connection closed.
   https://github.com/hazelcast/hazelcast/blob/04e9df4c2f406238267a8d45fbc9ae476b40c7d5/hazelcast/src/main/java/com/hazelcast/client/impl/spi/impl/ClientInvocationServiceImpl.java#L290-L301

This continues and schedules a retry here
    https://github.com/hazelcast/hazelcast/blob/04e9df4c2f406238267a8d45fbc9ae476b40c7d5/hazelcast/src/main/java/com/hazelcast/client/impl/spi/impl/ClientInvocation.java#L305-L315

Before this task gets a chance to run, CleanResoursTask can wakeup
check the same connection and schedule a new retry.

As a solution I first, try to sync them on `deRegisterInvocation(callid)`
,but this does not work. The idea was to only one that can
deRegisterInvocation will retry. Also a response/exception from
remote will notify if they can deregister the invocation.
One problem is correlation id is changed by the retried task, and
other threads are trying to deregister by reading that correlationId.
In response path, we have correlation id coming from remote but in
connection closed case, we don't have one to use(instead of reading
from invocation itself)

Another problem is how we handle connection close. Lets say a member
is about the close and we send an invocation to it. It can send,
ClusterPassive, InstanceNotActive like exceptions. And it closes the
connection after that. We have two cases to handle on the client side.
Lets say we have detected that connection is closed again in
CleanResoursesTask, and we are about to notify the invocation.
In the mean time we got InstanceNotActiveException as well, and we
retried it on another connection/with new call id.
Notify for connection close deregisters the new retry which should
not happen.

I have noticed we had similar problems on the core side.
#7270

I have investigate the solutions and core codebase
#9296
#9303

They are not suitable for the client. Especially connection close
case is different. On the member, we have MemberLeftException
which is guarded by `MemberListVersion` which are not applicable
to the client. So for the client, solution grew around `connection`
naturally.

Solution:
We coordinate all threads that wants to notify the invocation.
We achieve synchronization of different threads via `sentConnection`
field sentConnection starts as null.
It is set to a non-null when `connection` to be sent is determined.
It is set to null when a response/exception is going to be notified.
It is trying to be compared and set to null on connection close case
with dead connection, to prevent invocation to be notified which is
already retried on another connection.
Only one thread that can set this to null can be actively notify
a response/exception.

This way we make sure that only one thread changes the states
(changing correlation id/deregistration of invication).

fixes #18062


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[TRACKING ISSUE] Fix ClientInvocation Retry #375

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[TRACKING ISSUE] Fix ClientInvocation Retry #375

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions