cluster: allow to overcome significant downtimes #228

Open
@Totktonada

We changed the meaning of the retryCount configuration option in the scope of #167 (it landed in the 1.9.2 release). In 1.9.1 it meant the overall number of attempts. In 1.9.2 it means the number of attempts to connect to one instance.

Now the cluster client tries to connect to one instance retryCount times, then tries the other instances. If it has tried all instances without success, the client dies (goes into the CLOSED state).
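To make the current order of attempts concrete, here is a minimal sketch. It is not the client's actual code: `CurrentReconnectSketch`, `connect`, `tryConnect` and `attemptLog` are hypothetical names used only for illustration, and instances are represented as plain address strings.

```java
import java.util.List;
import java.util.function.Predicate;

final class CurrentReconnectSketch {
    /**
     * Tries each instance retryCount times in a row before moving on.
     * Returns the address that accepted a connection, or null if every
     * attempt failed (the point where the client would go into CLOSED).
     * tryConnect is a hypothetical stand-in for one connection attempt.
     */
    static String connect(List<String> instances, int retryCount,
                          Predicate<String> tryConnect, List<String> attemptLog) {
        for (String instance : instances) {
            for (int attempt = 0; attempt < retryCount; attempt++) {
                attemptLog.add(instance); // record the order of attempts
                if (tryConnect.test(instance)) {
                    return instance;
                }
            }
        }
        return null; // one pass over the cluster, then give up
    }
}
```

With two instances and retryCount = 2, a fully unavailable cluster produces the attempt order a, a, b, b, after which the sketch gives up.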

It is likely that a user will set a fairly small connectionTimeout and retryCount in order to reconnect to another instance sooner if there is a problem with the current one. However, if there is a need to overcome significant downtimes / connectivity problems while keeping the ability to switch instances quickly during a local problem, we need to change the algorithm somehow.
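A rough way to see the tension is to estimate how long the client keeps trying before it gives up. The sketch below assumes that every failed attempt takes at most connectionTimeout and that there is no pause between attempts; both are assumptions, not statements about the actual implementation, and `DowntimeWindowSketch` is a hypothetical name.

```java
final class DowntimeWindowSketch {
    /**
     * Upper bound on the time spent before giving up, assuming each failed
     * attempt takes at most connectionTimeoutMillis and attempts follow
     * each other without delay (both are assumptions).
     */
    static long maxWindowMillis(int instances, int retryCount, long connectionTimeoutMillis) {
        return (long) instances * retryCount * connectionTimeoutMillis;
    }
}
```

Under these assumptions, 3 instances with retryCount = 3 and connectionTimeout = 500 ms give at most 3 * 3 * 500 = 4500 ms of attempts, so an outage longer than a few seconds would leave the client in CLOSED.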

I have two possible variants:

  1. Add a configuration option that sets the number of cycles of connection attempts over the whole cluster (now it is always 1).
  2. Change the order of connection attempts: try the first instance once, then the next one, and so on in a loop until each instance has been tried retryCount times.

I am not sure it is good to change the order of attempts, because it is user-visible behaviour, so I lean towards the first variant.
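The difference between the two variants is easiest to see as the order of attempts each one would produce. This is only a sketch: the `cycles` parameter stands in for the not-yet-named option from the first variant, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ReconnectVariantsSketch {
    // Variant 1: keep the per-instance order and repeat the whole pass
    // over the cluster `cycles` times (hypothetical option; 1 today).
    static List<String> variantOneOrder(List<String> instances, int retryCount, int cycles) {
        List<String> order = new ArrayList<>();
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (String instance : instances) {
                for (int attempt = 0; attempt < retryCount; attempt++) {
                    order.add(instance);
                }
            }
        }
        return order;
    }

    // Variant 2: one attempt per instance per round, retryCount rounds.
    static List<String> variantTwoOrder(List<String> instances, int retryCount) {
        List<String> order = new ArrayList<>();
        for (int round = 0; round < retryCount; round++) {
            order.addAll(instances);
        }
        return order;
    }
}
```

For instances a, b and retryCount = 2, the first variant with two cycles yields a, a, b, b, a, a, b, b, while the second yields a, b, a, b. The first variant leaves the existing order untouched when the number of cycles is 1, which matches the concern about user-visible behaviour.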


Several side notes.

This problem can be worked around on the user side; however, we have no way to reconnect a dead client (see #229), so a user would need to re-create the client. It would be good to handle this on our side to eliminate the need for extra logic on the user side.

Re-creating a client can make it impossible to connect when a user bootstraps the client from one instance and uses cluster discovery to fetch the others, if the need to re-create the client arises during a problem with the initial instance (or if the initial configuration was not updated in time). I think it is worth exposing the last cluster discovery result to give a user the ability to handle this problem if they want to re-create a cluster client anyway (issue TBD).
