Provider reconnection topic #1472

Currently, all our providers implement their own back-off for the stream connection, but in a buggy way: it only recreates the stream, while gRPC handles reconnection of the underlying channel independently. See https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md

This might lead to false assumptions about our configuration and paint the wrong picture regarding what people can configure and what to expect.

## languages

gRPC already offers battle-tested reconnection logic. However, the configuration options are not the same in every language.

For example, some implementations let us set three of the five defined options (initial backoff, min connect timeout, max backoff), but multiplier and jitter do not seem to be configurable in any of them.

### Java

Java has an exponential backoff, but unlike other languages it does not expose it for configuration; see https://github.com/grpc/grpc-java/issues/9353

```
private long initialBackoffNanos = TimeUnit.SECONDS.toNanos(1);
private long maxBackoffNanos = TimeUnit.MINUTES.toNanos(2);
private double multiplier = 1.6;
private double jitter = .2;
```

https://github.com/grpc/grpc-java/blob/3b39a83621626c844b16e64ec6389511903ee075/core/src/main/java/io/grpc/internal/ExponentialBackoffPolicy.java#L40-L43
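
To make these defaults concrete, here is a minimal Python sketch of the resulting delay schedule. It is not the gRPC implementation: the function name `backoff_delays` is hypothetical, and applying jitter to every delay (including the first) is a simplifying assumption. The default values are the ones quoted above.

```python
import random


def backoff_delays(attempts, initial=1.0, multiplier=1.6, max_backoff=120.0,
                   jitter=0.2, rng=None):
    """Return the wait time in seconds before each of `attempts` reconnect tries.

    Hypothetical helper for illustration only; defaults mirror the values
    quoted from ExponentialBackoffPolicy (1 s, x1.6, 2 min cap, 20% jitter).
    """
    rng = rng or random.Random()
    delays = []
    current = initial
    for _ in range(attempts):
        # Spread the delay uniformly over +/- jitter * current.
        delays.append(current + rng.uniform(-jitter * current, jitter * current))
        # Grow exponentially, but never beyond the cap.
        current = min(current * multiplier, max_backoff)
    return delays
```

With jitter disabled, the schedule starts at 1 s, grows by a factor of 1.6 per attempt, and reaches the 120 s cap on the twelfth attempt, so a provider-level back-off stacked on top of this mostly adds confusion rather than control.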

#### findings

- `notifyWhenStateChanged(...)` registers a one-off callback, so it must be re-registered after every state change.

### C++, Python, Ruby, Objective-C, PHP, C#

Python uses the C++ library in the background and therefore offers a min and max backoff, ~~and it will be a random value in that range.~~ The values are mapped onto the parameters of the backoff algorithm; see https://github.com/grpc/grpc/issues/25540#issuecomment-793441156. It seems every language in this group is in sync, although the docs do not suggest this behavior. I found the comment below but am still looking for the proper documentation.

> GRPC_ARG_INITIAL_RECONNECT_BACKOFF_MS i.e. "grpc.initial_reconnect_backoff_ms" corresponds to INITIAL_BACKOFF from the algorithm.
> The other configurable args are -
> GRPC_ARG_MIN_RECONNECT_BACKOFF_MS "grpc.min_reconnect_backoff_ms" which corresponds to MIN_CONNECT_TIMEOUT
> GRPC_ARG_MAX_RECONNECT_BACKOFF_MS "grpc.max_reconnect_backoff_ms" which corresponds to MAX_BACKOFF

From the [gRPC C++ docs](https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#) (maybe I am misinterpreting these docs):
```
#define GRPC_ARG_MAX_RECONNECT_BACKOFF_MS   "grpc.max_reconnect_backoff_ms"
The maximum time between subsequent connection attempts, in ms. 

#define GRPC_ARG_MIN_RECONNECT_BACKOFF_MS   "grpc.min_reconnect_backoff_ms"
The minimum time between subsequent connection attempts, in ms.

#define GRPC_ARG_INITIAL_RECONNECT_BACKOFF_MS   "grpc.initial_reconnect_backoff_ms"
The time between the first and second connection attempts, in ms. 
```

Subchannel defaults:
https://github.com/grpc/grpc/blob/782814e28006f132880c0aebbc3d0833d515fa61/src/core/client_channel/subchannel.cc#L75-L79
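
As a rough illustration of how a provider could pass these three settings through instead of implementing its own back-off, the sketch below builds the channel options list. The helper name `reconnect_channel_options` and the target address in the usage comment are hypothetical; the option keys are the ones quoted above, and `grpc.insecure_channel(target, options=...)` is the grpcio call they would be handed to.

```python
def reconnect_channel_options(initial_ms=None, min_ms=None, max_ms=None):
    """Build gRPC channel args for reconnect backoff.

    Hypothetical helper: values left as None are omitted so that the
    library defaults stay in effect.
    """
    candidates = [
        ("grpc.initial_reconnect_backoff_ms", initial_ms),
        ("grpc.min_reconnect_backoff_ms", min_ms),
        ("grpc.max_reconnect_backoff_ms", max_ms),
    ]
    return [(key, int(value)) for key, value in candidates if value is not None]


# Usage with grpcio installed (not executed here; the address is a placeholder):
#   import grpc
#   options = reconnect_channel_options(initial_ms=1000, max_ms=120000)
#   channel = grpc.insecure_channel("localhost:50051", options=options)
```

Omitting unset values keeps the provider from silently overriding library defaults, which seems preferable while the exact semantics of the min setting are still unclear.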

### JavaScript

#### deprecated implementation wrapping C++ client

Uses the C++ library in the background, the same as Python.

#### grpc-js

Pure JavaScript implementation; the supported channel options are listed here: https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/README.md#supported-channel-options. Based on the docs, it only supports two of the three options used in Python:

> grpc.initial_reconnect_backoff_ms
> grpc.max_reconnect_backoff_ms

```
const INITIAL_BACKOFF_MS = 1000;
const BACKOFF_MULTIPLIER = 1.6;
const MAX_BACKOFF_MS = 120000;
const BACKOFF_JITTER = 0.2;
```
https://github.com/grpc/grpc-node/blob/263c478c9a0216e1c864850248ce8efffd7c2da5/packages/grpc-js/src/backoff-timeout.ts#L18C1-L21C28

## Summary

Ultimately, all of them offer a reliable backoff mechanism ~~but not a unified one~~. They are largely unified in behavior but differ in how, and whether, they can be configured. Hence, it is questionable whether we want to offer a custom backoff at all, especially as ours currently does not behave as it should.

| language | initial (`grpc.initial_reconnect_backoff_ms`) | max (`grpc.max_reconnect_backoff_ms`) | timeout (`grpc.min_reconnect_backoff_ms`) | jitter | multiplier |
| ------------- | ------------- | ------------- | ------------- | ------------ | ------------ |
| C++, Python, Ruby, Objective-C, PHP, C#, JS (deprecated) | ✅ | ✅ | ✅ | 0.2 | 1.6 |
| JS (grpc-js) | ✅ | ✅ | ❌ | 0.2 | 1.6 |
| Java | ❌ | ❌ | ❌ | 0.2 | 1.6 |

Implementation-wise, it would be good to delegate reconnection to gRPC, as it reduces our complexity and the code we have to maintain. On the other hand, people might rely on our current back-off configuration (this ticket describes a bug, and we need to fix it).

If we delegate reconnection to gRPC, we should try to configure it where possible (see table). Furthermore, we should use connection listeners (Java: `notifyWhenStateChanged`, Python: `subscribe`, ...) to track our connection state, and [Wait-For-Ready](https://grpc.io/docs/guides/wait-for-ready/) so that our streams wait for a connection.
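
A minimal sketch of the listener idea in Python, assuming the provider only needs "ready" and "error" events: the class `ConnectionTracker` and its callbacks are hypothetical, and the state names are those of `grpc.ChannelConnectivity`. The tracker itself does not import grpcio; it only looks at the name of the state it is given.

```python
class ConnectionTracker:
    """Translate channel connectivity changes into provider events.

    Hypothetical helper: `on_ready` and `on_error` stand in for whatever
    the provider uses to emit its own events.
    """

    def __init__(self, on_ready, on_error):
        self._on_ready = on_ready
        self._on_error = on_error
        self._last = None  # last emitted event: "ready", "error", or None

    def on_state_change(self, state):
        # `state` is expected to be a grpc.ChannelConnectivity member;
        # plain strings work too, since only the name is inspected.
        name = getattr(state, "name", str(state))
        if name == "READY" and self._last != "ready":
            self._last = "ready"
            self._on_ready()
        elif name in ("TRANSIENT_FAILURE", "SHUTDOWN") and self._last != "error":
            self._last = "error"
            self._on_error(name)
        # IDLE and CONNECTING are ignored: gRPC is still working on it.


# Usage with grpcio installed (not executed here; `stub.SomeStream` is a placeholder):
#   tracker = ConnectionTracker(on_ready=emit_ready, on_error=emit_error)
#   channel.subscribe(tracker.on_state_change, try_to_connect=True)
#   for event in stub.SomeStream(request, wait_for_ready=True):
#       ...
```

In Python the subscribed callback keeps firing on every state change until it is unsubscribed, so no re-registration is needed there, unlike the one-off Java callback noted in the findings.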

We should define a general approach to the connection topic (reconnect, error state, best practices) so that we can sync all our implementations where possible and explicitly highlight provider implementations that differ from the recommendation.


### sidenote

I created this ticket because there have already been many discussions regarding this. Documenting this is quite essential; otherwise, we will talk about this again in a year and start from scratch.
