Understanding clusterRetryStrategy (after Failed to refresh slots cache) #1062
Description
ioredis version: 4.15.1
Running against an ElastiCache cluster; the code runs in AWS Lambda.
Everything works flawlessly most of the time. However, as our project scales up, we have started to see occasional "Failed to refresh slots cache" errors, especially in our longer-running scripts.
It's my understanding that clusterRetryStrategy should be called before ioredis throws any errors. From the ioredis readme:
When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.
However, our logs show an error before clusterRetryStrategy is called (we are logging from the retry function).
In addition, we are returning a number from clusterRetryStrategy, but it doesn't seem to have any effect. clusterRetryStrategy is only called once, with 1 as the argument, and then the error flow begins and our code fails.
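For illustration only (this is not the code from our Lambda), here is a minimal sketch of a retry function that logs each call and returns a number, following the readme's description. The helper name makeClusterRetryStrategy and its parameters are hypothetical, and the commented-out wiring assumes the ioredis package and real startup nodes.

```typescript
// Shape described in the readme: return a delay in ms to retry,
// or a non-number (here null) to stop retrying.
type ClusterRetryStrategy = (times: number) => number | null;

// Hypothetical helper, not part of ioredis.
function makeClusterRetryStrategy(
  maxAttempts: number,
  baseDelayMs: number,
  maxDelayMs: number,
  log: (message: string) => void = console.log,
): ClusterRetryStrategy {
  return (times: number): number | null => {
    // Log every invocation so it can be compared against error timestamps.
    log(`clusterRetryStrategy called, attempt ${times}`);
    if (times > maxAttempts) {
      return null; // give up after maxAttempts
    }
    // Linear backoff, capped at maxDelayMs.
    return Math.min(baseDelayMs * times, maxDelayMs);
  };
}

const strategy = makeClusterRetryStrategy(5, 100, 2000);

// Assumed wiring (needs `import Redis from "ioredis"` and real nodes):
// const cluster = new Redis.Cluster(startupNodes, { clusterRetryStrategy: strategy });
// cluster.on("error", (err) => console.error("cluster error:", err.message));
```

With these example values the function returns 100 on the first call and null once the attempt count exceeds 5. Attaching an "error" listener alongside it may help show whether the error really arrives before the first retry call.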
In summary:
- Should we be seeing any errors such as "None of startup nodes is available" before clusterRetryStrategy is ever called? (If not, I think there's a bug.)
- How do we get clusterRetryStrategy to actually cause a reconnection? Are we supposed to be catching these errors somewhere so that ioredis actually has time to retry?