Understanding clusterRetryStrategy (after Failed to refresh slots cache)

ioredis version: **4.15.1** 
Running on elasticache cluster. Code via lambda.

Everything works flawlessly most of the time. However as our project is scaling up, very occasionally we have started seeing `Failed to refresh slots cache` errors, especially in our longer running scripts.

It's my understanding that `clusterRetryStrategy` should be called **before** ioredis throws any errors. From ioredis readme: 

> When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). **Otherwise**, an error of "None of startup nodes is available" will be returned.

However, our logs are showing an error before `clusterRetryStrategy` is called (we are logging from the retry function). 

![image](https://user-images.githubusercontent.com/6539834/74698167-ce059880-5261-11ea-86f4-c1d6db76d6c3.png)

In addition, we are returning a number from `clusterRetryStrategy`, but it doesn't seem to have any effect. `clusterRetryStrategy` is only called once with `1` as the argument, and then the error flow begins and our code fails.

In Summary:
1. Should we be seeing any errors such as "None of startup nodes is available" before `clusterRetryStrategy` is ever called? (If not I think there's a bug).
2. How do we get `clusterRetryStrategy` to actually cause a reconnection? Are we supposed to be catching these errors somewhere so that ioredis actually has time to retry?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understanding clusterRetryStrategy (after Failed to refresh slots cache) #1062

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development