Understanding clusterRetryStrategy (after Failed to refresh slots cache) #1062

Open
@jeremytm

Description

ioredis version: 4.15.1
Running against an ElastiCache cluster. The code runs in AWS Lambda.

Everything works flawlessly most of the time. However, as our project scales up, we have started to see occasional "Failed to refresh slots cache" errors, especially in our longer-running scripts.

It's my understanding that clusterRetryStrategy should be called before ioredis throws any errors. From the ioredis README:

When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.

However, our logs are showing an error before clusterRetryStrategy is called (we are logging from the retry function).


In addition, we are returning a number from clusterRetryStrategy, but it doesn't seem to have any effect. clusterRetryStrategy is only called once with 1 as the argument, and then the error flow begins and our code fails.
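
For reference, a retry strategy of the kind described above might look like the following minimal sketch. The constants and log message are illustrative assumptions, not our production values; the only behavior taken from the README is that returning a number schedules a reconnect after that many milliseconds, and returning anything else stops retrying.

```typescript
// Illustrative values only (assumptions, not taken from the issue).
const BASE_DELAY_MS = 200;
const MAX_DELAY_MS = 2000;
const MAX_ATTEMPTS = 20;

// Returns the delay in ms before the next reconnect attempt,
// or null (a non-number) to tell ioredis to stop retrying.
function clusterRetryStrategy(times: number): number | null {
  if (times > MAX_ATTEMPTS) {
    return null;
  }
  // Linear backoff, capped at MAX_DELAY_MS.
  const delay = Math.min(BASE_DELAY_MS * times, MAX_DELAY_MS);
  console.log(`clusterRetryStrategy called, attempt ${times}, delay ${delay} ms`);
  return delay;
}

// The function is passed in the cluster options, roughly like this
// (the host below is a placeholder):
// const cluster = new Redis.Cluster(
//   [{ host: "my-cluster.example.com", port: 6379 }],
//   { clusterRetryStrategy }
// );
```

Logging inside the function is what lets us see, in the Lambda logs, whether it was called before or after the error appeared.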

In summary:

  1. Should we be seeing any errors such as "None of startup nodes is available" before clusterRetryStrategy is ever called? (If not, I think there's a bug.)
  2. How do we get clusterRetryStrategy to actually cause a reconnection? Are we supposed to catch these errors somewhere so that ioredis has time to retry?
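
To make question 2 concrete, "catching these errors somewhere" could mean an application-level wrapper like the sketch below. This is an assumption about one possible workaround, not something ioredis documents as required: `withRetry` is a hypothetical helper, and whether ioredis reconnects on its own in the meantime is exactly what the question asks.

```typescript
// Hypothetical helper, not part of ioredis: retries an async operation
// a fixed number of times, waiting delayMs between attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  delayMs: number = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}

// Usage with a cluster client would look roughly like:
// const value = await withRetry(() => cluster.get("some-key"));
```

The wrapper only delays and repeats the command; it does not itself trigger a reconnection, so it helps only if the client recovers between attempts.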
