-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Fix: Prevent routing reads to Redis slave nodes in loading state #3370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ofekshenawa I am not sure (was not able to find information) if |
Update: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work, but we should find a good way to test it. Let's add a note about it.
Won't this perform a PING in the critical path of execution of any command against a slave node? The LOADING error status should be caught lazily and cached in-memory, bounded by a penalty timeout, mirroring the implementation of We are observing more than a 2x latency regression on v9.9.0, which includes this commit, when slave routing is enabled. @ndyakov I think it would be a good idea to issue a patch release either with this commit reverted, or the behavior fixed forward as suggested above. |
@ndyakov Just like the comment above, in the project I'm working on we're pushing thousands of commands through a pipeline, and we've noticed that latency has more than doubled. This change really needs to be rolled back and handled differently, as sending a |
@LINKIWI , @ste93cry thank you for reporting this. The loaded state should / can be cached, you are both correct. @ste93cry for the time being, feel free to use a release before this patch, I will work on adding a cache and will release a new patch soon. Let's assume we add a |
Thanks for following up. Regarding when the cached loaded state should return to 0: In most cases/for most users, I suspect slaves should only be in the A separate follow up here could be to transparently fail over command execution to another available slave after observing |
@LINKIWI I think introducing a timeout will require a more testing to calculate a good default value of the timeout. Here is a naive approach without a timeout, let's continue the discussion there. I would prefer to have the naive approach merged and released and then, once we are the time we can find a better approach #3410 |
Problem
When a new slave node is added to a Redis cluster, it enters a loading state while it synchronizes data from its master. If the client has
ReadOnly=true
enabled, it may route read operations to this loading slave node, which can result in nil values or errors being returned to the application.closes #3222
Solution
This PR adds logic to detect when a slave node is in the loading state (by checking for LOADING errors) and temporarily excludes such nodes from the read distribution until they're ready to serve requests.