Skip to content

Comments

Sentinel: fix regression requiring "+failover" ACL in failover path#2780

Merged
hwware merged 1 commit intovalkey-io:unstablefrom
gmbnomis:fix_2779
Oct 31, 2025
Merged

Sentinel: fix regression requiring "+failover" ACL in failover path#2780
hwware merged 1 commit intovalkey-io:unstablefrom
gmbnomis:fix_2779

Conversation

@gmbnomis
Copy link
Contributor

@gmbnomis gmbnomis commented Oct 28, 2025

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover when changing the role of a monitored instance. Since the result of the command is ignored, the "FAILOVER ABORT" command is sent irrespective of the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel user, the FAILOVER command is not allowed and all failover cases fail. (Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

  • Updating the documentation in "sentinel.conf" to reflect the need for an adapted ACL

  • Only abort a failover when sentinel detected an ongoing (probably stuck) failover. This means that standard failover and manual failover continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

  • Actually use a dedicated sentinel user and ACLs when testing standard failover, manual failover, and manual coordinated failover.

Fixes #2779

@codecov
Copy link

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.46%. Comparing base (909d082) to head (2c836eb).
⚠️ Report is 10 commits behind head on unstable.

Files with missing lines Patch % Lines
src/sentinel.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2780      +/-   ##
============================================
+ Coverage     72.44%   72.46%   +0.01%     
============================================
  Files           128      128              
  Lines         70145    70145              
============================================
+ Hits          50819    50833      +14     
+ Misses        19326    19312      -14     
Files with missing lines Coverage Δ
src/sentinel.c 0.00% <0.00%> (ø)

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zuiderkwast zuiderkwast requested a review from hwware October 28, 2025 18:27
@hwware
Copy link
Contributor

hwware commented Oct 28, 2025

Do we need to backport this PR to Valkey 9.0?

@hwware hwware added the bug Something isn't working label Oct 28, 2025
@gmbnomis
Copy link
Contributor Author

Do we need to backport this PR to Valkey 9.0?

@hwware Yes, we should. Please let me know if I can help here (PR into the 9.0 branch?).

gmbnomis added a commit to gmbnomis/valkey-doc that referenced this pull request Oct 28, 2025
* Add documentation for the new coordinated failover command
* Update ACL documentation to reflect version 9.0 requirements. Use
  the ACL from the `sentinel.config` as a basis for consistency
  (see also valkey-io/valkey#2780)
* Point out that ACL permissions need to be updated for Redis 9.0+

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
gmbnomis added a commit to gmbnomis/valkey-doc that referenced this pull request Oct 28, 2025
* Add documentation for the new coordinated failover command
* Update ACL documentation to reflect version 9.0 requirements. Use
  the ACL from the `sentinel.config` as a basis for consistency
  (see also valkey-io/valkey#2780)
* Point out that ACL permissions need to be updated for Redis 9.0+

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
gmbnomis added a commit to gmbnomis/valkey-doc that referenced this pull request Oct 28, 2025
* Add documentation for the new coordinated failover command
* Update ACL documentation to reflect version 9.0 requirements. Use
  the ACL from the `sentinel.config` as a basis for consistency
  (see also valkey-io/valkey#2780)
* Point out that ACL permissions need to be updated for Redis 9.0+

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
gmbnomis added a commit to gmbnomis/valkey-doc that referenced this pull request Oct 29, 2025
* Add documentation for the new coordinated failover command
* Update ACL documentation to reflect version 9.0 requirements. Use
  the ACL from the `sentinel.config` as a basis for consistency
  (see also valkey-io/valkey#2780)
* Point out that ACL permissions need to be updated for Redis 9.0+

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is send irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
  an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
  stuck) failover. This means that standard failover and manual failover
  continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
  FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
  failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
@hwware hwware moved this to To be backported in Valkey 9.0 Oct 31, 2025
@hwware hwware merged commit 6cbc1a3 into valkey-io:unstable Oct 31, 2025
55 of 56 checks passed
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
enjoy-binbin pushed a commit to valkey-io/valkey-doc that referenced this pull request Nov 26, 2025
* Add documentation for the new coordinated failover command
* Update ACL documentation to reflect version 9.0 requirements. Use
  the ACL from the `sentinel.config` as a basis for consistency
  (see also valkey-io/valkey#2780)
* Point out that ACL permissions need to be updated for Redis 9.0+

This option was introduced by
valkey-io/valkey#1292 and is part of Valkey 9.0

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
@zuiderkwast zuiderkwast moved this from To be backported to 9.0.1 (WIP) in Valkey 9.0 Dec 4, 2025
zuiderkwast pushed a commit to zuiderkwast/placeholderkv that referenced this pull request Dec 4, 2025
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
zuiderkwast pushed a commit that referenced this pull request Dec 9, 2025
…2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires to adapt the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes #2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: 9.0.1

Development

Successfully merging this pull request may close these issues.

[BUG] SENTINEL FAILOVER breaks on Valkey 9 with previously working ACLs

2 participants