Backport fixes and add release notes for 9.0.1#2906

Merged
zuiderkwast merged 18 commits into valkey-io:9.0 from zuiderkwast:backport-for-9-0-1
Dec 9, 2025
@zuiderkwast zuiderkwast commented Dec 4, 2025

murphyjacob4 and others added 15 commits December 4, 2025 18:24
…alkey-io#2785)

Just setting the authenticated flag actually authenticates to the
default user in this case. The default user may be granted no permission
to use CLUSTER SYNCSLOTS.

Instead, we now authenticate to the NULL/internal user, which grants
access to all commands. This is the same as what we do for replication:


https://github.com/valkey-io/valkey/blob/864de555ced5354976ae4f97f44977041556115f/src/replication.c#L4717

Add a test for this case as well.

Closes valkey-io#2783

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
…ters (valkey-io#2786)

Fixes an assert crash in _writeToClient():

    serverAssert(c->io_last_written.data_len == 0 ||
                 c->io_last_written.buf == c->buf);

The issue occurs when clientsCronResizeOutputBuffer() grows or
reallocates c->buf while io_last_written still points to the old buffer
and data_len is non-zero. On the next write, both conditions in the
assertion become false.

Reset io_last_written when resizing the output buffer to prevent stale
pointers and keep state consistent.
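
The fix described above can be pictured with a minimal sketch. The struct layouts and the helper name `resize_output_buffer` are assumptions made for illustration, not the actual Valkey definitions; only the field names `buf` and `io_last_written` come from the commit message.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the real client fields. The names follow the
 * commit message; the layout is an assumption. */
typedef struct {
    char *buf;       /* buffer the last write was taken from */
    size_t data_len; /* bytes of that buffer already written */
} last_written_t;

typedef struct {
    char *buf;
    size_t buf_size;
    last_written_t io_last_written;
} client_t;

/* Hypothetical resize helper: realloc() may move c->buf, so the record of
 * the previous write is cleared in the same step. */
static int resize_output_buffer(client_t *c, size_t new_size) {
    char *nb = realloc(c->buf, new_size);
    if (nb == NULL) return -1;
    c->buf = nb;
    c->buf_size = new_size;
    c->io_last_written.buf = NULL;
    c->io_last_written.data_len = 0;
    return 0;
}

/* The invariant checked by the assertion quoted in the commit message. */
static int last_written_is_consistent(const client_t *c) {
    return c->io_last_written.data_len == 0 ||
           c->io_last_written.buf == c->buf;
}
```

Clearing both fields together means the invariant holds whether or not the allocator moved the buffer.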

fixes valkey-io#2769

Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
…alkey-io#2780)

Since Valkey Sentinel 9.0, sentinel tries to abort an ongoing failover
when changing the role of a monitored instance. Since the result of the
command is ignored, the "FAILOVER ABORT" command is sent irrespective of
the actual failover status.

However, when using the documented pre 9.0 ACLs for a dedicated sentinel
user, the FAILOVER command is not allowed and _all_ failover cases fail.
(Additionally, the necessary ACL adaptation was not communicated well.)

Address this by:

- Updating the documentation in "sentinel.conf" to reflect the need for
an adapted ACL

- Only abort a failover when sentinel has detected an ongoing (probably
stuck) failover. This means that standard failover and manual failover
continue to work with unchanged pre 9.0 ACLs. Only the new "SENTINEL
FAILOVER COORDINATED" requires adapting the ACL on all Valkey nodes.

- Actually use a dedicated sentinel user and ACLs when testing standard
failover, manual failover, and manual coordinated failover.

Fixes valkey-io#2779

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
… bidirectional links (valkey-io#2817)

After a network failure, nodes that rejoin the cluster do not always
send and/or receive messages from other nodes in the shard. This fix
avoids sending lightweight messages to nodes whose bidirectional links
are not ready.
When a light message arrives before any normal message, the cluster
link is freed because link->node is not yet assigned on the newly
established connection. It is assigned in getNodeFromLinkAndMsg, right
after the `if (is_light)` condition.
On a cluster with heavy pubsub load, this can cause a long loop of
disconnects:
1. node A establishes cluster link to node B
2. node A propagates PUBLISH to node B
3. node B frees the cluster link because link->node == null, as it has
not received any non-light messages yet
4. go to 1.
During this loop, subscribers of node B do not receive any messages
published to node A.

So here we want to make sure that PING was sent (and link->node was
initialized) on this connection before using lightweight messages.
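
A rough sketch of the kind of guard described above. The types, the `ping_sent` flag and the function `can_send_light_message` are hypothetical names chosen for illustration; the actual check in the cluster code may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct cluster_node cluster_node_t;

typedef struct {
    cluster_node_t *node; /* NULL until a full message identifies the peer */
    bool ping_sent;       /* hypothetical flag: a PING went out on this link */
} cluster_link_t;

struct cluster_node {
    cluster_link_t *link;         /* outbound link to the peer */
    cluster_link_t *inbound_link; /* link the peer opened to us */
};

static bool link_ready(const cluster_link_t *l) {
    return l != NULL && l->node != NULL;
}

/* Fall back to a full message unless both directions are established and
 * a PING has already been sent on the outbound link. */
static bool can_send_light_message(const cluster_node_t *n) {
    return link_ready(n->link) && n->link->ping_sent &&
           link_ready(n->inbound_link);
}
```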

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
…key-io#2840)

This prevents crashes on the older nodes in mixed clusters where some
nodes are running 8.0 or older. Mixed clusters often exist temporarily
during rolling upgrades.

Fixes: valkey-io#2341 

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
…oved (valkey-io#2787)

There is an issue with the LTRIM command. When LTRIM does not actually
modify the key (for example, `LTRIM key 0 -1`), the server.dirty
counter is not updated because both the ltrim and rtrim values are 0,
so the command is not propagated. However, `signalModifiedKey` is
still called regardless of whether server.dirty changes. This is
unexpected and can cause a mismatch between the source and target
during propagation, since the LTRIM command is not sent.
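
The message does not spell out the fix. One way to keep the two effects consistent, assuming the intent is to skip the key-modified signal when nothing is trimmed, is to tie both to the same condition. `server_state_t` and `finish_ltrim` below are illustrative stand-ins, not Valkey code.

```c
#include <stdbool.h>

typedef struct {
    long dirty;          /* stand-in for server.dirty */
    long keys_signalled; /* counts hypothetical signalModifiedKey calls */
} server_state_t;

/* Returns true when the trim actually changed the list. */
static bool finish_ltrim(server_state_t *s, long ltrim, long rtrim) {
    long removed = ltrim + rtrim;
    if (removed == 0) return false; /* nothing changed: no signal, no propagation */
    s->keys_signalled++;            /* where signalModifiedKey would be called */
    s->dirty += removed;
    return true;
}
```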

Signed-off-by: Harry Lin <harrylhl@amazon.com>
Co-authored-by: Harry Lin <harrylhl@amazon.com>
Related test failures: 

    *** [err]: Replica importing key containment (slot 0 from node 0 to 2) - DBSIZE command excludes importing keys in tests/unit/cluster/cluster-migrateslots.tcl
    Expected '1' to match '0' (context: type eval line 2 cmd {assert_match "0" [R $node_idx DBSIZE]} proc ::test)

The reason is that we don't wait for the primary-replica synchronization
to complete before starting the next testcase.

---------

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
)

This PR fixes the FreeBSD daily job that has been failing consistently
for the last few days with the error "pkg: No packages available to
install matching 'lang/tclx' have been found in the repositories".

The package name is corrected from `lang/tclx` to `lang/tclX`. The
lowercase version worked previously but appears to have stopped working
in an update of FreeBSD's pkg tool to 2.4.x.

Example of failed job:

https://github.com/valkey-io/valkey/actions/runs/19282092345/job/55135193499

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
…ey-io#2856)

Fix a small oversight in the "Hash field TTL and active expiry propagates
correctly through chain replication" test in `hashexpire.tcl`.
The test did not wait for the initial sync of the chained replica, which
made it flaky.

Signed-off-by: Arad Zilberstein <aradz@amazon.com>
Only enable `HAVE_ARM_NEON` on AArch64 because it supports vaddvq and
all needed compiler intrinsics.

Fixes the following error when building for machine `qemuarm` using the
Yocto Project and OpenEmbedded:

```
| bitops.c: In function 'popcountNEON':
| bitops.c:219:23: error: implicit declaration of function 'vaddvq_u16'; did you mean 'vaddq_u16'? [-Wimplicit-function-declaration]
|   219 |         uint32_t t1 = vaddvq_u16(sc);
|       |                       ^~~~~~~~~~
|       |                       vaddq_u16
| bitops.c:225:14: error: implicit declaration of function 'vaddvq_u8'; did you mean 'vaddq_u8'? [-Wimplicit-function-declaration]
|   225 |         t += vaddvq_u8(vcntq_u8(vld1q_u8(p)));
|       |              ^~~~~~~~~
|       |              vaddq_u8
```

More details are available in the following log:
https://errors.yoctoproject.org/Errors/Details/889836/
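
As an illustration of this kind of guard, here is a sketch that takes the NEON path only on AArch64 and otherwise uses a portable loop. The exact preprocessor condition used in Valkey is not shown in this message, so the `#if` below and the function `popcount_bytes` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed guard: vaddvq_* are AArch64-only, so 32-bit ARM takes the
 * portable path even when NEON is available. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_ARM_NEON 1
#endif

static uint64_t popcount_bytes(const uint8_t *p, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
#ifdef HAVE_ARM_NEON
    /* 16 bytes per step; the per-lane counts sum to at most 128. */
    for (; i + 16 <= n; i += 16) total += vaddvq_u8(vcntq_u8(vld1q_u8(p + i)));
#endif
    /* Portable tail (and the full loop on other targets). */
    for (; i < n; i++) {
        uint8_t b = p[i];
        while (b) {
            total++;
            b &= (uint8_t)(b - 1); /* clear the lowest set bit */
        }
    }
    return total;
}
```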

Signed-off-by: Leon Anavi <leon.anavi@konsulko.com>
valkey-io#2875)

When we added the Hash Field Expiration feature in Valkey 9.0, some of
the new command docs included a complexity description of O(1) even
though they accept multiple arguments.
(see discussion in
valkey-io#2851 (comment))
This PR does:
1. align all the commands to the same description
2. fix the complexity description of some commands (e.g. HSETEX and
HGETEX)

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…o#2874)

fedorarawhide CI reports these warnings:
```
networking.c: In function 'afterErrorReply':
networking.c:821:30: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
  821 |             char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
```
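
The change itself is not shown here. One way to satisfy this warning, assuming `s` is a `const char *` at that point, is to keep the qualifier on the result pointer. The wrapper `first_space_offset` is a made-up helper used only to make the sketch self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first space within the first 32 bytes of s,
 * or -1 if there is none. */
static long first_space_offset(const char *s, size_t len) {
    /* The result points into s, so it carries the same const qualifier. */
    const char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
    return spaceloc ? (long)(spaceloc - s) : -1;
}
```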

Signed-off-by: Binbin <binloveplay1314@qq.com>
Persist USE_FAST_FLOAT and PROG_SUFFIX to prevent a complete rebuild
next time someone types make or make test without specifying variables.

Fixes valkey-io#2880

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
## Problem
IO thread shutdown can deadlock during server panic when the main thread
calls `pthread_cancel()` while the IO thread holds its mutex, preventing
the thread from observing the cancellation.

## Solution  
Release the IO thread mutex before cancelling to ensure clean thread
termination.
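
A minimal sketch of the ordering, under the assumption that the main thread parks the worker by holding the mutex. Since `pthread_mutex_lock()` is not a cancellation point, a worker blocked on it cannot act on a pending cancel, so the mutex is released first. The names `park_mutex`, `worker_main` and `shutdown_worker` are illustrative, not the functions in `io_threads.c`.

```c
#include <pthread.h>

static pthread_mutex_t park_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *worker_main(void *arg) {
    (void)arg;
    for (;;) {
        /* Not a cancellation point: a thread blocked here stays blocked
         * until the mutex is released, even with a cancel pending. */
        pthread_mutex_lock(&park_mutex);
        pthread_mutex_unlock(&park_mutex);
        pthread_testcancel(); /* explicit cancellation point */
    }
    return NULL;
}

/* Unlock first (if the caller holds the mutex), then cancel and join. */
static int shutdown_worker(pthread_t tid, int caller_holds_mutex) {
    if (caller_holds_mutex) pthread_mutex_unlock(&park_mutex);
    if (pthread_cancel(tid) != 0) return -1;
    return pthread_join(tid, NULL);
}
```

Cancelling before unlocking would leave `pthread_join()` waiting on a thread that never reaches a cancellation point.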

## Testing
Reproducer:
```bash
./src/valkey-server --io-threads 2 --enable-debug-command yes
./src/valkey-cli debug panic
```

Before: Server hangs indefinitely
After: Server terminates cleanly

Signed-off-by: Ouri Half <ourih@amazon.com>
@zuiderkwast

@valkey-io/valkey-committers PTAL at the release notes and if there is anything else we need to include in 9.0.1.

@zuiderkwast zuiderkwast marked this pull request as ready for review December 4, 2025 18:15
codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.60%. Comparing base (5018b12) to head (ce25d27).
⚠️ Report is 18 commits behind head on 9.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster_legacy.c | 63.63% | 4 Missing ⚠️ |
| src/module.c | 0.00% | 1 Missing ⚠️ |
| src/sentinel.c | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##              9.0    #2906      +/-   ##
==========================================
- Coverage   72.66%   72.60%   -0.07%     
==========================================
  Files         128      128              
  Lines       71301    69867    -1434     
==========================================
- Hits        51812    50726    -1086     
+ Misses      19489    19141     -348     
| Files with missing lines | Coverage Δ |
|---|---|
| src/cli_common.c | 61.81% <100.00%> (-0.24%) ⬇️ |
| src/cluster_migrateslots.c | 92.15% <100.00%> (+0.07%) ⬆️ |
| src/commands.def | 100.00% <ø> (ø) |
| src/io_threads.c | 35.57% <100.00%> (+0.37%) ⬆️ |
| src/networking.c | 88.46% <100.00%> (-0.01%) ⬇️ |
| src/resp_parser.c | 98.47% <100.00%> (ø) |
| src/server.c | 88.39% <100.00%> (-0.04%) ⬇️ |
| src/t_list.c | 92.95% <100.00%> (+0.06%) ⬆️ |
| src/module.c | 9.76% <0.00%> (-0.02%) ⬇️ |
| src/sentinel.c | 0.00% <0.00%> (ø) |

... and 1 more

... and 95 files with indirect coverage changes


@enjoy-binbin

Let's include #2652

enjoy-binbin and others added 3 commits December 9, 2025 14:42
…#2652)

In valkey-io#2078, we did not report large replies when copy avoidance is
allowed. As a result, replies larger than 16384 are not recorded in the
commandlog large-reply. The 16384 threshold is controlled by the hidden
config min-string-size-avoid-copy-reply.

Signed-off-by: Binbin <binloveplay1314@qq.com>
…y-io#2915)

The CLUSTER SLOTS reply depends on whether the client is connected over
IPv6. A fake client has no connection, so when this command is called
from a module timer callback or another scenario where no real client is
involved, there is no connection to check for IPv6.
This fix handles the missing case by returning the reply for IPv4
connected clients.
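
The fallback can be sketched as a NULL check on the connection. The types and the function `client_uses_ipv6` are simplified stand-ins invented for this example, not the real client or connection structs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool is_ipv6;
} connection_t;

typedef struct {
    connection_t *conn; /* NULL for a fake client (e.g. module timer context) */
} client_t;

/* With no connection to inspect, answer as for an IPv4-connected client. */
static bool client_uses_ipv6(const client_t *c) {
    if (c->conn == NULL) return false;
    return c->conn->is_ipv6;
}
```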

Fixes valkey-io#2912.

---------

Signed-off-by: Su Ko <rhtn1128@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Su Ko <rhtn1128@gmail.com>
Co-authored-by: KarthikSubbarao <karthikrs2021@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@zuiderkwast zuiderkwast merged commit ab3c953 into valkey-io:9.0 Dec 9, 2025
66 of 71 checks passed
@zuiderkwast zuiderkwast deleted the backport-for-9-0-1 branch December 9, 2025 18:40