Skip to content

Redis 8.2.1#19

Open
tomerqodo wants to merge 4 commits intocoderabbit_combined_20260121_augment_sentry_coderabbit_1_base_redis_821_pr100from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_redis_821_pr100
Open

Redis 8.2.1#19
tomerqodo wants to merge 4 commits intocoderabbit_combined_20260121_augment_sentry_coderabbit_1_base_redis_821_pr100from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_redis_821_pr100

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Jan 22, 2026

Benchmark PR from qodo-benchmark#100

Summary by CodeRabbit

Release 8.2.1

  • Bug Fixes
    • Improved stability during replication database synchronization to prevent unintended operations during flush processes
    • Enhanced stream consumer group reference tracking to ensure correct behavior after system reloads

✏️ Tip: You can customize this high-level summary in your review settings.

sundb and others added 4 commits January 21, 2026 15:58
…s#14274)

Fix redis#14267
This bug was introduced by redis#13495

### Summary

When a replica clears a large database, it periodically calls
processEventsWhileBlocked() in the replicationEmptyDbCallback() callback
during the key deletion process.
If defragmentation is enabled, this means that active defrag can be
triggered while the database is being deleted.
The defragmentation process may also modify the database at this time,
which could lead to crashes when the database is accessed after
defragmentation.

Code Path:
```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```

### Solution

This PR temporarily disables active defrag before emptying the database,
then restores the active defrag setting after the empty is complete.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276)

This bug was introduced by redis#14130
found by @oranagra

### Summary

Because `s->cgroup_ref` is created at runtime the first time a consumer
group is linked with a message, but it is not released when all
references are removed.

However, after `debug reload` or restart, if the PEL is empty (meaning
no consumer group is referencing any message), `s->cgroup_ref` will not
be recreated.

As a result, when executing XADD or XTRIM with `ACKED` option and
checking whether a message that is being read but has not been ACKed can
be deleted, the cgroup_ref being NULL will cause a crash.

### Code Path
```
xaddCommand -> streamTrim -> streamEntryIsReferenced
```

### Solution

Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

Version 8.2.1 point release with bug fixes and performance improvements. Changes include version constants update, release notes documentation, a defragmentation safety guard during master-replica synchronization, a stream entry optimization, and new test cases for defragmentation and stream reloading scenarios.

Changes

Cohort / File(s) Summary
Release & Version
00-RELEASENOTES, src/version.h
Bumped version from 8.2.0 to 8.2.1; added release notes block documenting moderate-urgency bug fixes and performance improvements
Replication Safety
src/replication.c
Added defragmentation state guard around RDB flush during replica sync; temporarily disables and restores active defragmentation to prevent triggering during database emptying
Stream Optimization
src/t_stream.c
Added early return in streamEntryIsReferenced when cgroups_ref is NULL, treating entry as referenced without further lookups
Test Coverage
tests/unit/memefficiency.tcl
Introduced discard_replies_every utility for managing reply discards; added new test scenario validating defragmentation behavior during replicaof database flush
Stream Tests
tests/unit/type/stream.tcl
Added new test case verifying XADD with ACKED option remains stable after DEBUG RELOAD; validates stream state consistency across reloads

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Version hops from 8.2 to 8.2.1 with care,
Defrag guards protect the replica's lair,
Streams skip early when refs aren't there,
Tests ensure no crashes on reload's fair,
A tidy release, beyond compare! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Redis 8.2.1' is directly related to the main purpose of the changeset, which is to release version 8.2.1. The version bump is evident in src/version.h and 00-RELEASENOTES. However, it is overly generic and does not convey meaningful detail about the substantive bug fixes included (defrag safety, NULL check for stream entries). While technically accurate, it reads more like a version label than a descriptive summary of changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/replication.c`:
- Around line 1953-1962: The code saves the original defrag flag to
orig_active_defrag but restores it incorrectly by hardcoding
server.active_defrag_enabled = 1; instead of using the saved value; change the
restore to assign server.active_defrag_enabled = orig_active_defrag so the
previous state is preserved (the change is around the block that disables defrag
before calling emptyData(...) and references server.active_defrag_enabled,
orig_active_defrag, and replicationEmptyDbCallback).

In `@tests/unit/memefficiency.tcl`:
- Around line 70-76: The helper proc discard_replies_every has its condition
inverted and therefore discards replies on every iteration; change the modulus
check from "$count % $frequency != 0" to "$count % $frequency == 0" so it only
discards when the count is a multiple of frequency, and update all other
occurrences of the same pattern (the other listed line ranges) to the same
equality check to prevent premature/blocking reads in discard_replies_every.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8578af1 and e252624.

📒 Files selected for processing (6)
  • 00-RELEASENOTES
  • src/replication.c
  • src/t_stream.c
  • src/version.h
  • tests/unit/memefficiency.tcl
  • tests/unit/type/stream.tcl
🧰 Additional context used
🧬 Code graph analysis (1)
src/replication.c (1)
src/db.c (1)
  • emptyData (862-900)
🪛 GitHub Actions: CI
src/t_stream.c

[error] 1-1: Command failed during build: make -C src all. See error about unused variable in replication.c.

src/version.h

[error] 1-1: Command failed during build: make -C src all. See error about unused variable in replication.c.

src/replication.c

[error] 1956-1956: replication.c: In function ‘rdbLoadEmptyDbFunc’: unused variable ‘orig_active_defrag’ [-Werror=unused-variable]

🪛 GitHub Actions: Codecov
tests/unit/type/stream.tcl

[error] 25-25: XADD with ACKED option doesn't crash after DEBUG RELOAD in tests/unit/type/stream.tcl. Expected 3 but got 2 (assert_equal on XLEN).

src/replication.c

[warning] 1956-1956: Unused variable 'orig_active_defrag' in replication.c. The value is assigned but never used.

🪛 GitHub Actions: External Server Tests
src/replication.c

[error] 1956-1956: In function 'rdbLoadEmptyDbFunc': unused variable 'orig_active_defrag' -Werror=unused-variable caused the build to fail. Command failed: make REDIS_CFLAGS=-Werror.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (cpp)
  • GitHub Check: code-coverage
  • GitHub Check: code-coverage
🔇 Additional comments (4)
src/t_stream.c (1)

2708-2711: The NULL check prevents crashes after DEBUG RELOAD.

The fix correctly guards against NULL pointer dereference when cgroups_ref is NULL (e.g., after DEBUG RELOAD with an empty PEL). Returning 1 (entry is referenced) is the safe/conservative choice — it prevents potential data loss at the cost of possibly not trimming entries that could technically be deleted.

This is an acceptable trade-off for a crash fix. If precise trimming behavior is needed post-reload, a follow-up could improve the logic to return 0 when the PEL is definitively empty.

00-RELEASENOTES (1)

22-40: Release notes accurately document the fixes.

The new 8.2.1 section correctly captures the bug fixes from this PR:

  • #14274 aligns with the defrag disable during replica flush fix
  • #14276 aligns with the XADD/XTRIM crash fix after RDB load

The format and structure are consistent with existing release notes.

src/version.h (1)

1-3: Version bump is correct.

The version constants are correctly updated:

  • REDIS_VERSION "8.2.1" matches the release
  • REDIS_VERSION_NUM 0x00080201 correctly encodes version 8.2.1 in the expected format

Note: The pipeline failure about an unused variable in replication.c is unrelated to this file and should be addressed separately.

tests/unit/type/stream.tcl (1)

269-272: The test expectation is correct; the code has a bug at line 2708.

At line 270, the code correctly implements DELETE_STRATEGY_ACKED semantics but encounters a blocking conservative NULL check. The test expectation of XLEN=2 is correct.

When entry 1-0 is checked for deletion:

  • Entry 1-0 >= min_cgroup_last_id (1-0 not < 1-0), so it's been delivered
  • cgroups_ref is NULL because PEL is empty after RELOAD
  • The code should recognize: entry >= min_cgroup_last_id + cgroups_ref=NULL → entry is fully acknowledged → deletable
  • However, line 2708 unconditionally returns 1 (referenced) when cgroups_ref is NULL, blocking deletion

The test comment states the expected behavior clearly: "ACKED option should work correctly even without cgroup references." This means ACKed entries should be deletable regardless of cgroups_ref state, as long as they're >= min_cgroup_last_id.

The actual problem is not with the test, but with the conservative NULL check at src/t_stream.c:2708 that should instead check whether the entry is actually within the delivered range before falling back to cgroups_ref.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +1953 to +1962
/* Temporarily disable active defragmentation during database flush.
* This prevents defrag from being triggered in replicationEmptyDbCallback()
* which could modify the database while it's being emptied. */
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;

emptyData(-1, empty_db_flags, replicationEmptyDbCallback);

/* Restore the original active defragmentation setting. */
server.active_defrag_enabled = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Restore the original defrag setting (and fix the -Werror build break).

You save orig_active_defrag but never use it, which fails CI and also forces defrag on even if it was previously disabled. Restore the saved value instead.

🐛 Proposed fix
-    int orig_active_defrag = server.active_defrag_enabled;
-    server.active_defrag_enabled = 0;
-
-    emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
-
-    /* Restore the original active defragmentation setting. */
-    server.active_defrag_enabled = 1;
+    int orig_active_defrag = server.active_defrag_enabled;
+    server.active_defrag_enabled = 0;
+
+    emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
+
+    /* Restore the original active defragmentation setting. */
+    server.active_defrag_enabled = orig_active_defrag;
🧰 Tools
🪛 GitHub Actions: CI

[error] 1956-1956: replication.c: In function ‘rdbLoadEmptyDbFunc’: unused variable ‘orig_active_defrag’ [-Werror=unused-variable]

🪛 GitHub Actions: Codecov

[warning] 1956-1956: Unused variable 'orig_active_defrag' in replication.c. The value is assigned but never used.

🪛 GitHub Actions: External Server Tests

[error] 1956-1956: In function 'rdbLoadEmptyDbFunc': unused variable 'orig_active_defrag' -Werror=unused-variable caused the build to fail. Command failed: make REDIS_CFLAGS=-Werror.

🤖 Prompt for AI Agents
In `@src/replication.c` around lines 1953 - 1962, The code saves the original
defrag flag to orig_active_defrag but restores it incorrectly by hardcoding
server.active_defrag_enabled = 1; instead of using the saved value; change the
restore to assign server.active_defrag_enabled = orig_active_defrag so the
previous state is preserved (the change is around the block that disables defrag
before calling emptyData(...) and references server.active_defrag_enabled,
orig_active_defrag, and replicationEmptyDbCallback).

Comment on lines +70 to +76
proc discard_replies_every {rd count frequency discard_num} {
if {$count % $frequency != 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

discard_replies_every discards on every iteration (condition inverted).

$count % $frequency != 0 causes the helper to read discard_num replies on almost every loop (including the first), which will block once replies are exhausted. This affects all new call sites, including the replica flush defrag test.

🐛 Proposed fix
-    if {$count % $frequency != 0} {
+    if {$frequency > 0 && ($count % $frequency) == 0} {
         for {set k 0} {$k < $discard_num} {incr k} {
             $rd read ; # Discard replies
         }
     }

Also applies to: 350-351, 362-363, 372-373, 768-769, 878-879, 888-889, 945-1004

🤖 Prompt for AI Agents
In `@tests/unit/memefficiency.tcl` around lines 70 - 76, The helper proc
discard_replies_every has its condition inverted and therefore discards replies
on every iteration; change the modulus check from "$count % $frequency != 0" to
"$count % $frequency == 0" so it only discards when the count is a multiple of
frequency, and update all other occurrences of the same pattern (the other
listed line ranges) to the same equality check to prevent premature/blocking
reads in discard_replies_every.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants