Conversation
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276) This bug was introduced by redis#14130 found by @oranagra ### Summary Because `s->cgroup_ref` is created at runtime the first time a consumer group is linked with a message, but it is not released when all references are removed. However, after `debug reload` or restart, if the PEL is empty (meaning no consumer group is referencing any message), `s->cgroup_ref` will not be recreated. As a result, when executing XADD or XTRIM with `ACKED` option and checking whether a message that is being read but has not been ACKed can be deleted, the cgroup_ref being NULL will cause a crash. ### Code Path ``` xaddCommand -> streamTrim -> streamEntryIsReferenced ``` ### Solution Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
WalkthroughThis patch release (8.2.1) addresses critical bugs: prevents active defragmentation during replication flush to avoid crashes, fixes potential crashes with XADD/ACKED after DEBUG RELOAD, and includes version bumps with comprehensive test coverage for the fixes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/replication.c`:
- Around line 1953-1962: The saved variable orig_active_defrag is meant to
restore the previous state but is currently unused and the code always sets
server.active_defrag_enabled = 1; instead of restoring the original value;
replace the final assignment with server.active_defrag_enabled =
orig_active_defrag so the saved value is used (symbols: orig_active_defrag,
server.active_defrag_enabled) and leave the surrounding block that disables
defrag around emptyData(..., replicationEmptyDbCallback) as-is.
In `@tests/unit/memefficiency.tcl`:
- Around line 70-76: The condition in proc discard_replies_every is inverted:
change the if test from checking "$count % $frequency != 0" to "$count %
$frequency == 0" so the discard loop runs only when count is a multiple of
frequency; keep the inner for-loop ($rd read) as-is to discard discard_num
replies when that multiple occurs, referencing the proc name
discard_replies_every and the variables count and frequency to locate the
change.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
00-RELEASENOTESsrc/replication.csrc/t_stream.csrc/version.htests/unit/memefficiency.tcltests/unit/type/stream.tcl
🧰 Additional context used
🧬 Code graph analysis (1)
src/replication.c (1)
src/db.c (1)
emptyData(862-900)
🪛 GitHub Actions: CI
src/replication.c
[error] 1956-1956: replication.c:1956:9: error: unused variable 'orig_active_defrag' [-Werror=unused-variable]
🪛 GitHub Actions: External Server Tests
src/replication.c
[error] 1956-1956: Unused variable 'orig_active_defrag' in rdbLoadEmptyDbFunc; -Werror treats this as an error. Remove or use the variable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Analyze (cpp)
- GitHub Check: code-coverage
- GitHub Check: code-coverage
🔇 Additional comments (4)
src/t_stream.c (1)
2707-2711: Conservative NULL-guard for cgroups_ref looks correct.
Avoids a crash path when the reference index is absent.src/version.h (1)
1-3: Version bump is consistent and clear.tests/unit/memefficiency.tcl (2)
350-350: Usage affected by the helper bug.This call will malfunction due to the inverted condition in
discard_replies_every. Once the helper is fixed, this usage should work correctly.
945-1007: Good regression test for issue#14267.The test appropriately validates the fix by:
- Creating fragmentation conditions that would trigger defrag
- Enabling defrag then immediately starting replication (which flushes the DB)
- Verifying the system doesn't crash and sync completes
Note: The
discard_replies_everycalls on lines 967 and 977 depend on fixing the inverted condition bug in the helper procedure.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| /* Temporarily disable active defragmentation during database flush. | ||
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | ||
| * which could modify the database while it's being emptied. */ | ||
| int orig_active_defrag = server.active_defrag_enabled; | ||
| server.active_defrag_enabled = 0; | ||
|
|
||
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | ||
|
|
||
| /* Restore the original active defragmentation setting. */ | ||
| server.active_defrag_enabled = 1; |
There was a problem hiding this comment.
Restore active_defrag_enabled to the original value (and fix the build).
Line 1956 currently triggers a -Werror unused-variable failure, and Line 1962 always enables defrag even if it was disabled before the flush. Use the saved value instead.
🛠️ Proposed fix
/* Temporarily disable active defragmentation during database flush.
* This prevents defrag from being triggered in replicationEmptyDbCallback()
* which could modify the database while it's being emptied. */
int orig_active_defrag = server.active_defrag_enabled;
server.active_defrag_enabled = 0;
emptyData(-1, empty_db_flags, replicationEmptyDbCallback);
/* Restore the original active defragmentation setting. */
- server.active_defrag_enabled = 1;
+ server.active_defrag_enabled = orig_active_defrag;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* Temporarily disable active defragmentation during database flush. | |
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | |
| * which could modify the database while it's being emptied. */ | |
| int orig_active_defrag = server.active_defrag_enabled; | |
| server.active_defrag_enabled = 0; | |
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | |
| /* Restore the original active defragmentation setting. */ | |
| server.active_defrag_enabled = 1; | |
| /* Temporarily disable active defragmentation during database flush. | |
| * This prevents defrag from being triggered in replicationEmptyDbCallback() | |
| * which could modify the database while it's being emptied. */ | |
| int orig_active_defrag = server.active_defrag_enabled; | |
| server.active_defrag_enabled = 0; | |
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | |
| /* Restore the original active defragmentation setting. */ | |
| server.active_defrag_enabled = orig_active_defrag; |
🧰 Tools
🪛 GitHub Actions: CI
[error] 1956-1956: replication.c:1956:9: error: unused variable 'orig_active_defrag' [-Werror=unused-variable]
🪛 GitHub Actions: External Server Tests
[error] 1956-1956: Unused variable 'orig_active_defrag' in rdbLoadEmptyDbFunc; -Werror treats this as an error. Remove or use the variable.
🤖 Prompt for AI Agents
In `@src/replication.c` around lines 1953 - 1962, The saved variable
orig_active_defrag is meant to restore the previous state but is currently
unused and the code always sets server.active_defrag_enabled = 1; instead of
restoring the original value; replace the final assignment with
server.active_defrag_enabled = orig_active_defrag so the saved value is used
(symbols: orig_active_defrag, server.active_defrag_enabled) and leave the
surrounding block that disables defrag around emptyData(...,
replicationEmptyDbCallback) as-is.
| proc discard_replies_every {rd count frequency discard_num} { | ||
| if {$count % $frequency != 0} { | ||
| for {set k 0} {$k < $discard_num} {incr k} { | ||
| $rd read ; # Discard replies | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Inverted condition: != 0 should be == 0.
The condition $count % $frequency != 0 causes replies to be discarded on every iteration except multiples of frequency. Based on the original pattern (if {$count % 10000 == 0}), the intent is to discard replies only when count reaches a multiple of frequency.
With the current logic, on the first iteration (count=1), it attempts to read 10000 replies when only 1 command has been sent, which would block or error.
Proposed fix
proc discard_replies_every {rd count frequency discard_num} {
- if {$count % $frequency != 0} {
+ if {$count % $frequency == 0} {
for {set k 0} {$k < $discard_num} {incr k} {
$rd read ; # Discard replies
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| proc discard_replies_every {rd count frequency discard_num} { | |
| if {$count % $frequency != 0} { | |
| for {set k 0} {$k < $discard_num} {incr k} { | |
| $rd read ; # Discard replies | |
| } | |
| } | |
| } | |
| proc discard_replies_every {rd count frequency discard_num} { | |
| if {$count % $frequency == 0} { | |
| for {set k 0} {$k < $discard_num} {incr k} { | |
| $rd read ; # Discard replies | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In `@tests/unit/memefficiency.tcl` around lines 70 - 76, The condition in proc
discard_replies_every is inverted: change the if test from checking "$count %
$frequency != 0" to "$count % $frequency == 0" so the discard loop runs only
when count is a multiple of frequency; keep the inner for-loop ($rd read) as-is
to discard discard_num replies when that multiple occurs, referencing the proc
name discard_replies_every and the variables count and frequency to locate the
change.
Benchmark PR from agentic-review-benchmarks#6
Summary by CodeRabbit
Bug Fixes
Performance and resource utilization
Version
✏️ Tip: You can customize this high-level summary in your review settings.