Allow partial sync after loading AOF with preamble#2366
Allow partial sync after loading AOF with preamble#2366zuiderkwast merged 11 commits intovalkey-io:unstablefrom
Conversation
e594366 to
462e1f5
Compare
|
|
thx for your interest. The main purpose of this PR is to clarify why we missed this in preamble-aof, and make sure adding this will not break the consistency of data and replication. I will try add more test cases. |
@secwall I still can not get an error after a weekend of testing. Can you give some hints on reproducing? |
|
Here is how I get an error: It fails on failover loop test: |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2366 +/- ##
============================================
- Coverage 72.64% 72.37% -0.27%
============================================
Files 128 128
Lines 71278 70284 -994
============================================
- Hits 51777 50866 -911
+ Misses 19501 19418 -83
🚀 New features to boost your workflow:
|
|
@arthurkiller can you also signoff the commits? |
a52ec12 to
98ad37b
Compare
@ranshid done @secwall I've fixed this. Cause repl_buffer is created when we get a replica attached. We'd create this on data load. I'll add a tcl case for this later. Not so sure if we use this replid from the previous RDB, will it make any side effects |
Currently, our cluster cases can not cover this situation I still failed get this by run cluster test |
|
Great finding @arthurkiller! One of the main points of AOF is to be able to avoid full sync after a restart, so it's pretty sad that it doesn't work and isn't tested. Maybe this is why our documentation about prsistence says this:
It's too late to get this in 9.0 but I hope we can have it ready for 9.1. Will you add a test of the scenario (AOF rewrite, restart, psync)? |
@zuiderkwast Thank you for your attention. I almost forgot about this Pull Request :) . I'll try to add more comprehensive tests as soon as possible. Hectic these days. |
|
When you have time, will you add these two in some file under
We should use an AOF file with preamble and multiple AOF commands after the preamble to check that the offset is calculated correctly. |
3e9a4fd to
bf34cd1
Compare
b1d8058 to
1644ee2
Compare
Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
…eInfo Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
9a31911 to
abbc6a2
Compare
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Signed-off-by: arthur.lee <liziang.arthur@bytedance.com>
6ff5646 to
4842aa7
Compare
zuiderkwast
left a comment
There was a problem hiding this comment.
Very good, thank you for your patience!
I have only very minor comments about comments. (I can apply them before merging, if you don't.)
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com>
|
Thanks a lot :) |
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance. However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations: * On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file. * On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file. To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file. Resolves valkey-io#2677 --------- Signed-off-by: arthur.lee <liziang.arthur@bytedance.com> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance. However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations: * On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file. * On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file. To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file. Resolves valkey-io#2677 --------- Signed-off-by: arthur.lee <liziang.arthur@bytedance.com> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance. However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations: * On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file. * On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file. To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file. Resolves valkey-io#2677 --------- Signed-off-by: arthur.lee <liziang.arthur@bytedance.com> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance. However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations: * On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file. * On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file. To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file. Resolves valkey-io#2677 --------- Signed-off-by: arthur.lee <liziang.arthur@bytedance.com> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance. However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations: * On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file. * On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file. To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file. Resolves valkey-io#2677 --------- Signed-off-by: arthur.lee <liziang.arthur@bytedance.com> Signed-off-by: Arthur Lee <arthurkiller@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
@zuiderkwast I think this patch did not work correctly for the replica. If the master has some non-idempotent command such as |
|
One possible approach is replica should try to connect to the master while loading the data, only load the rdb data and try to do psync-continue with master if the master is alive. Otherwise, we should load the whole data. |
…2925) This reverts commit 2da21d9. The implementation in #2366 made it possible to perform a partial resynchronization after loading an AOF file, by storing the replication offset in the AOF preamble and counting the bytes in the AOF command stream in the same way as we count byte offset in a replication stream. However, this approach isn't safe because some commands are replicated but not stored in the AOF file. This includes the commands REPLCONF, PING, PUBLISH and module-implemented commands where a module can control to propagate a command to the replication stream only, to AOF only or to both. This oversight led to data inconsistency, where the wrong replication offset is used for partial resynchronization as explained in issue #2904. The revert caused small merge conflicts with 3c3a196 which are solved. Fixes #2904. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
removing the release-notes label for now to avoid accidentally adding this to the release note as this was currently reverted in #2925 |
…#2366)" (valkey-io#2925) This reverts commit 2da21d9. The implementation in valkey-io#2366 made it possible to perform a partial resynchronization after loading an AOF file, by storing the replication offset in the AOF preamble and counting the bytes in the AOF command stream in the same way as we count byte offset in a replication stream. However, this approach isn't safe because some commands are replicated but not stored in the AOF file. This includes the commands REPLCONF, PING, PUBLISH and module-implemented commands where a module can control to propagate a command to the replication stream only, to AOF only or to both. This oversight led to data inconsistency, where the wrong replication offset is used for partial resynchronization as explained in issue valkey-io#2904. The revert caused small merge conflicts with 3c3a196 which are solved. Fixes valkey-io#2904. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…#2366)" (valkey-io#2925) This reverts commit 2da21d9. The implementation in valkey-io#2366 made it possible to perform a partial resynchronization after loading an AOF file, by storing the replication offset in the AOF preamble and counting the bytes in the AOF command stream in the same way as we count byte offset in a replication stream. However, this approach isn't safe because some commands are replicated but not stored in the AOF file. This includes the commands REPLCONF, PING, PUBLISH and module-implemented commands where a module can control to propagate a command to the replication stream only, to AOF only or to both. This oversight led to data inconsistency, where the wrong replication offset is used for partial resynchronization as explained in issue valkey-io#2904. The revert caused small merge conflicts with 3c3a196 which are solved. Fixes valkey-io#2904. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Noted this was reverted in #2925.
=====================
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance.
However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations:
To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file.
Resolves #2677