Describe the bug
After we fixed the compactor issue with #2278, we still observed multiple instances where deleted data reappeared. Notably, all affected keys had large values stored in the value log (vlog).
After analyzing the code, we discovered a highly suspicious race condition. Using key "foo" at version 100 as an example, the timeline is as follows:
1. vlog GC thread A reads "foo" v100 and confirms it exists.
2. Thread B deletes "foo" → v101 (delete tombstone).
3. Compaction runs: both v101 (delete) and v100 (value) are discarded.
4. GC thread A writes back "foo" v100.
5. The deleted data reappears!
It looks like step 3 (the compaction) can happen at any point between GC thread A's read and its write-back. As long as v101 (delete) and v100 (value) have both been compacted away before the write-back, the rewritten "foo" v100 resurfaces as if it had never been deleted.
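The timeline above can be replayed deterministically in a toy model. The sketch below is only an illustration of the suspected interleaving, not Badger's actual code: the `entry` and `store` types and the `get`, `put`, `compact`, and `simulateRace` functions are all hypothetical, and the "threads" are run sequentially in the order described.

```go
package main

import "fmt"

// entry is one version of a key in a toy versioned store.
// Hypothetical type for illustration; not Badger's real structures.
type entry struct {
	version uint64
	value   string
	deleted bool // true for a delete tombstone
}

// store maps each key to all versions currently kept for it.
type store map[string][]entry

// put adds one version of a key.
func (s store) put(key string, e entry) { s[key] = append(s[key], e) }

// get returns the value at the highest version, or ok=false if the key
// is absent or its newest version is a tombstone.
func (s store) get(key string) (string, bool) {
	var newest *entry
	for i := range s[key] {
		if newest == nil || s[key][i].version > newest.version {
			newest = &s[key][i]
		}
	}
	if newest == nil || newest.deleted {
		return "", false
	}
	return newest.value, true
}

// compact drops every version of a key whose newest version is a
// tombstone, so the tombstone and the value it shadows vanish together.
func (s store) compact(key string) {
	if _, live := s.get(key); !live {
		delete(s, key)
	}
}

// simulateRace replays the suspected interleaving step by step.
func simulateRace() (string, bool) {
	s := store{}
	s.put("foo", entry{version: 100, value: "large-value"})

	// GC thread A reads "foo" v100, sees it is live, and keeps a copy.
	toRewrite := s["foo"][0]

	// Thread B deletes "foo", writing a tombstone at v101.
	s.put("foo", entry{version: 101, deleted: true})

	// Compaction discards both v101 (delete) and v100 (value).
	s.compact("foo")

	// GC thread A writes back its stale copy of v100.
	s.put("foo", toRewrite)

	// No tombstone is left to shadow v100, so the key is visible again.
	return s.get("foo")
}

func main() {
	v, ok := simulateRace()
	fmt.Printf("found=%v value=%q\n", ok, v) // found=true value="large-value"
}
```

In this model the write-back succeeds because nothing re-checks that v100 is still the live version at write time. Whether the same gap exists in the real GC rewrite path is exactly what this report asks to be confirmed.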
To Reproduce
The issue cannot be reliably reproduced. It occurs after continuous insert, update, and delete operations on the data — sometimes after a few days, sometimes after a few weeks.
Expected behavior
A deleted item should not be retrievable.
Screenshots
Environment
Additional context
After disabling the value log (vlog) by setting the value threshold to a very large value, this issue no longer occurs.