Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv,storage: run migration to strip synthetic bit from MVCC keys #129620

Open
nvanbenschoten opened this issue Aug 25, 2024 · 4 comments
Open

kv,storage: run migration to strip synthetic bit from MVCC keys #129620

nvanbenschoten opened this issue Aug 25, 2024 · 4 comments
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-storage Relating to our storage engine (Pebble) on-disk storage. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. O-postmortem Originated from a Postmortem action item. P-2 Issues/test failures with a fix SLA of 3 months T-storage Storage Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Aug 25, 2024

In #116830, we removed synthetic MVCC timestamps. To make this possible, we first landed #105523, which stopped encoding and decoding the synthetic timestamp bit in/from MVCC keys. This limited the reach of previously stored synthetic timestamps in long-lived clusters to just the MVCC decoding layer. We paired this with #117304, which avoided mixed-version cluster concerns during replica consistency checks (along with a number of other PRs listed in #101938).

However, we never ran a migration to strip the synthetic bit from persistent MVCC keys that have the bit set. We should, as keeping around these keys presents some risk. This will allow us to get rid of the key decoding logic which deals with the synthetic MVCC key suffix encoding. It will also allow us to get rid of the MVCC key suffix normalization logic in the key comparator and equality functions, which are needed to ensure that read+decode+encode+delete logic works correctly with synthetic timestamps.

Because of the previous work in this area, such a migration can be local to a single replica — it doesn't need to be replicated across all replicas in a range. This simplifies things meaningfully.

Jira issue: CRDB-41621

@nvanbenschoten nvanbenschoten added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Aug 25, 2024
@RaduBerinde
Copy link
Member

RaduBerinde commented Aug 26, 2024

@jbowens mentioned that the new sstable format he is working on will not store the synthetic bit. This hasn't been decided but it is possible that we will turn it on by default in 24.3 and perhaps require all sstables to be in that format before upgrading to 24.1 25.1.

On top of that, we can add a NormalizeSuffix function in the comparer that we apply on suffixes in all range keys that we write out during flushes and compactions (and perhaps on all keys in the WAL?). This would guarantee that when all sstables are in the 24.3 format, all suffixes would be normalized.

@nicktrav
Copy link
Collaborator

Linking this to #96819 as we'll likely want to think through the implications of these migrations on version upgrades.

@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Sep 18, 2024
@github-project-automation github-project-automation bot moved this to Incoming in Storage Sep 18, 2024
@exalate-issue-sync exalate-issue-sync bot removed the T-kv KV Team label Sep 18, 2024
@jbowens jbowens changed the title kv: run migration to strip synthetic bit from MVCC keys kv,storage: run migration to strip synthetic bit from MVCC keys Sep 18, 2024
@jbowens
Copy link
Collaborator

jbowens commented Sep 18, 2024

Because of the previous work in this area, such a migration can be local to a single replica — it doesn't need to be replicated across all replicas in a range. This simplifies things meaningfully.

Recording for posterity, that this is no longer true for range keys (MVCC range tombstones). Due to a bug in the Comparer, MVCC range tombstones with and without the synthetic bit are not logically equivalent. We attempted to correct the Comparer, but that predictably caused a replica divergence: #130533 A migration to remove the synthetic bit from MVCC range tombstones will need to be performed above Raft. We can continue with the plan to migrate point keys with the synthetic bit below Raft.

Related internal doc: https://docs.google.com/document/d/1nKgIG-3_7xv7YsgbSmtAfI6-1riokBn05hY1J1x_ikw/edit?usp=sharing

@Schtick Schtick added the O-postmortem Originated from a Postmortem action item. label Sep 18, 2024
@Schtick
Copy link
Collaborator

Schtick commented Sep 18, 2024

@exalate-issue-sync exalate-issue-sync bot added the P-2 Issues/test failures with a fix SLA of 3 months label Sep 18, 2024
@nicktrav nicktrav moved this from Incoming to Backlog in Storage Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-storage Relating to our storage engine (Pebble) on-disk storage. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. O-postmortem Originated from a Postmortem action item. P-2 Issues/test failures with a fix SLA of 3 months T-storage Storage Team
Projects
Status: Backlog
Status: Incoming
Development

No branches or pull requests

5 participants