
storage: consider relaxing need for blocking cluster versions when performing Pebble migrations #96819

Open
nicktrav opened this issue Feb 8, 2023 · 8 comments
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team

Comments

@nicktrav
Collaborator

nicktrav commented Feb 8, 2023

Is your feature request related to a problem? Please describe.

In instances where we need to ensure that all nodes in a cluster have Pebble stores satisfying some precondition (e.g. all "old" SSTables have been re-written; tables with split user keys have been compacted), we typically use two cluster versions. The first is used as a "mark" phase in one Cockroach version V. A second is then introduced in a subsequent release V+n (where n is usually 1) that blocks waiting for all the marked items to be processed.

This blocking operation in the version V+n is likely overly conservative. Instead of holding up the progression through subsequent cluster versions, the precondition could be incorporated into a background job. The completion of said job would then become a precondition for the finalization of the Cockroach version V+n.
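To make the pattern concrete, here is a minimal sketch of the two-phase shape described above. All identifiers are illustrative placeholders, not the real CockroachDB/Pebble symbols: version V only marks work, while version V+n blocks finalization until the marked work has drained.

```go
// Sketch of the existing two-cluster-version pattern. All identifiers
// here are illustrative placeholders, not the real CockroachDB symbols.
package migrations

import (
	"context"
	"time"
)

// Store stands in for a single Pebble store on a node.
type Store interface {
	// MarkObsoleteTables flags old tables for rewriting (version V).
	MarkObsoleteTables(ctx context.Context) error
	// MarkedTableCount reports how many flagged tables remain.
	MarkedTableCount() int
}

// runMarkPhase is attached to cluster version V: it only marks work.
func runMarkPhase(ctx context.Context, stores []Store) error {
	for _, s := range stores {
		if err := s.MarkObsoleteTables(ctx); err != nil {
			return err
		}
	}
	return nil
}

// runBlockingPhase is attached to cluster version V+n: it holds up
// version finalization until every marked table has been rewritten,
// which is the step this issue proposes moving into a background job.
func runBlockingPhase(ctx context.Context, stores []Store) error {
	for _, s := range stores {
		for s.MarkedTableCount() > 0 {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(10 * time.Second):
			}
		}
	}
	return nil
}
```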

Describe the solution you'd like

Investigate whether we can use the job framework and preconditions to accomplish the same goal, rather than employing a second cluster version.
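One possible shape for this, again with hypothetical names (the real jobs framework has considerably more lifecycle machinery): the migration kicks off a background job, and the job's completion is checked as a finalization precondition rather than blocked on inline.

```go
// Sketch of the job-based alternative; the job registry and precondition
// hook here are hypothetical stand-ins for the real jobs framework.
package migrations

import "context"

// Job stands in for a background job tracked by the jobs framework.
type Job interface {
	Done() bool
}

// JobRegistry stands in for the piece of the framework that creates and
// looks up jobs.
type JobRegistry interface {
	StartRewriteJob(ctx context.Context) (Job, error)
	LookupRewriteJob(ctx context.Context) (Job, bool)
}

// preconditionForVPlusN gates finalization of cluster version V+n on the
// rewrite job having finished, instead of blocking inside the migration.
func preconditionForVPlusN(ctx context.Context, reg JobRegistry) (bool, error) {
	job, ok := reg.LookupRewriteJob(ctx)
	if !ok {
		// No job yet: start one and report "not ready".
		if _, err := reg.StartRewriteJob(ctx); err != nil {
			return false, err
		}
		return false, nil
	}
	return job.Done(), nil
}
```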

Describe alternatives you've considered

Keep things the way they are (i.e. two cluster versions), at the expense of having a potentially long-running Pebble background operation preventing other Cockroach migrations from occurring.

Additional context

More context in this internal thread.

Jira issue: CRDB-24351

@nicktrav nicktrav added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Feb 8, 2023
@nicktrav
Collaborator Author

I managed to find the docs here on long-running migrations. The corresponding RFC is here.

We roll the Pebble FMV bump into an existing Cockroach cluster version that is used to enable value blocks (code). To avoid confusion, and to avoid re-arranging the existing Pebble FMVs (which could break our new guarantees around alpha version compatibility), we can no-op and rename the existing Pebble FMV. We can then re-introduce it as a new, higher-numbered Pebble FMV that corresponds directly to a new Cockroach internal cluster version coming after the value blocks migration. This new internal cluster version is the one that will be used in the migration job.

The re-introduction of the Pebble FMV should be safe, as the underlying rewrite compactions are idempotent. If all the tables were previously rewritten, running the Pebble migration job again should be a quick operation (i.e. we only pay the cost of scanning the table versions, and no rewrite compactions will be scheduled, as the rewrites have already occurred).
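Concretely, the rearrangement could look roughly like the sketch below. The constant names follow Pebble's format_major_version.go, but the values and neighbors here are approximate and should be double-checked against the actual file.

```go
// Sketch of the proposed FormatMajorVersion rearrangement; names follow
// Pebble's format_major_version.go, but values and ordering here are
// approximate and should be checked against the actual file.
package pebble

type FormatMajorVersion uint64

const (
	// ... versions 0 through 10 elided ...

	// FormatUnusedPrePebblev1MarkedCompacted (formerly
	// FormatPrePebblev1MarkedCompacted) becomes a no-op: ratcheting to
	// it no longer waits for rewrite compactions.
	FormatUnusedPrePebblev1MarkedCompacted FormatMajorVersion = iota + 11

	// FormatSSTableValueBlocks is the 23.1 version that enables value
	// blocks, the one the Cockroach cluster version actually gates.
	FormatSSTableValueBlocks

	// FormatPrePebblev1MarkedCompacted is re-introduced at the tail, in
	// a 23.2 section, paired with a new Cockroach internal cluster
	// version whose migration performs the (idempotent) blocking wait.
	FormatPrePebblev1MarkedCompacted
)
```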

The alternative here is to not ship this blocking migration in 23.1. To achieve this, we'd make the same no-op change suggested above. The downside is that this prevents us from bumping the min SSTable version to at least Pebblev1, which would block removing code that handles table props and split user keys.

@nicktrav
Collaborator Author

I was looking at outdated code. I think what we want is now in pkg/upgrade. Specifically, I think we want a system migration.
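For illustration, a system migration for this could take roughly the following shape; the dependency types below are placeholders rather than the actual pkg/upgrade signatures, and the ratcheting method mirrors Pebble's RatchetFormatMajorVersion.

```go
// Rough shape of a pkg/upgrade system migration for this; the dependency
// types below are placeholders, not the real pkg/upgrade signatures.
package upgrades

import "context"

// PebbleStore stands in for one of the node's stores.
type PebbleStore interface {
	// RatchetFormatMajorVersion mirrors Pebble's method of the same
	// name; for "MarkedCompacted" versions it waits for the rewrites.
	RatchetFormatMajorVersion(fmv uint64) error
}

// SystemDeps mimics the dependencies handed to a system migration.
type SystemDeps struct {
	Stores []PebbleStore
}

// pebbleFormatMigration ratchets every store to the target Pebble format
// major version; run as a system migration, it would be the blocking
// step tied to the new internal cluster version.
func pebbleFormatMigration(ctx context.Context, deps SystemDeps, target uint64) error {
	for _, s := range deps.Stores {
		if err := s.RatchetFormatMajorVersion(target); err != nil {
			return err
		}
	}
	return nil
}
```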

@irfansharif
Contributor

irfansharif commented Feb 15, 2023

FWIW, the alternative you’ve listed above:

Keep things the way they are (i.e. two cluster versions), at the expense of having a potentially long-running Pebble background operation preventing other Cockroach migrations from occurring.

Is not so bad. This is in fact what the framework promotes: sequentially run long-running migrations. We've run migrations that take on the order of minutes before (#58088), and it's fine. The latency of version finalization is relatively unimportant given the timescales upgrades typically take (many minutes, even just to roll into new binaries). It doesn't block foreground traffic or anything. Is there a more pressing need to introduce this asynchrony (= kick off a new job that runs the real migration, a job that might fail and whose retries/lifecycle you'd have to manage, plus whose effects you'd have to check for in the later version)?

@nicktrav
Collaborator Author

We’ve run migrations that take in the order of minutes before (#58088), it’s fine.

We're talking about timescales that are potentially on the order of hours, depending on the size of the cluster and how much "cold" data there is in the lower levels of the LSM. The migration itself needs to rewrite all SSTables that have a sufficiently old version. Here's an example of a CC cluster where it took ~35 minutes for old tables to be re-written (internal metrics). And that was on a cluster with no other compaction activity (the re-write ran at the lowest priority).

@jbowens
Collaborator

jbowens commented Feb 27, 2023

@irfansharif what I'm hoping for most is not to avoid blocking the version finalization, but to block 23.1 finalization from beginning until this migration is complete. Currently, we'll unnecessarily commit to 23.1 before we need to. It'd be better for the user to preserve backwards compatibility for as long as possible, committing only when we're actually introducing a change incompatible with 22.2.

nicktrav added a commit to nicktrav/pebble that referenced this issue Feb 28, 2023
Currently, the `FormatPrePebblev1MarkedCompacted` (enum value `11`) is
used as a synchronization point for the DB, ensuring that all
pre-Pebblev1 tables have been rewritten. This format major version is
implicitly tied to a Cockroach internal cluster version that ships in
the 23.1 release. Not only is an implicit Pebble FMV upgrade confusing,
it is also unnecessarily blocking.

Rather than blocking the finalization of 23.1, the decision was made to
instead alter Cockroach to ensure that all tables had been rewritten
before ratcheting the Pebble FMV.

Rename the existing FMV for the blocking re-write. The version was
renamed to include `Unused`, along with a comment detailing the
reasoning.

Move the existing FMV, `FormatPrePebblev1MarkedCompacted`, to be the
latest in the sequence of FMVs, in a section dedicated to 23.2 FMVs.

Touches cockroachdb/cockroach#96819.
nicktrav added a commit to nicktrav/pebble that referenced this issue Mar 1, 2023
nicktrav added a commit to cockroachdb/pebble that referenced this issue Mar 1, 2023
@nicktrav
Collaborator Author

nicktrav commented Mar 1, 2023

Paraphrasing a conversation I had with @jbowens:

Another option to consider is having some kind of internal cluster version that runs first and acts as a fence, gating one or more preconditions that must be satisfied before finalization can begin.

This is something we can consider for 23.2. It's probably too late in the scheme of things for 23.1.

I'm curious @jbowens @irfansharif - would that be any different from how we'd do it now (i.e. a cluster version linked to a Pebble FMV update)? The only difference is that the fence version is a catch-all for other preconditions.

Or would the idea be to not tie this to a Pebble FMV, and instead use something like a Pebble metric to determine whether the precondition is satisfied? And then we'd have the usual cluster version / FMV ratchet in addition to this?

@jbowens
Collaborator

jbowens commented Mar 1, 2023

Or would the idea be to not tie this to a Pebble FMV, and instead use something like a Pebble metric to determine whether the precondition is satisfied? And then we'd have the usual cluster version / FMV ratchet in addition to this?

I think this is right.
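If we take the metric route, the precondition check could be as simple as polling each store's marked-file count. A sketch, assuming Pebble's Metrics().Compact.MarkedFiles gauge tracks files marked for rewrite (the surrounding store plumbing is a placeholder):

```go
// Sketch of a metric-based precondition: finalization proceeds only once
// no store still has files marked for rewrite. Uses Pebble's
// Metrics().Compact.MarkedFiles; the surrounding plumbing is a
// placeholder.
package upgrades

import "github.com/cockroachdb/pebble"

// pebbleRewritesComplete reports whether every store has finished its
// marked rewrite compactions.
func pebbleRewritesComplete(dbs []*pebble.DB) bool {
	for _, db := range dbs {
		if db.Metrics().Compact.MarkedFiles > 0 {
			return false
		}
	}
	return true
}
```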

@nicktrav
Collaborator Author

nicktrav commented Mar 2, 2023

I made #97936 as a proposal to investigate the blocking / predicate-based approach.

Projects
Status: 24.2 candidates