-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: consider relaxing need for blocking cluster versions when performing Pebble migrations #96819
Comments
I managed to find the docs here on long running migrations. The corresponding RFC is here. Given that we roll the Pebble FMV bump into an existing Cockroach cluster version that is used to enable value blocks (code), to avoid confusion, and to avoid re-arranging the existing Pebble FMVs, which could possibly break our (new) guarantees around alpha version compatibility, we can no-op and re-name the existing Pebble FMV. We can then re-introduce this as a new, higher numbered Pebble FMV, which has a direct correspondence with a new Cockroach internal cluster version that comes after the value blocks migration. This new internal cluster version will then be the one used in the migration job. The re-introduction of the Pebble FMV should be safe, as the underlying rewrite compactions are idempotent. If all the tables had previously been rewritten, running the Pebble migration job again should itself be a quick operation (i.e. we only pay the cost of scanning the table versions, and there will be no rewrite compactions scheduled as the rewrites had already occurred). The alternative here is to not ship this blocking migration in 23.1. To achieve this, we'd make the same no-op change suggested above. The downside is that this prevents us from bumping the min SSTable version to at least |
I was looking at outdated code. I think what we want is now in |
FWIW, the alternative you’ve listed above:
Is not so bad. This in fact is what the framework promotes — sequentially run long running migrations. We’ve run migrations that take in the order of minutes before (#58088), it’s fine. The latency of the version finalization is relatively unimportant given the timescales upgrades typically take (many minutes to even roll into new binaries). It doesn’t block foreground traffic or anything. Is there a more pressing need to introduce this asynchrony (= kick off a new job that runs the real migration, a job that might fail and you’d have to manage its own retries/lifecycle, plus check effects for in the later version)? |
We're talking about timescales that are potentially on the order of hours, depending on the size of the cluster, and how much "cold" data there is in lower levels of the LSM. The migration itself needs to rewrite all SSTables that have a sufficiently old version. Here's an example of a CC cluster where it took ~35 mins for old tables to be re-written (internal metrics). And this is for a cluster that has no other compaction activity (re-write run with the lowest priority). |
@irfansharif what I'm hoping for most is not to avoid blocking the version finalization, but to block 23.1 finalization from beginning until this migration is complete. Currently, we'll unnecessarily commit to 23.1 before we need to—It'd be better for the user to preserve backwards compatibility for as long as possible, only committing when we're actually introducing a change incompatible with 22.2. |
Currently, the `FormatPrePebblev1MarkedCompacted` (enum value `11`) is used as a synchronization point for the DB, ensuring that all pre-Pebblev1 tables have been rewritten. This format major version is implicitly tied to a Cockroach internal cluster version that ships in the 23.1 release. Not only is an implicit Pebble FMV upgrade confusing, it unnecessarily blocking. Rather than blocking the finalization of 23.1, the decision was made to instead alter Cockroach to ensure that all tables had been rewritten before ratcheting the Pebble FMV. Rename the existing FMV for the blocking re-write. The version was renamed to include `Unused`, along with a comment detailing the reasoning. Move the existing FMV, `FormatPrePebblev1MarkedCompacted`, to be the latest in the sequence of FMVs, in a section dedicated to 23.2 FMVs. Touches cockroachdb/cockroach#96819.
Currently, the `FormatPrePebblev1MarkedCompacted` (enum value `11`) is used as a synchronization point for the DB, ensuring that all pre-Pebblev1 tables have been rewritten. This format major version is implicitly tied to a Cockroach internal cluster version that ships in the 23.1 release. Not only is an implicit Pebble FMV upgrade confusing, it unnecessarily blocking. Rather than blocking the finalization of 23.1, the decision was made to instead alter Cockroach to ensure that all tables had been rewritten before ratcheting the Pebble FMV. Rename the existing FMV for the blocking re-write. The version was renamed to include `Unused`, along with a comment detailing the reasoning. Move the existing FMV, `FormatPrePebblev1MarkedCompacted`, to be the latest in the sequence of FMVs, in a section dedicated to 23.2 FMVs. Touches cockroachdb/cockroach#96819.
Currently, the `FormatPrePebblev1MarkedCompacted` (enum value `11`) is used as a synchronization point for the DB, ensuring that all pre-Pebblev1 tables have been rewritten. This format major version is implicitly tied to a Cockroach internal cluster version that ships in the 23.1 release. Not only is an implicit Pebble FMV upgrade confusing, it unnecessarily blocking. Rather than blocking the finalization of 23.1, the decision was made to instead alter Cockroach to ensure that all tables had been rewritten before ratcheting the Pebble FMV. Rename the existing FMV for the blocking re-write. The version was renamed to include `Unused`, along with a comment detailing the reasoning. Move the existing FMV, `FormatPrePebblev1MarkedCompacted`, to be the latest in the sequence of FMVs, in a section dedicated to 23.2 FMVs. Touches cockroachdb/cockroach#96819.
Paraphrasing a conversation I had with @jbowens:
This is something we can consider for 23.2. It's probably too late in the scheme of things for 23.1. I'm curious @jbowens @irfansharif - would that be any different how we'd do it now (i.e. cluster version linked to a Pebble FMV update?). The only difference is the fence version is catch-all for other preconditions. Or would the idea be to not tie this to a Pebble FMV and instead using something like a Pebble metric to determine whether the precondition is satisfied or not. And then we have the usual cluster version / FMV ratchet in addition to this? |
I think this is right. |
I made #97936 as a proposal to investigate the blocking / predicate based approach. |
Is your feature request related to a problem? Please describe.
In instances where we need to ensure that all nodes in a cluster have Pebble stores satisfying some precondition (e.g. all "old" SSTables have been re-written; tables with split user keys have been compacted), we typically use two cluster versions. The first is used as a "mark" phase in one Cockroach version
V
. A second is then introduced in a subsequent releaseV+n
(wheren
is usually1
) that blocks waiting for all the marked items to be processed.This blocking operation in the version
V+n
is likely overly conservative. Instead of holding up the progression through subsequent cluster versions, the precondition could be incorporated into a background job. The completion of said job would then become a precondition for the finalization of the Cockroach versionV+n
.Describe the solution you'd like
Investigate whether we can use the job framework and preconditions to accomplish the same goal, rather than employing a second cluster version.
Describe alternatives you've considered
Keep things the way they are (i.e. two cluster versions), at the expense of having a potentially long running Pebble background operation preventing other Cockroach migrations from occurring.
Additional context
More context in this internal thread.
Jira issue: CRDB-24351
The text was updated successfully, but these errors were encountered: