adapter: Move cluster operations into the catalog implications #35581
ggevay wants to merge 4 commits into MaterializeInc:main from
Conversation
Force-pushed 9da1400 to b4febd3
Force-pushed a6dae45 to 35c4ffc
teskje left a comment
Trying to understand what necessitates the controller API change. (It doesn't seem like too bad a change, but it introduces a new potential footgun, so it's not a win either.)
This is the crux of the issue:
A complication is that when the implications code processes AddCluster, the Cluster object captured by ParsedStateUpdate still has empty log_indexes, because IntrospectionSourceIndex catalog updates are applied in a later batch than the Cluster update.
By "batch" I assume you mean a batch of updates we handle together in catalog_implications. Why do these updates end up in different batches? In my mind, they are part of the same catalog transaction, so they are committed to the durable catalog with the same timestamp, so we should also read them back at the same time, and they should therefore be in the same batch?
It seems much better for our sanity if updates that were committed to the catalog atomically would also end up in the implications at the same time. We can paper over this in the case of clusters and introspection sources, and only feel a bit bad, but we might not be able to do this in all cases.
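A minimal sketch of the invariant argued for above: if updates from one catalog transaction share a commit timestamp, grouping them by that timestamp keeps atomically committed updates in one batch. The names (`CatalogUpdate`, `batch_by_ts`) are illustrative, not Materialize's actual types.

```rust
// Hypothetical sketch: group catalog updates by commit timestamp so that
// updates from the same catalog transaction land in the same batch.
#[derive(Debug, PartialEq)]
struct CatalogUpdate {
    ts: u64,
    desc: &'static str,
}

fn batch_by_ts(mut updates: Vec<CatalogUpdate>) -> Vec<Vec<CatalogUpdate>> {
    updates.sort_by_key(|u| u.ts);
    let mut batches: Vec<Vec<CatalogUpdate>> = Vec::new();
    for u in updates {
        match batches.last_mut() {
            // Same timestamp as the current batch: same transaction, same batch.
            Some(b) if b[0].ts == u.ts => b.push(u),
            _ => batches.push(vec![u]),
        }
    }
    batches
}

fn main() {
    // A Cluster update and its IntrospectionSourceIndex updates were committed
    // together, so they share a timestamp and come back in one batch.
    let updates = vec![
        CatalogUpdate { ts: 1, desc: "AddCluster" },
        CatalogUpdate { ts: 1, desc: "IntrospectionSourceIndex" },
        CatalogUpdate { ts: 2, desc: "AddClusterReplica" },
    ];
    let batches = batch_by_ts(updates);
    assert_eq!(batches.len(), 2);
    assert_eq!(batches[0].len(), 2);
    println!("{} batches", batches.len());
}
```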
/// Log sources are introspection collections that are produced by
/// replicas. They must be registered before replicas are added, so
/// that replicas know which log collections to produce.
This should probably return an error if replicas are already registered?
Cluster replica alterations (renames, owner changes, pending flag changes) are catalog-only operations that require no controller changes. Replace the placeholder debug log with a comment explaining why no action is needed, following the same pattern used for View and ContinualTask alterations.

Co-authored-by: Junie <junie@jetbrains.com>
Detect workload_class changes between the prev and new cluster config and call controller.update_cluster_workload_class when they differ. All other cluster alteration side effects (replica adds/drops/renames) arrive as separate AddClusterReplica/DroppedClusterReplica/AlterClusterReplica events.

Co-authored-by: Junie <junie@jetbrains.com>
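The diff-and-call pattern in this commit message can be sketched as below. `ClusterConfig`, `Controller`, and the recorded calls are simplified stand-ins for the real Materialize types; only the "call the controller only when workload_class actually changed" logic is the point.

```rust
// Simplified stand-ins for the real types (illustrative only).
#[derive(Clone, PartialEq, Debug)]
struct ClusterConfig {
    workload_class: Option<String>,
}

struct Controller {
    calls: Vec<Option<String>>,
}

impl Controller {
    fn update_cluster_workload_class(&mut self, wc: Option<String>) {
        self.calls.push(wc);
    }
}

fn apply_alter_cluster(prev: &ClusterConfig, new: &ClusterConfig, ctl: &mut Controller) {
    // Only touch the controller when the workload class actually changed;
    // replica adds/drops/renames arrive as separate events, handled elsewhere.
    if prev.workload_class != new.workload_class {
        ctl.update_cluster_workload_class(new.workload_class.clone());
    }
}

fn main() {
    let mut ctl = Controller { calls: vec![] };
    let prev = ClusterConfig { workload_class: None };
    let new = ClusterConfig { workload_class: Some("batch".into()) };
    apply_alter_cluster(&prev, &new, &mut ctl); // changed: one call
    apply_alter_cluster(&new, &new.clone(), &mut ctl); // unchanged: no-op
    assert_eq!(ctl.calls.len(), 1);
    println!("controller calls: {}", ctl.calls.len());
}
```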
AddCluster creates the cluster on the controller. AddClusterReplica calls create_cluster_replica, which creates the replica on the controller and installs introspection subscribes. Remove the now-redundant post-transaction calls from sequence_create_managed_cluster, sequence_create_unmanaged_cluster, sequence_create_cluster_replica, and sequence_alter_cluster_managed_to_managed. Remove the dead create_cluster helper method.

Extend ParsedStateUpdateKind::ClusterReplica with cluster_name and cluster_role fields, captured at parse time. This avoids looking up the cluster via self.catalog() when processing a replica event, which would return the final post-transaction name rather than the name at the point when the replica event was applied.

A complication is that when the implications code processes AddCluster, the Cluster object captured by ParsedStateUpdate still has empty log_indexes, because IntrospectionSourceIndex catalog updates are applied after the Cluster update within the same batch. To solve this, IntrospectionSourceIndex is added as a ParsedStateUpdateKind variant and its additions are collected into a side map in apply_catalog_implications. The AddCluster handler then uses this map for the ClusterConfig's arranged_logs instead of the empty cluster.log_indexes.

Co-authored-by: Junie <junie@jetbrains.com>
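The side-map idea from the commit message can be sketched as follows. The enum, field names, and helper are simplified stand-ins, not the real ParsedStateUpdateKind definition; the point is that a first pass collects the introspection source indexes, so the AddCluster handler doesn't depend on the still-empty cluster.log_indexes.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the real catalog types (illustrative only).
type ClusterId = u32;
type GlobalId = u64;

enum ParsedStateUpdateKind {
    IntrospectionSourceIndex {
        cluster_id: ClusterId,
        log_name: &'static str,
        id: GlobalId,
    },
    AddCluster {
        cluster_id: ClusterId,
    },
}

// First pass: collect introspection source index additions per cluster.
fn collect_side_map(
    updates: &[ParsedStateUpdateKind],
) -> BTreeMap<ClusterId, Vec<(&'static str, GlobalId)>> {
    let mut side_map: BTreeMap<ClusterId, Vec<(&'static str, GlobalId)>> = BTreeMap::new();
    for u in updates {
        if let ParsedStateUpdateKind::IntrospectionSourceIndex { cluster_id, log_name, id } = u {
            side_map.entry(*cluster_id).or_default().push((*log_name, *id));
        }
    }
    side_map
}

fn main() {
    // The index addition may come after the cluster addition within the batch;
    // the two-pass structure makes the ordering irrelevant.
    let updates = vec![
        ParsedStateUpdateKind::AddCluster { cluster_id: 7 },
        ParsedStateUpdateKind::IntrospectionSourceIndex {
            cluster_id: 7,
            log_name: "mz_active_peeks",
            id: 100,
        },
    ];

    let side_map = collect_side_map(&updates);

    // Second pass: AddCluster takes its arranged logs from the side map.
    for u in &updates {
        if let ParsedStateUpdateKind::AddCluster { cluster_id } = u {
            let arranged_logs = side_map.get(cluster_id).cloned().unwrap_or_default();
            assert_eq!(arranged_logs, vec![("mz_active_peeks", 100)]);
            println!("cluster {cluster_id}: {} arranged logs", arranged_logs.len());
        }
    }
}
```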
Force-pushed 35c4ffc to bcf11c8
Ah, I was thinking about the batches created by […]. So, now I've implemented a different approach: we re-assemble the cluster addition with the corresponding introspection source index additions in […].
teskje left a comment
LGTM.
One thing I didn't quite understand is why the early parsing of cluster_name and cluster_role is necessary. The commit message says:
This avoids looking up the cluster by self.catalog() when processing a replica event, which would return the final post-transaction name rather than the name at the point when the replica event was applied.
I'm not sure why this matters and why the post-transaction name isn't what we want. For example, if a cluster was renamed, it seems like we would want to register the replica with the controller under the new name, instead of the old one. Maybe an example would be helpful!
    cluster_name,
    cluster_role,
))) => {
    let replica_name = format!("{}.{}", cluster_name, replica.name);
This was surprising to me, though it matches the previous behavior, so I guess it's better to keep it like this. Just noting that this is a potential footgun: we now have two different meanings of "replica name" in the code, and code might expect one and get the other.
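The footgun noted above, made concrete in a tiny illustration: the catalog's bare replica name and the composite name built in the quoted snippet are different strings, and code expecting one can silently receive the other. The variable names here are illustrative.

```rust
fn main() {
    let cluster_name = "prod";
    // The bare name as stored in the catalog.
    let catalog_replica_name = "r1";
    // The composite name built for the controller in the snippet above.
    let controller_replica_name = format!("{}.{}", cluster_name, catalog_replica_name);

    assert_eq!(controller_replica_name, "prod.r1");
    // The two "replica names" do not compare equal.
    assert_ne!(controller_replica_name, catalog_replica_name);
    println!("{catalog_replica_name} vs {controller_replica_name}");
}
```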
Hmm, yes, putting the […]. But accessing […]. I'm now thinking about whether it would be possible to make the implications code track in a local variable the […]. Or do you think that we should just keep it simple for now and access […]?
When a cluster rename and a replica add happen in the same catalog transaction, the cluster_name/cluster_role captured at parse time in ParsedStateUpdate may be stale. Fix this by populating a local cluster_info map during the cluster commands loop and using it to override the parsed values in the replica commands loop. This avoids both the denormalization issue (cluster info only in ParsedStateUpdate) and the self.catalog() coupling (which could return a too-new state if implications processing is ever deferred).

Co-authored-by: Junie <junie@jetbrains.com>
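A sketch of the override described in this commit message, under the assumption of simplified types (the map, helper, and names are illustrative): the cluster commands loop records the current name, and the replica commands loop prefers that over the possibly stale parse-time value.

```rust
use std::collections::BTreeMap;

type ClusterId = u32;

// The locally tracked name (from the cluster commands loop) wins over the
// value captured at parse time, which may be stale after a rename.
fn effective_cluster_name(
    cluster_info: &BTreeMap<ClusterId, String>,
    cluster_id: ClusterId,
    parsed_name: String,
) -> String {
    cluster_info.get(&cluster_id).cloned().unwrap_or(parsed_name)
}

fn main() {
    // Same catalog transaction: cluster 7 is renamed "old" -> "new",
    // and then a replica is added to it.
    let mut cluster_info = BTreeMap::new();

    // The cluster commands loop records the post-rename name...
    cluster_info.insert(7, "new".to_string());

    // ...so the replica commands loop overrides the stale parsed value.
    let name = effective_cluster_name(&cluster_info, 7, "old".to_string());
    assert_eq!(name, "new");
    println!("register replica under cluster {name}");
}
```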
Ok, I've implemented (I mean, my AI agent implemented, with my close guidance :D) the local variable tracking approach, where the locally tracked values can override what we get from […]. What do you think? (Also note that we still have several […].)
I was naively thinking this is an assumption we can make. It might not be true, though; I don't think I have enough of an understanding of how things will flow in a multi-envd world to say! However, I would say that since (a) it might be true and (b) we currently are looking at […]
This is on top of #35571
I suggest reviewing commit-by-commit: it starts with the commits from #35571, then come two easy commits, and then the final commit handles AddCluster and AddClusterReplica, which came out a bit more complicated. See the commit messages for details.

Btw., a somewhat brittle thing here is the order of applying operations in apply_catalog_implications_inner. This is one of the reasons why some operations are not inside the initial for loops that process CatalogImplications, but are just collected in these for loops and then applied in post-processing in the second half of the function. For example, cluster drops are in post-processing, so that we can clean up pending peeks as a separate pass. I'm thinking that a few PRs later, when we have a complete picture of all the things that need to happen in apply_catalog_implications_inner, I'll try to do a cleanup PR that simplifies/untangles this function if possible, e.g., by not having 4 separate for loops processing various CatalogImplications, and doing less stuff as separate post-processing steps.

Nightly:
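The collect-then-post-process pattern described above can be sketched as follows; the operation names and the pending-peeks representation are simplified stand-ins for the real data structures.

```rust
fn main() {
    // (operation, cluster id) pairs standing in for CatalogImplications.
    let implications = vec![
        ("AddCluster", 1u32),
        ("DropCluster", 2),
        ("AddClusterReplica", 1),
    ];

    // Main loop: cluster drops are only collected, not applied.
    let mut dropped_clusters = Vec::new();
    for (op, cluster) in &implications {
        if *op == "DropCluster" {
            dropped_clusters.push(*cluster);
        } else {
            println!("apply immediately: {op} on cluster {cluster}");
        }
    }

    // Post-processing pass: clean up pending peeks of dropped clusters.
    let mut pending_peeks = vec![(2u32, "peek-a"), (1, "peek-b")];
    pending_peeks.retain(|(cluster, _)| !dropped_clusters.contains(cluster));
    assert_eq!(pending_peeks, vec![(1, "peek-b")]);
    println!("remaining peeks: {}", pending_peeks.len());
}
```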