
adapter: Move cluster operations into the catalog implications #35581

Open
ggevay wants to merge 4 commits into MaterializeInc:main from ggevay:implications-cluster

Conversation

@ggevay
Contributor

@ggevay ggevay commented Mar 21, 2026

This is on top of #35571

I suggest reviewing commit-by-commit: it starts with the commits from #35571, then come two easy commits, and then the final commit handles AddCluster and AddClusterReplica, which came out a bit more complicated. See the commit messages for details.

By the way, a somewhat brittle thing here is the order of applying operations in apply_catalog_implications_inner. This is one of the reasons why some operations are not handled inside the initial for loops that process CatalogImplications, but are merely collected in those loops and then applied in post-processing in the second half of the function. For example, cluster drops happen in post-processing, so that we can clean up pending peeks as a separate pass. I'm thinking that a few PRs later, when we have a complete picture of everything that needs to happen in apply_catalog_implications_inner, I'll try a cleanup PR that simplifies/untangles this function if possible, e.g., by not having 4 separate for loops processing various CatalogImplications, and by doing less as separate post-processing steps.
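The collect-then-post-process pattern described above could be sketched roughly as follows. This is a heavily simplified, hypothetical illustration; the enum variants, the `u64` ids, and the string "log" of side effects are stand-ins for the real Materialize types, not the actual implementation:

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for the real CatalogImplication variants.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum CatalogImplication {
    AddCluster(u64),
    DropCluster(u64),
    AddTable(u64),
}

/// First pass: apply order-insensitive operations directly, and only
/// *collect* the operations (here: cluster drops) whose side effects
/// must run as a separate pass afterwards.
fn apply_catalog_implications_inner(implications: Vec<CatalogImplication>) -> Vec<String> {
    let mut log = Vec::new();
    let mut dropped_clusters = BTreeSet::new();

    for implication in &implications {
        match implication {
            CatalogImplication::AddCluster(id) => log.push(format!("create cluster {id}")),
            CatalogImplication::AddTable(id) => log.push(format!("create table {id}")),
            // Do not drop the cluster yet; just remember it.
            CatalogImplication::DropCluster(id) => {
                dropped_clusters.insert(*id);
            }
        }
    }

    // Post-processing: once all other updates are applied, clean up
    // pending peeks and finally drop the clusters in one pass.
    for id in &dropped_clusters {
        log.push(format!("cancel pending peeks on cluster {id}"));
        log.push(format!("drop cluster {id}"));
    }
    log
}
```

The point of the sketch is only the ordering: drops are deferred to the second half of the function so that peek cleanup can run as its own pass over all dropped clusters.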

Nightly:

@ggevay ggevay added the A-ADAPTER Topics related to the ADAPTER layer label Mar 21, 2026
@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@ggevay ggevay force-pushed the implications-cluster branch 6 times, most recently from 9da1400 to b4febd3 on March 23, 2026 at 13:31
@ggevay ggevay marked this pull request as ready for review March 24, 2026 08:30
@ggevay ggevay requested review from a team as code owners March 24, 2026 08:30
@ggevay ggevay requested a review from aljoscha March 24, 2026 08:30
@ggevay ggevay force-pushed the implications-cluster branch 2 times, most recently from a6dae45 to 35c4ffc on March 29, 2026 at 17:38
Contributor

@teskje teskje left a comment


Trying to understand what necessitates the controller API change. (It doesn't seem like too bad a change, but it introduces a new potential footgun, so it's not a win either.)

This is the crux of the issue:

A complication is that when the implications code processes AddCluster, the Cluster object captured by ParsedStateUpdate still has empty log_indexes, because IntrospectionSourceIndex catalog updates are applied in a later batch than the Cluster update.

By "batch" I assume you mean a batch of updates we handle together in catalog_implications. Why do these updates end up in different batches? In my mind, they are part of the same catalog transaction, so they are committed to the durable catalog with the same timestamp, so we should also read them back at the same time, so they should be in the same batch?

It seems much better for our sanity if updates that were committed to the catalog atomically would also end up in the implications at the same time. We can paper over this in the case of clusters and introspection sources, and only feel a bit bad, but we might not be able to do this in all cases.

Comment on lines +980 to +982
/// Log sources are introspection collections that are produced by
/// replicas. They must be registered before replicas are added, so
/// that replicas know which log collections to produce.
Contributor


This should probably return an error if replicas are already registered?

ggevay and others added 3 commits March 31, 2026 11:23
Cluster replica alterations (renames, owner changes, pending flag
changes) are catalog-only operations that require no controller
changes. Replace the placeholder debug log with a comment
explaining why no action is needed, following the same pattern
used for View and ContinualTask alterations.

Co-authored-by: Junie <junie@jetbrains.com>
Detect workload_class changes between prev and new cluster config
and call controller.update_cluster_workload_class when they differ.
All other cluster alteration side effects (replica
adds/drops/renames) arrive as separate
AddClusterReplica/DroppedClusterReplica/AlterClusterReplica events.

Co-authored-by: Junie <junie@jetbrains.com>
AddCluster creates the cluster on the controller.
AddClusterReplica calls create_cluster_replica which creates
the replica on the controller and installs introspection
subscribes.

Remove now-redundant post-transaction calls from
sequence_create_managed_cluster,
sequence_create_unmanaged_cluster,
sequence_create_cluster_replica, and
sequence_alter_cluster_managed_to_managed. Remove the dead
create_cluster helper method.

Extend ParsedStateUpdateKind::ClusterReplica with
cluster_name and cluster_role fields, captured at parse time.
This avoids looking up the cluster by self.catalog() when
processing a replica event, which would return the final
post-transaction name rather than the name at the point
when the replica event was applied.

A complication is that when the implications code processes
AddCluster, the Cluster object captured by ParsedStateUpdate
still has empty log_indexes, because IntrospectionSourceIndex
catalog updates are applied after the Cluster update within
the same batch. To solve this, IntrospectionSourceIndex is
added as a ParsedStateUpdateKind variant and its additions
are collected into a side map in apply_catalog_implications.
The AddCluster handler then uses this map for the
ClusterConfig's arranged_logs instead of the empty
cluster.log_indexes.

Co-authored-by: Junie <junie@jetbrains.com>
@ggevay ggevay force-pushed the implications-cluster branch from 35c4ffc to bcf11c8 on March 31, 2026 at 11:57
@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Ah, I was thinking about the batches created by ApplyState. But looking at the code more, these batches shouldn't actually matter from the point of view of the catalog implications code, because we call the implications code only after all the batches are applied. So, we should already have the property you mentioned that "updates that were committed to the catalog atomically would also end up in the implications at the same time".

So, now I've implemented a different approach: We re-assemble the cluster addition with the corresponding introspection source index additions in apply_catalog_implications. That is, we collect the IntrospectionSourceIndex updates in apply_catalog_implications, pass this collection into apply_catalog_implications_inner, then ninja-edit the appropriate IntrospectionSourceIndexes into the ClusterConfig when processing a cluster creation. Let me know what you think.
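The side-map approach described above could look roughly like this. All type and field names here (ParsedStateUpdate, ClusterConfig, arranged_logs, the numeric ids) are illustrative stand-ins for the real Materialize types, not the actual code:

```rust
use std::collections::BTreeMap;

type ClusterId = u64;
type GlobalId = u64;

// Hypothetical, flattened versions of the parsed catalog updates.
enum ParsedStateUpdate {
    AddCluster { id: ClusterId, name: String },
    AddIntrospectionSourceIndex { cluster_id: ClusterId, log_name: String, index_id: GlobalId },
}

struct ClusterConfig {
    name: String,
    /// log name -> index id; empty on the parsed Cluster object, so it is
    /// filled in from the side map instead.
    arranged_logs: BTreeMap<String, GlobalId>,
}

fn apply(updates: Vec<ParsedStateUpdate>) -> Vec<ClusterConfig> {
    // Pass 1: collect introspection source index additions per cluster
    // into a side map, regardless of where they appear in the updates.
    let mut introspection: BTreeMap<ClusterId, BTreeMap<String, GlobalId>> = BTreeMap::new();
    for update in &updates {
        if let ParsedStateUpdate::AddIntrospectionSourceIndex { cluster_id, log_name, index_id } = update {
            introspection
                .entry(*cluster_id)
                .or_default()
                .insert(log_name.clone(), *index_id);
        }
    }

    // Pass 2: when processing a cluster creation, take arranged_logs from
    // the side map instead of the (empty) parsed cluster state.
    let mut configs = Vec::new();
    for update in &updates {
        if let ParsedStateUpdate::AddCluster { id, name } = update {
            configs.push(ClusterConfig {
                name: name.clone(),
                arranged_logs: introspection.remove(id).unwrap_or_default(),
            });
        }
    }
    configs
}
```

This keeps the property that updates committed atomically are re-assembled before any controller call is made, even though the parsed Cluster object itself carries empty log_indexes.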

Contributor

@teskje teskje left a comment


LGTM.

One thing I didn't quite understand is why the early parsing of cluster_name and cluster_role is necessary. The commit message says:

This avoids looking up the cluster by self.catalog() when processing a replica event, which would return the final post-transaction name rather than the name at the point when the replica event was applied.

I'm not sure why this matters and why the post-transaction name isn't what we want. For example, if a cluster was renamed, it seems like we would want to register the replica with the controller under the new name, instead of the old one. Maybe an example would be helpful!

cluster_name,
cluster_role,
))) => {
let replica_name = format!("{}.{}", cluster_name, replica.name);
Contributor


This was surprising to me, though it matches the previous behavior, so I guess it's better to keep it like this. Just noting that this is a potential footgun, as we now have different meanings of "replica name" in the code, and code might expect one and get the other.

@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Hmm, yes, putting the cluster_name and cluster_role into the ParsedStateUpdate is not so good: it kind of denormalizes information, and then if there is also a cluster rename in addition to a cluster creation in the same catalog transaction, then we wouldn't update the cluster_name that we added to ParsedStateUpdate.

But accessing self.catalog() doesn't seem good to me either: It would be assuming that we run the catalog implications code immediately after each catalog transaction. If this ceases to be the case at some point in the future, then we might pull a later state from self.catalog() than the one that we had at the point where the implications currently being processed happened.

I'm now thinking about whether it would be possible to make the implications code track in a local variable the cluster_name and cluster_role as it is processing cluster operations, and then supply these from the local variable when a replica operation needs them. But we'd still need cluster_name and cluster_role also in ParsedStateUpdate, because the whole catalog transaction might not have a cluster operation for the corresponding cluster, so the local tracking code in the implications logic might not find anything.

Or do you think that we should just keep it simple for now and access self.catalog()?
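The local-tracking idea described in this comment could be sketched roughly as below. Everything here is hypothetical and simplified (the Op enum, ids, and names are stand-ins, not the actual Materialize code): cluster operations update a local map as they are processed, and replica operations prefer the tracked name over the parse-time one, falling back to the parsed value when the transaction contains no cluster operation for that cluster:

```rust
use std::collections::BTreeMap;

type ClusterId = u64;

// Hypothetical, flattened operations within one catalog transaction.
enum Op {
    CreateCluster { id: ClusterId, name: String },
    RenameCluster { id: ClusterId, new_name: String },
    AddReplica { cluster_id: ClusterId, parsed_cluster_name: String, replica_name: String },
}

fn controller_replica_names(ops: Vec<Op>) -> Vec<String> {
    // Tracks the latest known cluster name as operations are processed.
    let mut cluster_info: BTreeMap<ClusterId, String> = BTreeMap::new();
    let mut names = Vec::new();
    for op in ops {
        match op {
            Op::CreateCluster { id, name } | Op::RenameCluster { id, new_name: name } => {
                cluster_info.insert(id, name);
            }
            Op::AddReplica { cluster_id, parsed_cluster_name, replica_name } => {
                // Fall back to the parse-time name only if no cluster
                // operation for this cluster appeared in the transaction.
                let cluster_name = cluster_info
                    .get(&cluster_id)
                    .cloned()
                    .unwrap_or(parsed_cluster_name);
                names.push(format!("{cluster_name}.{replica_name}"));
            }
        }
    }
    names
}
```

With this shape, a rename and a replica add in the same transaction would register the replica under the post-rename name, while a replica add for an untouched cluster still uses the name captured at parse time.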

When a cluster rename and replica add happen in the same
catalog transaction, the cluster_name/cluster_role captured
at parse time in ParsedStateUpdate may be stale.

Fix this by populating a local cluster_info map during the
cluster commands loop and using it to override the parsed
values in the replica commands loop.

This avoids both the denormalization issue (cluster info
only in ParsedStateUpdate) and the self.catalog() coupling
(which could return a too-new state if implications
processing is ever deferred).

Co-authored-by: Junie <junie@jetbrains.com>
@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Ok, I've implemented (I mean, my AI agent implemented, with my close guidance :D) the local variable tracking approach, where the locally tracked values can override what we get from ParsedStateUpdate. It wasn't too bad, see the last, separate commit. The only not-so-nice thing is that we can't really test it, because I think we don't currently have catalog transactions that would both add a cluster and then rename it in the same transaction.

What do you think?

(Also note that we still have several self.catalog() calls in the implications code, dating back to before I took over the implications code. We can think about it later whether those are ok. I guess some might be ok, specifically the ones where the info that we are reading from self.catalog() is not a thing that can be altered after an object is created.)

@teskje
Contributor

teskje commented Mar 31, 2026

It would be assuming that we run the catalog implications code immediately after each catalog transaction.

I was naively thinking this is an assumption we can make. It might not be true though, I don't think I have enough of an understanding of how things will flow in a multi-envd world to say!

However, I would say that since (a) it might be true and (b) we currently look at self.catalog() in several places already, it's probably not worth complicating the code just to handle this one new case correctly. So I'd prefer to keep things simple for now. We'll either find that we can make the assumption and keep things simple forever, or that we cannot make the assumption, at which point we should do a holistic refactor to fix all the places that currently rely on it.

