
adapter: Move cluster operations into the catalog implications #35581

Open
ggevay wants to merge 4 commits into MaterializeInc:main from ggevay:implications-cluster

Conversation

@ggevay
Contributor

@ggevay ggevay commented Mar 21, 2026

This is on top of #35571

I suggest reviewing commit-by-commit: it starts with the commits from #35571, then come two easy commits, and then the final commit handles AddCluster and AddClusterReplica, which came out a bit more complicated. See the commit messages for details.

By the way, a somewhat brittle thing here is the order of applying operations in apply_catalog_implications_inner. This is one of the reasons why some operations are not handled inside the initial for loops that process CatalogImplications, but are merely collected in those loops and then applied in post-processing in the second half of the function. For example, cluster drops happen in post-processing, so that we can clean up pending peeks as a separate pass. I'm thinking that a few PRs later, when we have a complete picture of everything that needs to happen in apply_catalog_implications_inner, I'll try a cleanup PR that simplifies/untangles this function if possible, e.g., by not having 4 separate for loops processing various CatalogImplications, and by doing less as separate post-processing steps.
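The collect-then-post-process pattern described above could be sketched roughly as follows. This is a heavily simplified, hypothetical illustration; the enum variants, the `u64` ids, and the string "log" of side effects are stand-ins for the real Materialize types, not the actual implementation:

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for the real CatalogImplication variants.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum CatalogImplication {
    AddCluster(u64),
    DropCluster(u64),
    AddTable(u64),
}

/// First pass: apply order-insensitive operations directly, and only
/// *collect* the operations (here: cluster drops) whose side effects
/// must run as a separate pass afterwards.
fn apply_catalog_implications_inner(implications: Vec<CatalogImplication>) -> Vec<String> {
    let mut log = Vec::new();
    let mut dropped_clusters = BTreeSet::new();

    for implication in &implications {
        match implication {
            CatalogImplication::AddCluster(id) => log.push(format!("create cluster {id}")),
            CatalogImplication::AddTable(id) => log.push(format!("create table {id}")),
            // Do not drop the cluster yet; just remember it.
            CatalogImplication::DropCluster(id) => {
                dropped_clusters.insert(*id);
            }
        }
    }

    // Post-processing: once all other updates are applied, clean up
    // pending peeks and finally drop the clusters in one pass.
    for id in &dropped_clusters {
        log.push(format!("cancel pending peeks on cluster {id}"));
        log.push(format!("drop cluster {id}"));
    }
    log
}
```

The point of the sketch is only the ordering: drops are deferred to the second half of the function so that peek cleanup can run as its own pass over all dropped clusters.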

Nightly:

@ggevay ggevay added the A-ADAPTER Topics related to the ADAPTER layer label Mar 21, 2026
@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@ggevay ggevay force-pushed the implications-cluster branch 6 times, most recently from 9da1400 to b4febd3 on March 23, 2026 at 13:31
@ggevay ggevay marked this pull request as ready for review March 24, 2026 08:30
@ggevay ggevay requested review from a team as code owners March 24, 2026 08:30
@ggevay ggevay requested a review from aljoscha March 24, 2026 08:30
@ggevay ggevay force-pushed the implications-cluster branch 2 times, most recently from a6dae45 to 35c4ffc on March 29, 2026 at 17:38
Contributor

@teskje teskje left a comment


Trying to understand what necessitates the controller API change. (It doesn't seem like too bad a change, but it introduces a new potential footgun, so it's not a win either.)

This is the crux of the issue:

A complication is that when the implications code processes AddCluster, the Cluster object captured by ParsedStateUpdate still has empty log_indexes, because IntrospectionSourceIndex catalog updates are applied in a later batch than the Cluster update.

By "batch" I assume you mean a batch of updates we handle together in catalog_implications. Why do these updates end up in different batches? In my mind, they are part of the same catalog transaction, so they are committed to the durable catalog with the same timestamp, so we should also read them back at the same time, so they should be in the same batch?

It seems much better for our sanity if updates that were committed to the catalog atomically would also end up in the implications at the same time. We can paper over this in the case of clusters and introspection sources, and only feel a bit bad, but we might not be able to do this in all cases.

Comment on lines +980 to +982
/// Log sources are introspection collections that are produced by
/// replicas. They must be registered before replicas are added, so
/// that replicas know which log collections to produce.
Contributor


This should probably return an error if replicas are already registered?

ggevay and others added 3 commits March 31, 2026 11:23
Cluster replica alterations (renames, owner changes, pending flag
changes) are catalog-only operations that require no controller
changes. Replace the placeholder debug log with a comment
explaining why no action is needed, following the same pattern
used for View and ContinualTask alterations.

Co-authored-by: Junie <junie@jetbrains.com>
Detect workload_class changes between prev and new cluster config
and call controller.update_cluster_workload_class when they differ.
All other cluster alteration side effects (replica
adds/drops/renames) arrive as separate
AddClusterReplica/DroppedClusterReplica/AlterClusterReplica events.

Co-authored-by: Junie <junie@jetbrains.com>
AddCluster creates the cluster on the controller.
AddClusterReplica calls create_cluster_replica which creates
the replica on the controller and installs introspection
subscribes.

Remove now-redundant post-transaction calls from
sequence_create_managed_cluster,
sequence_create_unmanaged_cluster,
sequence_create_cluster_replica, and
sequence_alter_cluster_managed_to_managed. Remove the dead
create_cluster helper method.

Extend ParsedStateUpdateKind::ClusterReplica with
cluster_name and cluster_role fields, captured at parse time.
This avoids looking up the cluster by self.catalog() when
processing a replica event, which would return the final
post-transaction name rather than the name at the point
when the replica event was applied.

A complication is that when the implications code processes
AddCluster, the Cluster object captured by ParsedStateUpdate
still has empty log_indexes, because IntrospectionSourceIndex
catalog updates are applied after the Cluster update within
the same batch. To solve this, IntrospectionSourceIndex is
added as a ParsedStateUpdateKind variant and its additions
are collected into a side map in apply_catalog_implications.
The AddCluster handler then uses this map for the
ClusterConfig's arranged_logs instead of the empty
cluster.log_indexes.

Co-authored-by: Junie <junie@jetbrains.com>
@ggevay ggevay force-pushed the implications-cluster branch from 35c4ffc to bcf11c8 on March 31, 2026 at 11:57
@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Ah, I was thinking about the batches created by ApplyState. But looking at the code more, these batches shouldn't actually matter from the point of view of the catalog implications code, because we call the implications code only after all the batches are applied. So, we should already have the property you mentioned that "updates that were committed to the catalog atomically would also end up in the implications at the same time".

So, now I've implemented a different approach: We re-assemble the cluster addition with the corresponding introspection source index additions in apply_catalog_implications. That is, we collect the IntrospectionSourceIndex updates in apply_catalog_implications, pass this collection into apply_catalog_implications_inner, then ninja-edit the appropriate IntrospectionSourceIndexes into the ClusterConfig when processing a cluster creation. Let me know what you think.
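The side-map approach described above could look roughly like this. All type and field names here (ParsedStateUpdate, ClusterConfig, arranged_logs, the numeric ids) are illustrative stand-ins for the real Materialize types, not the actual code:

```rust
use std::collections::BTreeMap;

type ClusterId = u64;
type GlobalId = u64;

// Hypothetical, flattened versions of the parsed catalog updates.
enum ParsedStateUpdate {
    AddCluster { id: ClusterId, name: String },
    AddIntrospectionSourceIndex { cluster_id: ClusterId, log_name: String, index_id: GlobalId },
}

struct ClusterConfig {
    name: String,
    /// log name -> index id; empty on the parsed Cluster object, so it is
    /// filled in from the side map instead.
    arranged_logs: BTreeMap<String, GlobalId>,
}

fn apply(updates: Vec<ParsedStateUpdate>) -> Vec<ClusterConfig> {
    // Pass 1: collect introspection source index additions per cluster
    // into a side map, regardless of where they appear in the updates.
    let mut introspection: BTreeMap<ClusterId, BTreeMap<String, GlobalId>> = BTreeMap::new();
    for update in &updates {
        if let ParsedStateUpdate::AddIntrospectionSourceIndex { cluster_id, log_name, index_id } = update {
            introspection
                .entry(*cluster_id)
                .or_default()
                .insert(log_name.clone(), *index_id);
        }
    }

    // Pass 2: when processing a cluster creation, take arranged_logs from
    // the side map instead of the (empty) parsed cluster state.
    let mut configs = Vec::new();
    for update in &updates {
        if let ParsedStateUpdate::AddCluster { id, name } = update {
            configs.push(ClusterConfig {
                name: name.clone(),
                arranged_logs: introspection.remove(id).unwrap_or_default(),
            });
        }
    }
    configs
}
```

This keeps the property that updates committed atomically are re-assembled before any controller call is made, even though the parsed Cluster object itself carries empty log_indexes.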

Contributor

@teskje teskje left a comment


LGTM.

One thing I didn't quite understand is why the early parsing of cluster_name and cluster_role is necessary. The commit message says:

This avoids looking up the cluster by self.catalog() when processing a replica event, which would return the final post-transaction name rather than the name at the point when the replica event was applied.

I'm not sure why this matters and why the post-transaction name isn't what we want. For example, if a cluster was renamed, it seems like we would want to register the replica with the controller under the new name, instead of the old one. Maybe an example would be helpful!

cluster_name,
cluster_role,
))) => {
let replica_name = format!("{}.{}", cluster_name, replica.name);
Contributor


This was surprising to me, though it matches the previous behavior, so I guess it's better to keep it like this. Just noting that this is a potential footgun, as we now have different meanings of "replica name" in the code, and code might expect one and get the other.

@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Hmm, yes, putting the cluster_name and cluster_role into the ParsedStateUpdate is not so good: it kind of denormalizes information, and then if there is also a cluster rename in addition to a cluster creation in the same catalog transaction, then we wouldn't update the cluster_name that we added to ParsedStateUpdate.

But accessing self.catalog() doesn't seem good to me either: It would be assuming that we run the catalog implications code immediately after each catalog transaction. If this ceases to be the case at some point in the future, then we might pull a later state from self.catalog() than the one that we had at the point where the implications currently being processed happened.

I'm now thinking about whether it would be possible to make the implications code track in a local variable the cluster_name and cluster_role as it is processing cluster operations, and then supply these from the local variable when a replica operation needs them. But we'd still need cluster_name and cluster_role also in ParsedStateUpdate, because the whole catalog transaction might not have a cluster operation for the corresponding cluster, so the local tracking code in the implications logic might not find anything.

Or do you think that we should just keep it simple for now and access self.catalog()?
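The local-tracking idea described in this comment could be sketched roughly as below. Everything here is hypothetical and simplified (the Op enum, ids, and names are stand-ins, not the actual Materialize code): cluster operations update a local map as they are processed, and replica operations prefer the tracked name over the parse-time one, falling back to the parsed value when the transaction contains no cluster operation for that cluster:

```rust
use std::collections::BTreeMap;

type ClusterId = u64;

// Hypothetical, flattened operations within one catalog transaction.
enum Op {
    CreateCluster { id: ClusterId, name: String },
    RenameCluster { id: ClusterId, new_name: String },
    AddReplica { cluster_id: ClusterId, parsed_cluster_name: String, replica_name: String },
}

fn controller_replica_names(ops: Vec<Op>) -> Vec<String> {
    // Tracks the latest known cluster name as operations are processed.
    let mut cluster_info: BTreeMap<ClusterId, String> = BTreeMap::new();
    let mut names = Vec::new();
    for op in ops {
        match op {
            Op::CreateCluster { id, name } | Op::RenameCluster { id, new_name: name } => {
                cluster_info.insert(id, name);
            }
            Op::AddReplica { cluster_id, parsed_cluster_name, replica_name } => {
                // Fall back to the parse-time name only if no cluster
                // operation for this cluster appeared in the transaction.
                let cluster_name = cluster_info
                    .get(&cluster_id)
                    .cloned()
                    .unwrap_or(parsed_cluster_name);
                names.push(format!("{cluster_name}.{replica_name}"));
            }
        }
    }
    names
}
```

With this shape, a rename and a replica add in the same transaction would register the replica under the post-rename name, while a replica add for an untouched cluster still uses the name captured at parse time.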

When a cluster rename and replica add happen in the same
catalog transaction, the cluster_name/cluster_role captured
at parse time in ParsedStateUpdate may be stale.

Fix this by populating a local cluster_info map during the
cluster commands loop and using it to override the parsed
values in the replica commands loop.

This avoids both the denormalization issue (cluster info
only in ParsedStateUpdate) and the self.catalog() coupling
(which could return a too-new state if implications
processing is ever deferred).

Co-authored-by: Junie <junie@jetbrains.com>
@ggevay
Contributor Author

ggevay commented Mar 31, 2026

Ok, I've implemented (I mean, my AI agent implemented, with my close guidance :D) the local variable tracking approach, where the locally tracked values can override what we get from ParsedStateUpdate. It wasn't too bad, see the last, separate commit. The only not-so-nice thing is that we can't really test it, because I think we don't currently have catalog transactions that would both add a cluster and then rename it in the same transaction.

What do you think?

(Also note that we still have several self.catalog() calls in the implications code, dating back to before I took over the implications code. We can think about it later whether those are ok. I guess some might be ok, specifically the ones where the info that we are reading from self.catalog() is not a thing that can be altered after an object is created.)

@teskje
Contributor

teskje commented Mar 31, 2026

It would be assuming that we run the catalog implications code immediately after each catalog transaction.

I was naively thinking this is an assumption we can make. It might not be true though, I don't think I have enough of an understanding of how things will flow in a multi-envd world to say!

However, I would say that since (a) it might be true and (b) we currently look at self.catalog() in several places already, it's probably not worth complicating the code just to handle this one new case correctly. So I'd prefer to keep things simple for now. We'll either find that we can make the assumption and keep things simple forever, or that we cannot make the assumption, at which point we should do a holistic refactor to fix all the places that currently rely on it.

