
Design for PrivateLink Connection Status History table #22936

Merged: 2 commits merged into MaterializeInc:main from privatelink-status-design-doc on Nov 9, 2023

Conversation

@rjobanp (Contributor) commented Nov 3, 2023

Motivation

Tips for reviewer

Rendered Version

Some code links for helpful context:

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@rjobanp requested review from benesch, petrosagg, jubrad, a team, and jkosh44 on November 3, 2023 20:26
@benesch (Member) left a comment:


Very nice!

@pH14 (Contributor) commented Nov 6, 2023

Curious if there's a way to link this up with https://github.com/MaterializeInc/database-issues/issues/6495 / https://github.com/MaterializeInc/database-issues/issues/6422, where the design here is the implementation detail of how PrivateLink connection history is fed into a more generic system collection. And maybe mz_aws_privatelink_connection_status_history is then a view on top of that data that prettifies PrivateLink connection history specifically. Food for thought, haven't personally thought about this problem long enough to have a strong opinion!

We'll also want to verify that our VPC Endpoint controller periodically resyncs to update the CR status. From a super quick read through, I'm not sure if an accepted / running connection ever gets reconciled again, so we might be missing events.

@rjobanp (Contributor, Author) commented Nov 6, 2023

We'll also want to verify that our VPC Endpoint controller periodically resyncs to update the CR status. From a super quick read through, I'm not sure if an accepted / running connection ever gets reconciled again, so we might be missing events.

@pH14 I also thought this, but from what I can tell the reconciliation loop should run about once per hour, since that's the default behavior of our k8s-controller, even when the apply method returns Ok(None) and doesn't explicitly request a re-queue:
https://github.com/MaterializeInc/k8s-controller/blob/689ddd0d7a63b892b531125598fdd3387cd212c4/src/controller.rs#L238-L248
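For illustration, here is a minimal sketch of that fallback pattern, assuming a kube-rs style Action; the constant and function names are hypothetical and not the actual k8s-controller code:

```rust
use std::time::Duration;

use kube::runtime::controller::Action;

// Illustrative values only, not the actual MaterializeInc/k8s-controller code.
const DEFAULT_REQUEUE_INTERVAL: Duration = Duration::from_secs(60 * 60);

/// Turn the result of a resource's `apply` into a requeue decision: honor an
/// explicit request if one was made, otherwise fall back to the default
/// hourly resync so the VpcEndpoint status is periodically re-checked.
fn requeue_action(requested: Option<Action>) -> Action {
    requested.unwrap_or_else(|| Action::requeue(DEFAULT_REQUEUE_INTERVAL))
}
```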

@rjobanp (Contributor, Author) commented Nov 6, 2023

@benesch @jkosh44 Thanks for the feedback - I believe I've addressed your comments. One thing to note is that I 'moved' the in-memory map from CloudResourceController.watch_vpc_endpoints to the new task in the Adapter Coordinator, since it seems cleaner to have it initialize from the Storage Controller there.
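To make that concrete, here is a hypothetical sketch (all names invented, not the actual implementation) of the kind of map such a coordinator task could hold: seeded from the storage-managed collection, it yields a new history row only when a connection's status changes:

```rust
use std::collections::BTreeMap;

// Everything here is a hypothetical sketch; the names do not come from the
// Materialize codebase.
#[derive(Clone, PartialEq, Eq, Debug)]
enum EndpointStatus {
    PendingServiceAcceptance,
    Available,
    Deleted,
}

struct StatusHistory {
    // Connection id -> last status written to the history collection,
    // initialized from the storage controller's existing contents.
    last_status: BTreeMap<String, EndpointStatus>,
}

impl StatusHistory {
    /// Record an observed status; return Some(status) only if it differs from
    /// the last recorded one, i.e. only when a new history row is needed.
    fn observe(&mut self, conn_id: &str, status: EndpointStatus) -> Option<EndpointStatus> {
        match self.last_status.get(conn_id) {
            Some(prev) if *prev == status => None, // unchanged: skip duplicate row
            _ => {
                self.last_status.insert(conn_id.to_string(), status.clone());
                Some(status)
            }
        }
    }
}
```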


Curious if there's a way to link this up with https://github.com/MaterializeInc/database-issues/issues/6495 / https://github.com/MaterializeInc/database-issues/issues/6422, where the design here is the implementation detail of how PrivateLink connection history is fed into a more generic system collection. And maybe mz_aws_privatelink_connection_status_history is then a view on top of that data that prettifies PrivateLink connection history specifically. Food for thought, haven't personally thought about this problem long enough to have a strong opinion!

@pH14 A more generic system collection could make sense for linking those up; however, my initial reaction is that it diverges from the implementation of mz_sink_status_history and mz_source_status_history, which currently exist as separate collections. Refactoring those to be views on top of a generic collection may not be trivial, so it could lead to some scope creep here.

@jkosh44 (Contributor) left a comment:


LGTM!

@benesch (Member) left a comment:


Very nice! Make sure you get storage eyes (@guswynn, @sploiselle, @bkirwi have all touched the source/sink status code in addition to @petrosagg) on this before getting too deep, but this looks really solid.

@guswynn (Contributor) left a comment:


I don't think we need governor, but other than that this looks good!

Please message me if you have any questions about the implementation!

Review thread on doc/developer/design/20231103_privatelink_status_table.md (outdated, resolved)
Comment on lines +138 to +142
2. Upon inspecting all the existing `VpcEndpoint`s in our `us-east-1` cluster
I noticed that they all had the exact same timestamp in the
`last_transition_time` field on their `Available` condition. This seems odd
so we should confirm that this field is being updated appropriately.
Inline comment (Contributor):


perhaps the controller for this checks them all at the same time?

@rjobanp (Contributor, Author) replied:


Potentially, though that seems a little unlikely since they should each have had their own reconciliation happen when they were created? @jubrad any ideas? You can run

kubectl get vpcendpoints -A -o custom-columns=":metadata.name,:status.conditions"

to replicate

@jubrad (Contributor) commented Nov 9, 2023:


I'm seeing this: lastTransitionTime: 2023-09-19T00:16:54Z ... this is odd. I actually wouldn't be that surprised if everything ran through reconciliation at pretty much the same time when the environment controller came up after this code was deployed. It's also possible that no PrivateLink connections have been created since (and are still around). However, I would expect that to have occurred right after the code merged, which was Aug 23rd, not Sept 19th.

Here's where this should be set.
https://github.com/MaterializeInc/cloud/pull/7174/files#diff-2bdf70ebf1d6a05b7c26a2fecdb1f31ded5cc5c683a3e8bb036ddcb13b9a772bR394
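For reference, a sketch of the usual Kubernetes convention for condition timestamps (assuming nothing about the cloud repo's actual implementation): last_transition_time is stamped only when the condition's status changes, which would explain identical timestamps if all endpoints were reconciled together after a deploy and never changed status since:

```rust
use std::time::SystemTime;

// Sketch of the common Kubernetes convention, not the cloud repo's code:
// last_transition_time only moves when the condition's status actually
// changes, so endpoints that all flipped to Available in one fleet-wide
// reconciliation (and never changed again) would share a single timestamp.
#[derive(Clone)]
struct Condition {
    status: &'static str, // "True" / "False" / "Unknown"
    last_transition_time: SystemTime,
}

fn updated_condition(existing: Option<&Condition>, new_status: &'static str) -> Condition {
    match existing {
        // Status unchanged: carry the previous transition time forward.
        Some(prev) if prev.status == new_status => prev.clone(),
        // Status changed (or condition is new): stamp a fresh transition time.
        _ => Condition {
            status: new_status,
            last_transition_time: SystemTime::now(),
        },
    }
}
```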

@rjobanp (Contributor, Author) replied:


Yep, that's what I see too. I'm not going to block merging this design doc on this, but I will see if I can add validation for the last_transition_time field in test_privatelink in the cloud repo to make sure it's being set. Hopefully it's just something about the deployment time that caused all the us-east-1 objects to get the same timestamp.

@rjobanp force-pushed the privatelink-status-design-doc branch from eba7b84 to ccc67bf on November 8, 2023 19:27
@rjobanp force-pushed the privatelink-status-design-doc branch from ccc67bf to d7967e2 on November 9, 2023 15:24
@rjobanp merged commit f5efe43 into MaterializeInc:main on Nov 9, 2023
9 checks passed
@rjobanp deleted the privatelink-status-design-doc branch on February 7, 2024 20:18