Design for PrivateLink Connection Status History table #22936
Conversation
Very nice!
Curious if there's a way to link this up with https://github.com/MaterializeInc/database-issues/issues/6495 / https://github.com/MaterializeInc/database-issues/issues/6422, where the design here is the implementation detail of how PrivateLink connection history is fed into a more generic system collection. And maybe we'll also want to verify that our VPC Endpoint controller periodically resyncs to update the CR status. From a super quick read-through, I'm not sure if an accepted / running connection ever gets reconciled again, so we might be missing events.
@pH14 I also thought this, but from what I can tell the reconciliation loop should run about once per hour, since that's the default behavior of our k8s controller, even when the …
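For context, the periodic resync being discussed is typically achieved in a kube-rs controller by requeueing from the reconciler. A minimal sketch, assuming a kube-rs `Controller` and hypothetical stand-in CR/error types (this is not the actual Materialize controller code):

```rust
use std::{sync::Arc, time::Duration};

use kube::runtime::controller::Action;

// Hypothetical stand-ins for the real CR and error types.
struct VpcEndpoint;
struct Error;

async fn reconcile(_endpoint: Arc<VpcEndpoint>, _ctx: Arc<()>) -> Result<Action, Error> {
    // ... check the AWS-side endpoint state and patch the CR status here ...

    // Requeue so accepted/running connections are re-checked periodically,
    // not just when a watch event fires.
    Ok(Action::requeue(Duration::from_secs(60 * 60)))
}
```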
@benesch @jkosh44 Thanks for the feedback - I believe I've addressed your comments. One thing to note is that I 'moved' the in-memory map from the …
@pH14 A more generic system collection could make sense to link those up; however, my initial reaction is that it diverges from the implementation of …
LGTM!
Very nice! Make sure you get storage eyes (@guswynn, @sploiselle, @bkirwi have all touched the source/sink status code in addition to @petrosagg) on this before getting too deep, but this looks really solid.
I don't think we need `governor`, but other than that this looks good!
Please message me if you have any questions about the implementation!
2. Upon inspecting all the existing `VpcEndpoint`s in our `us-east-1` cluster, I noticed that they all had the exact same timestamp in the `last_transition_time` field on their `Available` condition. This seems odd, so we should confirm that this field is being updated appropriately.
perhaps the controller for this checks them all at the same time?
Potentially, though that seems a little unlikely since they should each have had their own reconciliation happen when they were created? @jubrad any ideas? You can run

```
kubectl get vpcendpoints -A -o custom-columns=":metadata.name,:status.conditions"
```

to replicate.
I'm seeing this: `lastTransitionTime: 2023-09-19T00:16:54Z`
... this is odd. I actually wouldn't be that surprised if everything ran through reconciliation at pretty much the same time when the environment controller came up after this code was deployed. It's also possible that no PrivateLink connections have been created since (and are still around). However, I would expect that to have occurred right after the code had merged, which was Aug 23rd, not Sept 19th.
Here's where this should be set:
https://github.com/MaterializeInc/cloud/pull/7174/files#diff-2bdf70ebf1d6a05b7c26a2fecdb1f31ded5cc5c683a3e8bb036ddcb13b9a772bR394
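For reference, the usual Kubernetes convention is that `last_transition_time` changes only when the condition's status actually flips, which is what makes identical timestamps across all objects suspicious. A minimal sketch of that convention using the `k8s_openapi` `Condition` type (the helper name and wiring are hypothetical, not the linked cloud-repo code):

```rust
use k8s_openapi::apimachinery::pkg::apis::meta::v1::{Condition, Time};

// Hypothetical helper: compute the new `Available` condition, bumping
// `last_transition_time` only on a real transition.
fn available_condition(existing: Option<&Condition>, now: Time, available: bool) -> Condition {
    let status = if available { "True" } else { "False" };
    let last_transition_time = match existing {
        // Status unchanged: carry the previous transition time forward.
        Some(prev) if prev.status == status => prev.last_transition_time.clone(),
        // First observation or a genuine flip: record a new transition.
        _ => now,
    };
    Condition {
        type_: "Available".to_string(),
        status: status.to_string(),
        last_transition_time,
        reason: "Reconciled".to_string(),
        message: String::new(),
        observed_generation: None,
    }
}
```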
Yep, that's what I see too. I'm not going to block merging this design doc on this, but I will see if I can add validation for the `last_transition_time` field in `test_privatelink` in the cloud repo to make sure it's being set. Hopefully it's just something about the deployment time that caused all the `us-east-1` objects to get the same timestamp.
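A rough sketch of what that validation might look like, assuming a kube-rs client and the `VpcEndpoint` CR type from `src/cloud-resources` (the status field names here are illustrative, and `test_privatelink` lives in the cloud repo and may be structured differently):

```rust
use kube::{api::Api, Client};
use mz_cloud_resources::crd::vpc_endpoint::v1::VpcEndpoint; // assumed import path

async fn assert_last_transition_times(client: Client) -> anyhow::Result<()> {
    let endpoints: Api<VpcEndpoint> = Api::all(client);
    for ep in endpoints.list(&Default::default()).await? {
        // Assumes the status carries a list of conditions, one of which is
        // `Available`; adjust to the real `VpcEndpointStatus` shape.
        let conditions = ep
            .status
            .as_ref()
            .and_then(|s| s.conditions.as_deref())
            .unwrap_or(&[]);
        let available = conditions.iter().find(|c| c.type_ == "Available");
        assert!(
            available.is_some_and(|c| c.last_transition_time.is_some()),
            "last_transition_time not set on {:?}",
            ep.metadata.name,
        );
    }
    Ok(())
}
```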
Motivation
Tips for reviewer
Rendered Version
Some code links for helpful context:
- `VpcEndpointStatus` definition: https://github.com/MaterializeInc/materialize/blob/main/src/cloud-resources/src/crd/vpc_endpoint.rs#L50
- `CloudResourceController`: https://github.com/MaterializeInc/materialize/blob/main/src/orchestrator-kubernetes/src/cloud_resource_controller.rs
- `drain_statement_log` method, used as a model for how to flush events to storage on an interval (see the sketch after this list): https://github.com/MaterializeInc/materialize/blob/main/src/adapter/src/coord/statement_logging.rs#L122
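Since `drain_statement_log` is the stated model, here is a minimal sketch, assuming a tokio runtime, of that buffer-and-flush-on-interval pattern; all names are illustrative rather than the actual Materialize API:

```rust
use std::{mem, time::Duration};

use tokio::sync::mpsc::UnboundedReceiver;

// Placeholder for a row destined for the status-history collection.
struct ConnectionStatusEvent;

async fn drain_on_interval(mut events: UnboundedReceiver<ConnectionStatusEvent>) {
    let mut buffer = Vec::new();
    let mut ticker = tokio::time::interval(Duration::from_secs(5));
    loop {
        tokio::select! {
            // Buffer events in memory as they arrive from the controller.
            Some(event) = events.recv() => buffer.push(event),
            // On each tick, flush the accumulated batch to storage.
            _ = ticker.tick() => {
                if !buffer.is_empty() {
                    flush_to_storage(mem::take(&mut buffer)).await;
                }
            }
        }
    }
}

async fn flush_to_storage(_batch: Vec<ConnectionStatusEvent>) {
    // Stub: the real system would append these updates to the PrivateLink
    // connection status history collection.
}
```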
Checklist
- If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.