-
Notifications
You must be signed in to change notification settings - Fork 1.2k
✨ Implement warm replica support for controllers #3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @godwinpang. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
5d1be84
to
5db61c7
Compare
/retest |
5db61c7
to
be1b1c2
Compare
/retest |
/retest |
/retest |
/retest |
e05677a
to
1d07efc
Compare
/retest |
1 similar comment
/retest |
1c32d37
to
7cc29dc
Compare
@godwinpang Pleae keep the conversations open (going forward). It's really hard to find the just resolved/fixed conversations to check that everything was addressed |
// which won't be garbage collected if we hold a reference to it. | ||
c.startWatches = nil | ||
|
||
// Mark event sources as started after resetting the startWatches slice to no-op a Watch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Watch won't be a no-op. It will start the source immediately instead of storing it in startWatches
@@ -221,60 +299,71 @@ func (c *Controller[request]) Start(ctx context.Context) error { | |||
// startEventSources launches all the sources registered with this controller and waits | |||
// for them to sync. It returns an error if any of the sources fail to start or sync. | |||
func (c *Controller[request]) startEventSources(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is bigger. Does Warmup start the sources with a nil queue? (if it is called before Start)
@godwinpang Heyho, discussed this with @alvaroaleman. I think we're making really good progress on this PR and getting very close to merging it. We would just like to take a bit more time to ensure we don't break anything (I"ll also try to test it with Cluster API). So we'll go ahead and cut the CR v0.21 release now already (a few folks are waiting for that :)). Once this PR merges we'll cherry-pick and cut a v0.21.1 patch release. (I also modified the PR title as this PR is not breaking) |
A heads up that I am OOO this week so there will not be much progress on this PR. We're going to enable this warmup feature internally on June 2nd when I get back for some soak time. |
We've been running a forked version of c-r with I'll take a stab at
Anything else we are looking to do to gain more confidence around this feature and get it merged? 😄 |
Sounds fine to me |
@godwinpang any plans on wrapping this up? |
Sorry, I missed this Github notification. I've updated the PR description and also duplicated the tests between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
/hold
in case Stefan has anything to add
LGTM label has been added. Git tree hash: 781742318cfc9c5c72173ae51ebe9fb41673044d
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, godwinpang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This change implements the proposal for warm replicas as proposed in #3121.
It adds an
EnableWarmup
option for controllers to optionally start as warmed replicas, which means that sources will be started before leader election has been won by the instance.It also adds a new internal runnable interface called
warmupRunnable
with a singleWarmup
method that will be called before leader election. There is no guarantee that theWarmup
method returns before leader election, just that it is called.The controller's implementation of the
Warmup
method for thewarmupRunnable
starts the sources with a threadsafe methodstartEventSourcesAndQueueLocked
that also adds events to the queue. In the case of a non-leader elected controller, it is intended for theWarmup
method to race with theStart
method to start the sources. The methods are synchronized by thedidStartEventSourcesOnce
sync.Once used instartEventSourcesAndQueueLocked
.For the most part,
Warmup
behaves exactly the same asStart
with the following differencesWarmup
can be called, vs.Start
which can only be called once.Warmup
doesn't initialize metrics for the controller, nor does it start the worker goroutines.Shutdown
Warmup runnables are shutdown in a separate goroutine from the one that stops the leader election runnables, because there is no guarantee as to whether or not the
Warmup
or theStart
method is holding the lock for thedidStartEventSourcesOnce
sync.Once instance.Testing
On top of the unit + integration tests, this PR was tested with an internal project that ran this branch of controller-runtime for a week with (EnableWarmup=true`, and verified that the controllers on the non-leader elected replica had a populated workqueue.