🐛 Don't block on Get when queue is shutdown (2nd try)#3337
🐛 Don't block on Get when queue is shutdown (2nd try)#3337k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
Conversation
| item := <-w.get | ||
|
|
||
| return item.Key, item.Priority, w.shutdown.Load() | ||
| select { |
There was a problem hiding this comment.
Moved from #3332 (comment)
This was way to tricky to debug :)
Once PQ was enabled per default, this test started to fail:
- manager stop failed with "failed waiting for all runnable to end within grace period"
- because not all workers of the leader election runnable group stopped
- and they didn't stop because they were stuck in GetWithPriority()
Just tricky to figure out:
- which test even caused this (I just got an apiserver stop timeout, the test itself did not fail)
- which runnable group did not stop
- which runnable did not stop
- why does it not stop (is reconcile blocking or something else)
I guess our logging could use some improvements ...
There was a problem hiding this comment.
Yeah, lets please do that, might also be worth to backport this. I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make? Definitely worth go-docing that and i think it might be worth a test as well
There was a problem hiding this comment.
Definitely worth go-docing that and i think it might be worth a test as well
Extended the godoc and added a unit test
I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make?
We could give the caller only an item if we still have items in the queue, otherwise the caller is deadlocked.
Example:
- controller is started
- controller workers are calling
GetWithPriorityand are waiting for items - queue is empty and stays empty
- controller (and accordingly queue) is shutdown
- previous code:
- Shutdown() closes w.done
- workers remain blocked in l.293
- accordingly controller and runnable group cannot shutdown
- new code:
- Shutdown() closes w.done
- l.301 returns
There was a problem hiding this comment.
I misread the code the last time and thought this was where we send, not where we get 🤦 Makes total sense
Signed-off-by: Stefan Büringer buringerst@vmware.com
1a94201 to
5e1233d
Compare
|
LGTM label has been added. DetailsGit tree hash: 35717109c3ba177b710ffbbee40ef27626ceb374 |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, sbueringer The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cherrypick release-0.22 |
|
@alvaroaleman: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@alvaroaleman: new pull request created: #3338 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Stefan Büringer buringerst@vmware.com
Part of #2374