Test CI with different controller-runtime versions#1393
Test CI with different controller-runtime versions#1393matheuscscp wants to merge 3 commits intomainfrom
Conversation
This is a test commit to verify the upgrade-fail-remediate-uninstall e2e test passes with controller-runtime v0.22.4 (before the v0.23.0 upgrade). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Testing if the upgrade-fail-remediate-uninstall e2e test fails with controller-runtime v0.23.0 due to stricter terminal error handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root Cause Analysis:
|
|
@sbueringer @alvaroaleman Since we have just discussed this, can you please confirm if the analysis above sounds correct? |
|
Its not correct, both queues have locking that prevents hand out of an item that was already handed out and where The upstream queue handles the situation you are describing by not putting the item in the |
|
Thanks! Any insights about what could be different between the two queues that was causing this unintended side-effect on a terminal error? |
|
Or maybe something else entirely unrelated to the queues between v0.22.4 and v0.23 that could explain this apparent second reconciliation that was hiding this bug in helm-controller? |
|
I don't know enough about the helm-controller to answer your question. The terminal error flow is unrelated to the workqueue, all it does is is skip the If you could add a debug wrapper around the queue that prints |
Using v0.22.4, I assume? |
EIther, the sequence of adds shouldn't change. It doesn't only need to print Add but also AddAfter and AddRatelimited which might be easier with the PQ as it all ends up in AddWithOpts there |
You can a/b test that by using 0.23 but disabling the priority queue, its just a default. |
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Yeah, as the last commit shows, the behavior difference is in the PQ... I'm now debugging locally with your suggestion of wrapping the queue, will report if I find anything |
Debug FindingsI added a debug wrapper around the queue to log Timeline with Priority Queue (v0.23)Result
AnalysisThe This confirms the priority queue handles pending |
|
Oops, looks like I already found something ( 🤖 😅 ) |
|
yeah, PQ de-duplicates AddAfter with Add, upstream wq does not. That this is missing in the uptream wq is a bug: kubernetes/kubernetes#110642 Looking at your timeline, clearly there was a change between the |
|
Thanks very much for all the help here! |
Summary
Test commit to verify the
upgrade-fail-remediate-uninstalle2e test passes with controller-runtime v0.22.4 (before the v0.23.0 upgrade).This will help verify the theory that the test was passing due to controller-runtime v0.22.4's behavior and fails with v0.23.0.
Test plan
upgrade-fail-remediate-uninstalltest🤖 Generated with Claude Code