Controller uncordons node that it did not cordon #341
Comments
If you drain/cordon a node using the SUC, it will uncordon it at the end of the upgrade. It does not have a way to determine if it was the one that cordoned it or not. If you don't want this, don't use the drain/cordon options on your plan, and cordon/uncordon your nodes some other way - either manually, or using a custom plan running a manual command.
The plan is executed on all nodes regardless of version. The SUC doesn't know if a node actually needs upgrading or not; it just knows whether the node has the plan hash annotation. If the node doesn't need upgrading, then the job will be a no-op and will just run successfully without replacing the binary.
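For concreteness, here is a minimal client-go sketch of the only check the controller effectively has available: whether a node carries the current plan-hash label. The label key and hash value below are illustrative placeholders, not necessarily the exact keys SUC uses.

```go
package plancheck

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// planLabel is an illustrative key; SUC derives its real label key from the
// plan name/namespace and stores the computed plan hash as the value.
const planLabel = "plan.upgrade.cattle.io/my-plan"

// needsJob reports whether the node still lacks the current plan hash label.
// This is the only signal available: it says nothing about whether the node
// "really" needs an upgrade, or why the node might already be cordoned.
func needsJob(ctx context.Context, client kubernetes.Interface, nodeName, planHash string) (bool, error) {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return node.Labels[planLabel] != planHash, nil
}
```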
Am I understanding that this controller will uncordon any node in the cluster, at random, and that is by design? Edit:
This is in reference to an external cordon, in place prior to the SUC executing at all.
No, not at random. It will cordon/uncordon it when the plan job executes on the node. The plan will be executed on the node if it lacks a plan hash label that matches the current plan. This is covered in the readme.
Maybe I can say this differently. If you use the drain/cordon option in your plan, the node will be drained/cordoned when the upgrade job runs against that node (even if it is already cordoned), and it will be uncordoned at the end of the upgrade job. The ONLY thing the SUC knows about the node is whether or not it has an up-to-date plan hash label, it does NOT know if it actually needs to be upgraded or not.
That is where I think the bug is. It should not operate on nodes that are cordoned unless the administrator deliberately configures that. (Or invert that, for backwards compatibility: There should be a flag to obey an existing cordon.) For example, when Kured reboots a node it annotates and cordons it. Unless I am missing something, that is the best-practices way of preventing other tasks from interfering. It is unexpected at best to ignore the cordon and do something that also requires the safety of a cordon. (And I maintain that an application that meddles with external cordons is not behaving appropriately.)
There is no concept of an "external" or "internal" cordon. There is no record of what controller "owns" the cordoning. The job just knows if the plan has cordon or drain set, and if so takes the appropriate actions to drain+cordon before and uncordon after. You don't HAVE to use these plan options, if you don't like how it works then don't use it.
This isn't about "like". When an administrator enables cordoning, that is explicitly telling the SUC that cordoning is important in the environment. There is a step where the SUC cordons a node. If that node is already cordoned prior to that, the SUC should not proceed. (With the usual "unless overridden blah blah".) If the node is already cordoned when SUC goes to apply a cordon, then either an external actor (anything that is not SUC) cordoned it or something went wrong during the SUC activity. In neither case is blindly proceeding ahead a good answer.
I don't agree. If you've ever looked at the k3s or rke2 upgrade job or followed its execution, you'll notice that it frequently runs twice:
If you want to consider an already-cordoned node an error during the cordon step executed by the initContainer, you can set
A bug in the design is still a bug.
It is perfectly valid to do lots of harmful, strange, or just ineffective things. In many cases, "making sure the door is locked" does not cause harm. Altering cluster control software is not one of those cases. (I am curious about "by default" though. Is there a case where that API call fails?)
If the API threw an error, this bug wouldn't exist to begin with, so I'm not sure what bearing kubectl has on it. Since you brought up other tools, kured comes straight from CNCF, uses proper locking around cordons, and behaves cautiously in the face of the unexpected. And it is arguably less impactful than SUC since it only does a subset of the impactful activities. In the case that multiple operators are attempting to cordon a node and make dangerous changes, one of which is SUC, how should they behave? How are other operators expected to code around this tool?
@disconn3ct |
What are you talking about? What API would throw an error? I showed you that we use kubectl to do the drain/cordon, that's why it has bearing on this discussion.
What exactly is "locking" around cordons? There is a single field for cordoning (Node.Spec.Unschedulable) and no standard way to indicate why it is unschedulable in a way that other controllers or CLI tools will respect. How do you expect us to "lock" this field exactly?
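For reference, a small client-go sketch (the node name is a placeholder) showing everything the Node object records about a cordon: the single Spec.Unschedulable boolean, plus free-form annotations where any ownership convention would have to live by agreement rather than by API contract.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (a controller would use
	// in-cluster config instead).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	node, err := client.CoreV1().Nodes().Get(context.TODO(), "my-node", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The entire cordon state is this one boolean; the API records nothing
	// about who set it or why.
	fmt.Println("unschedulable:", node.Spec.Unschedulable)
	fmt.Println("annotations:", node.Annotations)
}
```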
I am going to leave this closed pending resolution of the discussion at kubernetes/enhancements#4212 - ref: https://groups.google.com/g/kubernetes-sig-architecture/c/Tb_3oDMAHrg/m/tNHNooWDAAAJ
If that ever sees any progress, we can reevaluate this. @disconn3ct, in advance of there being an actual standard or API for this, if you can point me at some non-project-specific method that kured is using to "lock" the cordoned status, we can see if it is usable for this controller.
To start, I'm not going to continue responding to "we did it this way already". The bug is in the design. When an administrator sets the SUC cordon flag, they are explicitly stating that cordoning is important in the environment. Most controllers will behave politely and obey the cordon. If cordons were unimportant, the administrator would not have turned on the flag to begin with. If something has already cordoned the node, it is reserved. There doesn't need to be anything else: "I went to put the lockout tag up, but it was already locked. I know nothing about who locked it, so I am not continuing." Throw a failed job or something if you need to brute-force your way to the administrator's attention.
Kured sets an annotation that it then reads to determine whether the cordon is owned by Kured (along with other unrelated checks, such as pod presence and "lock" annotation expiration). Please explain how you expect other "dangerous" controllers to work when SUC ignores existing cordons. How do those controllers prevent SUC from yanking the rug out during their operations? Kured avoids SUC by obeying the cordon. (It can also look for specific pods, alerts, etc., but that is beyond this scope.)
Have you glanced at the kured docs? The intro describes the basic flow:
1. Node is selected for activity
After a reboot (step 2 above), Kured checks its annotations before undoing a cordon. The only race between well-behaved apps is between steps 2 and 3: multiple apps could think they got the cordon if they were aligned. That is still an improvement on the existing design, where the entire operation is at risk of SUC activating in the middle and downing the node.
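As a sketch of the ownership convention being described (the annotation key and controller ID below are hypothetical placeholders, not kured's actual annotations), an uncordon step that only clears a cordon carrying its own marker might look like this with client-go:

```go
package cordonowner

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical ownership marker; kured's real annotation names differ.
const (
	ownerAnnotation = "example.com/cordon-owner"
	ownerID         = "my-upgrade-controller"
)

// uncordonIfOwned only undoes a cordon that carries our ownership marker.
// A cordon set by anything else (an admin, kured, another controller) is
// left untouched.
func uncordonIfOwned(ctx context.Context, client kubernetes.Interface, name string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if !node.Spec.Unschedulable || node.Annotations[ownerAnnotation] != ownerID {
		return nil // not our cordon; leave it alone
	}
	delete(node.Annotations, ownerAnnotation)
	node.Spec.Unschedulable = false
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```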
(FYI, if you want to continue being aggressive about the word "lock", please take it up with CNCF. I used their usage. They refer to both a kured-controller concurrency lock and the kured-specific annotations as locks.)
The 2/3 race can be dealt with using an atomic update: grab the resource version at step 2, then include it when patching at step 3. If 3 succeeds, there was no intermediate change; if it fails, someone else got there first and you should restart. Also, 3 and 4 should probably be done in the same commit, and 5 should also include drain.
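A minimal client-go sketch of that atomic claim, reusing the hypothetical ownership annotation from above: the Update call carries the resourceVersion from the preceding Get, so a concurrent change surfaces as a 409 Conflict instead of silently clobbering someone else's cordon.

```go
package cordonowner

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// claimAndCordon reads the node, refuses if it is already cordoned, and writes
// the cordon plus ownership annotation in one Update. The object returned by
// Get carries its resourceVersion, so if anything else modifies the node in
// between, the API server rejects the Update with a conflict and we back off.
func claimAndCordon(ctx context.Context, client kubernetes.Interface, name string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Spec.Unschedulable {
		return fmt.Errorf("node %s is already cordoned; refusing to proceed", name)
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations["example.com/cordon-owner"] = "my-upgrade-controller" // hypothetical key
	node.Spec.Unschedulable = true

	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	if apierrors.IsConflict(err) {
		// Someone else touched the node between our Get and Update; the claim
		// failed, so the caller should re-read and re-evaluate, not proceed.
		return fmt.Errorf("lost the race for node %s: %w", name, err)
	}
	return err
}
```

client-go also provides retry.RetryOnConflict (in k8s.io/client-go/util/retry) for wrapping exactly this get-modify-update loop.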
Version
rancher/system-upgrade-controller:v0.14.2@sha256:3cdbfdd90f814702cefb832fc4bdb09ea93865a4d06c6bafd019d1dc6a9f34c9
Platform/Architecture
Linux Arm64
Describe the bug
When a node is externally cordoned, the upgrade controller will uncordon it after completing the upgrade. This happens even if there is no upgrade, such as on a brand new node.
To Reproduce
Create a new node and cordon it manually. Once the update job runs, the node will be uncordoned even if nothing is changed.
Expected behavior
The controller should only remove cordons that it created.
Actual behavior
The controller uncordons any node on which the upgrade job completes successfully, regardless of who cordoned it.
Additional context