Add system-upgrade to upgrade-cluster playbook #10184
Conversation
Hi @sathieu. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I've successfully tested this PR, marking it ready.
Hi @sathieu, thanks for your contribution! I am not very convinced that this should be included in kubespray though; maybe it should live in some kind of pre-role/playbook in your own tooling, for instance? I am not strongly against it though: if other reviewers think it's useful, I will not block it. That said, if this is something we want to include, it would be nice to at least support Debian AND Debian derivatives like Ubuntu, plus RedHat/CentOS distros. This should cover a wide range of users...
@MrFreezeex Thanks for your review.
We are currently using a specific playbook for this, but this leads to draining+uncordoning every node twice (once for the system upgrade, then later for the kubespray upgrade). I have done a full test of this PR and it's not working. As rebooting cleans up …
The current patch is for Debian and derivatives. I would happily add support for other distros, but I need a code snippet.
Ah yeah, I see, makes sense 👍
For Ubuntu support and so on you need to modify the conditional clause so that it can run there. For CentOS/Fedora/Rocky/... you can probably use the yum module https://docs.ansible.com/ansible/latest/collections/ansible/builtin/yum_module.html#ansible-collections-ansible-builtin-yum-module; there is an example in the doc there.
And you probably need another task to update the cache as well.
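For reference, the upgrade-everything example from the yum module documentation amounts to a task like the following; this is a sketch based on those docs, not code taken from this PR:

```yaml
---
# Sketch: upgrade every installed package to its latest available
# version, as shown in the ansible.builtin.yum module docs.
- name: Upgrade all packages  # noqa package-latest
  ansible.builtin.yum:
    name: '*'
    state: latest
```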
@MrFreezeex I've added the cache update. For apt, this is already done by: kubespray/roles/kubernetes/preinstall/tasks/0070-system-packages.yml (lines 34 to 40 in a962fa2)
But the main problem remains.
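On the Debian side, a full upgrade with a cache refresh can be expressed with the apt module; a hedged sketch of what such a task could look like (not necessarily the exact task in this PR):

```yaml
---
# Sketch: refresh the apt cache, then upgrade all packages with
# dist-upgrade semantics (allows kernel and dependency changes).
- name: Upgrade all packages  # noqa package-latest
  ansible.builtin.apt:
    update_cache: true
    upgrade: dist
```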
Hmmm, I think you probably need to trigger the download role after the reboot, unfortunately 🤔. To me you would have to move the system upgrade right after the cordon, then trigger the download role if the system upgrade is invoked, and skip the first invocation of the download role when a system upgrade is run.
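The ordering described above could be sketched roughly like this in the play's role list; the `when` conditions and role names here are illustrative assumptions, not the PR's actual wiring:

```yaml
---
# Illustrative ordering only: cordon/drain first, then the system
# upgrade (which may reboot the node), then re-run the download role
# so artifacts removed by the reboot are fetched again.
- { role: upgrade/pre-upgrade, tags: pre-upgrade }
- { role: system-upgrade, tags: system-upgrade, when: system_upgrade | default(false) }
- { role: download, tags: download, when: system_upgrade | default(false) }
- { role: upgrade/post-upgrade, tags: post-upgrade }
```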
@@ -0,0 +1,37 @@
---
# Debian
It's a bit of a nitpick, but I think it would be a bit cleaner to create two other files, debian.yml and redhat.yml, and include them from main.yml.
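A minimal sketch of how that split could look, using the standard Ansible `ansible_os_family` fact values (file names as suggested; task names are illustrative):

```yaml
---
# main.yml: dispatch to the per-distro-family task files.
- name: Run Debian-family system upgrade tasks
  include_tasks: debian.yml
  when: ansible_os_family == "Debian"

- name: Run RedHat-family system upgrade tasks
  include_tasks: redhat.yml
  when: ansible_os_family == "RedHat"
```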
Done.
/ok-to-test
Force-pushed from f2f9848 to f4c9ffd.
@MrFreezeex I've pushed a newer version of it.
This is not possible, because etcd and other components need the downloads. So the behavior is not optimal, but for us this is better than cordoning nodes twice.
Thanks for your contribution and all the changes you made!
/lgtm
@MrFreezeex Please re-approve, I've fixed a linting problem.
/lgtm
@MrFreezeex Thanks for approving this PR. What is the next step to have it merged?
You need another review/approval from a kubespray team member.
Looks good to me, thanks @sathieu
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MrFreezeex, oomichi, sathieu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Thanks @MrFreezeex and @oomichi!
@sathieu I tried to use the system-upgrade role, but ran into issues. I’ll try to provide as many details as I can. Here is my directory listing (redacted to omit irrelevant output):
I’m running ansible from the root directory, with the following ansible.cfg:

```ini
[defaults]
roles_path = setup/kubespray/roles
```

As seen in the directory listing, I imported the system-upgrade role and defined the following variables:

```yaml
---
system_upgrade: true
system_upgrade_reboot: on-upgrade # never, always
```

And I created an upgrade playbook:

```yaml
# This playbook borrows parts of the upgrade-cluster.yml play from Kubespray.
# It also makes use of the system-upgrade Kubespray role which has not yet been released.
# See https://github.com/kubernetes-sigs/kubespray/blob/36e5d742dc2b3f7984398c38009f236be7c3c065/playbooks/upgrade_cluster.yml
# See https://github.com/kubernetes-sigs/kubespray/blob/36e5d742dc2b3f7984398c38009f236be7c3c065/roles/upgrade/system-upgrade/tasks/main.yml
---
- name: Check ansible version
  import_playbook: kubespray/playbooks/ansible_version.yml

- name: Ensure compatibility with old groups
  import_playbook: kubespray/playbooks/legacy_groups.yml

- name: Gather facts
  tags: always
  import_playbook: kubespray/playbooks/facts.yml

- name: Handle upgrades to control plane hosts
  gather_facts: False
  hosts: kube_control_plane
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  serial: 1
  roles:
    - { role: kubespray-defaults }
    - { role: upgrade/pre-upgrade, tags: pre-upgrade }
    - { role: system-upgrade, tags: system-upgrade }
    - { role: upgrade/post-upgrade, tags: post-upgrade }

- name: Handle upgrades to worker hosts
  hosts: kube_node:calico_rr:!kube_control_plane
  gather_facts: False
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  serial: "{{ serial | default('20%') }}"
  roles:
    - { role: kubespray-defaults }
    - { role: upgrade/pre-upgrade, tags: pre-upgrade }
    - { role: system-upgrade, tags: system-upgrade }
    - { role: upgrade/post-upgrade, tags: post-upgrade }
```

I’m taking advantage of the existing upgrade roles. Here is how I run it:
But the play stays stuck on the “YUM upgrade all packages” task. I ssh'ed into the node after waiting more than 45 minutes for the task to end, only to see that the following command reported only itself:
Also, running:
Gave me the following result:
So the task seemed stuck. After killing the play, I went ahead and tried to run the upgrade command manually. Here is the output (redacted):
Before running ansible, I ssh'ed in the node and ran:
While ansible was performing the yum update, I saw the following process:
I was afraid that something related to the ssh server was updated and broke ansible. I think this is an ansible issue, but am not really sure. Would you have any insights on why this is happening and how it could be fixed?

PS: I tried using the …
I need to do some further testing, but I think I found an acceptable workaround. See master...nicolas-goudry:kubespray:fix/yum-system-upgrade. @sathieu @MrFreezeex what do you think about that?

Edit: after some tests, this workaround needs further tweaking and testing. I’ll let you know how it goes.

Edit 2: Something really weird is happening with this workaround. When I run it through the role, the weird behavior shows up. But when I update the playbook to:

```yaml
- hosts: all
  gather_facts: no
  tasks:
    # Workaround to whole system update not working with yum: name=*
    - name: YUM | Get available package updates
      yum:
        list: updates
      register: yum_available_package_updates

    - name: YUM | Debug packages to update
      debug:
        msg: "{{ yum_available_package_updates.results | map(attribute='name') | list }}"

    - name: YUM | Update packages # noqa package-latest
      yum:
        name: "{{ yum_available_package_updates.results | map(attribute='name') | list }}"
        state: 'latest'
      register: yum_upgrade

    # - name: YUM | Reboot after packages updates # noqa no-handler
    #   when:
    #     - yum_upgrade.changed or system_upgrade_reboot == 'always'
    #     - system_upgrade_reboot != 'never'
    #   reboot:
```

And run it with the following command:

It works…
@nicolas-goudry Sorry for this late response. I don't use Rocky Linux or any of the RPM-based distros; I took the code from the yum module docs. Could you propose a PR with your improvement?
@sathieu I still have some weird issues with this workaround, so I’m not sure it should land in Kubespray just yet. I think it’s something related to Python version discrepancies between the control node and the managed nodes. Not quite sure though. I’ll have to do more tests, but I’m struggling to find the time to do so. I’ll keep you posted.
What type of PR is this?
/kind feature
What this PR does / why we need it:
We want to upgrade the system packages of the nodes (including the linux kernel).
Currently we do this with a specific playbook, but this leads to cordoning each node twice.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: