Remove Yum Updates in Cloud-Init #1074

Merged: cartermckinnon merged 2 commits into awslabs:master from the opp-yum-updates branch on Nov 18, 2022

bwagner5 (Contributor) commented Oct 31, 2022

Issue #, if available:
#1099

Description of changes:

Generally, the AL2 cloud-init config checks for and applies updates to some security-related packages via the 'package-update-upgrade-install' module, which runs in the modules-config stage (before user-data). This usually delays user-data execution by about 5 seconds, even though, when AMIs are updated frequently, no updates are needed much of the time. Most of that latency comes from hydrating the yum cache in /var/cache/yum: the first yum update executed on an instance downloads ~700M of data (711M in total below).

du -h /var/cache/yum/
473M	/var/cache/yum/x86_64/2/amzn2-core/gen
0	/var/cache/yum/x86_64/2/amzn2-core/packages
590M	/var/cache/yum/x86_64/2/amzn2-core
768K	/var/cache/yum/x86_64/2/amzn2extra-docker/gen
0	/var/cache/yum/x86_64/2/amzn2extra-docker/packages
936K	/var/cache/yum/x86_64/2/amzn2extra-docker
95M	/var/cache/yum/x86_64/2/amzn2extra-kernel-5.4/gen
0	/var/cache/yum/x86_64/2/amzn2extra-kernel-5.4/packages
121M	/var/cache/yum/x86_64/2/amzn2extra-kernel-5.4
711M	/var/cache/yum/x86_64/2
711M	/var/cache/yum/x86_64
711M	/var/cache/yum/

It would be possible to include the yum cache in the AMI, but we would then need to build an image per region, which we currently do not do.

Additionally, updating packages on startup can cause version skew across a cluster of instances launched from the same AMI version. This is undesirable from a configuration perspective and could cause stability problems if an update breaks a node's bootstrap process.

In the following cloud-init analyze output, notice that config-package-update-upgrade-install takes 5 seconds to complete even when no updates are needed. When updates are required, this module can take anywhere from 5 to 10 seconds.

No updates are required:

> /usr/local/bin/cloud-init analyze show
-- Boot Record 01 --
The total time elapsed since completing an event is printed after the "@" character.
The time the event takes is printed after the "+" character.

Starting stage: init-local
|`->no cache found @00.00000s +00.00000s
Finished stage: (init-local) 00.00000 seconds

Starting stage: init-network
|`->no cache found @01.00000s +00.00000s
|`->found network data from DataSourceEc2 @01.00000s +01.00000s
|`->setting up datasource @02.00000s +00.00000s
|`->reading and applying user-data @02.00000s +00.00000s
|`->reading and applying vendor-data @02.00000s +00.00000s
|`->activating datasource @02.00000s +00.00000s
|`->config-migrator ran successfully @02.00000s +00.00000s
|`->config-bootcmd ran successfully @02.00000s +00.00000s
|`->config-write-files ran successfully @02.00000s +00.00000s
|`->config-write-metadata ran successfully @02.00000s +00.00000s
|`->config-amazonlinux_repo_https ran successfully @02.00000s +00.00000s
|`->config-growpart ran successfully @02.00000s +00.00000s
|`->config-resizefs ran successfully @02.00000s +00.00000s
|`->config-set-hostname ran successfully @02.00000s +00.00000s
|`->config-update-hostname ran successfully @02.00000s +00.00000s
|`->config-update-etc-hosts ran successfully @02.00000s +00.00000s
|`->config-rsyslog ran successfully @02.00000s +00.00000s
|`->config-users-groups ran successfully @02.00000s +00.00000s
|`->config-ssh ran successfully @02.00000s +00.00000s
|`->config-resolv-conf ran successfully @02.00000s +00.00000s
Finished stage: (init-network) 01.00000 seconds

Starting stage: modules-config
|`->config-disk_setup ran successfully @03.00000s +00.00000s
|`->config-mounts ran successfully @03.00000s +00.00000s
|`->config-locale ran successfully @03.00000s +00.00000s
|`->config-set-passwords ran successfully @03.00000s +00.00000s
|`->config-yum-configure ran successfully @03.00000s +00.00000s
|`->config-yum-add-repo ran successfully @03.00000s +00.00000s
|`->config-package-update-upgrade-install ran successfully @03.00000s +05.00000s
|`->config-timezone ran successfully @08.00000s +00.00000s
|`->config-disable-ec2-metadata ran successfully @08.00000s +00.00000s
|`->config-runcmd ran successfully @08.00000s +00.00000s
Finished stage: (modules-config) 05.00000 seconds

Starting stage: modules-final
|`->config-scripts-per-once ran successfully @08.00000s +00.00000s
|`->config-scripts-per-boot ran successfully @08.00000s +00.00000s
|`->config-scripts-per-instance ran successfully @08.00000s +00.00000s
|`->config-scripts-user ran successfully @08.00000s +04.00000s
|`->config-ssh-authkey-fingerprints ran successfully @12.00000s +00.00000s
|`->config-keys-to-console ran successfully @12.00000s +00.00000s
|`->config-phone-home ran successfully @12.00000s +00.00000s
|`->config-final-message ran successfully @12.00000s +00.00000s
|`->config-power-state-change ran successfully @12.00000s +00.00000s
Finished stage: (modules-final) 04.00000 seconds

Total Time: 10.00000 seconds

1 boot records analyzed

Updates are required:

> /usr/local/bin/cloud-init analyze show
-- Boot Record 01 --
The total time elapsed since completing an event is printed after the "@" character.
The time the event takes is printed after the "+" character.

Starting stage: init-local
|`->no cache found @00.00000s +00.00000s
Finished stage: (init-local) 00.00000 seconds

Starting stage: init-network
|`->no cache found @01.00000s +00.00000s
|`->found network data from DataSourceEc2 @01.00000s +00.00000s
|`->setting up datasource @01.00000s +00.00000s
|`->reading and applying user-data @01.00000s +00.00000s
|`->reading and applying vendor-data @01.00000s +00.00000s
|`->activating datasource @01.00000s +00.00000s
|`->config-migrator ran successfully @01.00000s +00.00000s
|`->config-bootcmd ran successfully @01.00000s +00.00000s
|`->config-write-files ran successfully @01.00000s +00.00000s
|`->config-write-metadata ran successfully @01.00000s +00.00000s
|`->config-amazonlinux_repo_https ran successfully @01.00000s +00.00000s
|`->config-growpart ran successfully @01.00000s +00.00000s
|`->config-resizefs ran successfully @01.00000s +00.00000s
|`->config-set-hostname ran successfully @01.00000s +00.00000s
|`->config-update-hostname ran successfully @01.00000s +00.00000s
|`->config-update-etc-hosts ran successfully @01.00000s +00.00000s
|`->config-rsyslog ran successfully @01.00000s +00.00000s
|`->config-users-groups ran successfully @01.00000s +01.00000s
|`->config-ssh ran successfully @02.00000s +00.00000s
|`->config-resolv-conf ran successfully @02.00000s +00.00000s
Finished stage: (init-network) 01.00000 seconds

Starting stage: modules-config
|`->config-disk_setup ran successfully @02.00000s +00.00000s
|`->config-mounts ran successfully @02.00000s +00.00000s
|`->config-locale ran successfully @02.00000s +00.00000s
|`->config-set-passwords ran successfully @02.00000s +00.00000s
|`->config-yum-configure ran successfully @02.00000s +00.00000s
|`->config-yum-add-repo ran successfully @02.00000s +00.00000s
|`->config-package-update-upgrade-install ran successfully @02.00000s +05.00000s
|`->config-timezone ran successfully @07.00000s +00.00000s
|`->config-disable-ec2-metadata ran successfully @07.00000s +00.00000s
|`->config-runcmd ran successfully @07.00000s +00.00000s
Finished stage: (modules-config) 05.00000 seconds

Starting stage: modules-final
|`->config-scripts-per-once ran successfully @07.00000s +00.00000s
|`->config-scripts-per-boot ran successfully @07.00000s +00.00000s
|`->config-scripts-per-instance ran successfully @07.00000s +00.00000s
|`->config-scripts-user ran successfully @07.00000s +05.00000s
|`->config-ssh-authkey-fingerprints ran successfully @12.00000s +00.00000s
|`->config-keys-to-console ran successfully @12.00000s +00.00000s
|`->config-phone-home ran successfully @12.00000s +00.00000s
|`->config-final-message ran successfully @12.00000s +00.00000s
|`->config-power-state-change ran successfully @12.00000s +00.00000s
Finished stage: (modules-final) 05.00000 seconds

Total Time: 11.00000 seconds

1 boot records analyzed
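When comparing runs like the two above, it can help to pull out only the slow events. The following is a minimal sketch, assuming the `cloud-init analyze show` output format shown here, where each event line ends with an "@" start time and a "+" duration; slow_events is a hypothetical helper name, not part of cloud-init.

```sh
#!/bin/sh
# Print cloud-init events whose duration (the "+" value) is at least a
# threshold in seconds (default 1). Reads `cloud-init analyze show` on stdin.
# slow_events is a hypothetical helper, not a cloud-init subcommand.
slow_events() {
  threshold="${1:-1}"
  awk -v t="$threshold" '
    # Event lines start with the tree marker and end with a + duration.
    /^\|`->/ && /\+[0-9.]+s$/ {
      dur = $NF                      # e.g. "+05.00000s"
      gsub(/[+s]/, "", dur)          # strip the + and the trailing s
      if (dur + 0 >= t + 0) {
        name = $0
        sub(/^\|`->/, "", name)                    # drop the tree marker
        sub(/ @[0-9.]+s \+[0-9.]+s$/, "", name)    # drop the timings
        sub(/ ran successfully$/, "", name)        # keep just the event name
        printf "%s\t%gs\n", name, dur + 0
      }
    }'
}
```

For example, piping the first run above through `cloud-init analyze show | slow_events 2` would list only config-package-update-upgrade-install (5s) and config-scripts-user (4s).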

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

Built an AMI, launched an instance from it, and observed it go Ready.


bwagner5 force-pushed the opp-yum-updates branch 4 times, most recently from 1be312e to dde694d, November 5, 2022 16:49
bwagner5 changed the title from "Non-Blocking yum updates" to "Remove Yum Updates in Cloud-Init" on Nov 8, 2022
bwagner5 force-pushed the opp-yum-updates branch 2 times, most recently from 53cd26c to 122940c, November 8, 2022 18:04
cartermckinnon (Member) commented Nov 8, 2022

I buy the version skew argument.

In your second test, where updates were installed, what packages were affected? I've never really understood what gets classified as "security-related" for this behavior. The kernel obviously couldn't be updated, because that would require a reboot. What would we be giving up by removing this?

bwagner5 (Contributor, Author) commented Nov 8, 2022

The second test captured a systemd update that happened in the wild with the previous AMI release. The packages were updated but didn't take effect, since that would have required a reboot anyway. So the 8 seconds was spent just checking for and downloading packages that weren't even installed.

bwagner5 (Contributor, Author) commented

To answer your question more directly on what we would lose out on:

Some of the packages that could be updated would be CVE patches for curl or sshd. In curl's case, the binary would be updated and its next invocation would be patched. In sshd's case, the binary would be updated, but the service wouldn't be restarted unless the user restarted it manually within user-data.
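The sshd case can be checked on a running node. When rpm replaces a binary, the old file's directory entry is removed, but a process started from it keeps running the old inode, and Linux reports its /proc/PID/exe link with a " (deleted)" suffix. A minimal sketch of that check, using a hypothetical function name:

```sh
#!/bin/sh
# Sketch: report whether a running process is still executing a binary that
# has since been replaced on disk (e.g. by a package update). Linux shows the
# /proc/<pid>/exe link of such a process with a " (deleted)" suffix.
# is_running_stale_binary is a hypothetical helper name.
#
# Returns 0 if the binary was replaced, 1 if not, and 2 if the link can't be
# read (no such process, or insufficient permissions; run as root for daemons).
is_running_stale_binary() {
  exe=$(readlink "/proc/$1/exe" 2>/dev/null) || return 2
  case "$exe" in
    *" (deleted)") return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `for pid in $(pgrep -x sshd); do is_running_stale_binary "$pid" && echo "sshd $pid is running a replaced binary"; done` would flag sshd processes that a package update left running the old binary. On AL2, the needs-restarting tool from yum-utils performs a similar check.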

files/cloud.cfg (outdated)
@@ -0,0 +1,82 @@
# WARNING: Modifications to this file may be overridden by files in
# /etc/cloud/cloud.cfg.d
cartermckinnon (Member) commented Nov 10, 2022

Is there an upstream we can add a reference to here, so we can rebase on the defaults every once in a while? Did you just grab this off an AL2 instance?

Seems like this might be the source of truth, but it's a Jinja template 🤢 https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.tmpl

bwagner5 (Contributor, Author) commented Nov 10, 2022

Yeah, I grabbed it off an instance. The resolved template isn't what is shipped in AL2. I asked, and it looks like it is only available in the cloud-init RPM in the amzn2 yum repo. It's a little unfortunate, but I'm also told it's rarely updated. The version of cloud-init on AL2 is pretty old, too:

> cloud-init --version
/bin/cloud-init 19.3-46.amzn2

The latest release upstream is 22.3.4

cartermckinnon (Member)

Ah, bummer. I guess you already considered patching cloud.cfg instead of replacing it wholesale? It doesn't seem like there's a way to disable this module with an /etc/cloud/cloud.cfg.d drop-in either.

bwagner5 (Contributor, Author)

Yeah, I definitely went down that path first, and it doesn't look like there's a way to do it with /etc/cloud/cloud.cfg.d.

bwagner5 (Contributor, Author)

Changed it to just sed the file to remove that one module.
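The exact sed expression isn't shown in this thread. A minimal sketch of the approach, assuming the module appears in /etc/cloud/cloud.cfg as its own plain list item (" - package-update-upgrade-install") and a sed that supports in-place editing (GNU or busybox); disable_package_update_module is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: remove the package-update-upgrade-install entry from cloud-init's
# module list in place, leaving the rest of cloud.cfg untouched.
# Assumes a plain list item such as "  - package-update-upgrade-install"
# (not the "[name, frequency]" form) and a sed that supports -i.
# disable_package_update_module is a hypothetical helper name.
disable_package_update_module() {
  cfg="${1:-/etc/cloud/cloud.cfg}"
  sed -i '/^[[:space:]]*-[[:space:]]*package-update-upgrade-install[[:space:]]*$/d' "$cfg"
}
```

Deleting only the matching list item keeps the rest of the stock AL2 cloud.cfg intact, which addresses the earlier concern about replacing the file wholesale and drifting from the shipped defaults. Running it twice is harmless, since the second pass finds nothing to delete.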

cartermckinnon added the enhancement (New feature or request) label Nov 11, 2022
cartermckinnon merged commit 21870b9 into awslabs:master on Nov 18, 2022
amitay-elementor commented Jan 1, 2023

Does this PR break usage of packages entirely?
After updating to the latest AMI, this doesn't seem to work anymore:

packages:
- amazon-efs-utils

That is, the package isn't installed (reverting to the previous AMI works).

Is that the expected outcome?
If it is, then it really should be communicated better because it is a breaking change.
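The cloud-config packages: key is processed by the same package-update-upgrade-install module (which also handles package_update and package_upgrade), so once that module is removed from cloud.cfg, cloud-init no longer acts on the key. A minimal workaround sketch, assuming shell-script user-data on AL2; install_packages and the YUM override are hypothetical, the override existing only to allow a dry run:

```sh
#!/bin/sh
# Hypothetical user-data helper: install packages that would previously have
# been listed under the cloud-config `packages:` key.
# YUM defaults to yum; set YUM=echo to print the command instead (dry run).
install_packages() {
  [ "$#" -gt 0 ] || return 0          # nothing to install
  ${YUM:-yum} install -y "$@"
}
```

In user-data, calling `install_packages amazon-efs-utils` would then replace the packages: entry above.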

Labels: enhancement (New feature or request)
Participants: 3