rpm-ostree startup delays because of GPG key loading #761
Comments
Supporting Maybe best is to have all commits signed with the N and the N-1 keys. |
Makes sense, though an entirely different fix is to cache the deployment verification status (debate whether to do it in |
In Fedora today, we ship 51 GPG pubkeys in `/etc/pki/rpm-gpg`. These keys are used to verify RPM packages, but also OSTree commits. But the sheer number of keys makes actually loading them and verifying signatures costly. rpm-ostree pays this price at startup when creating variants for its D-Bus properties describing the deployments.

Multiple things make this even costlier in rpm-ostree:

1. By default we auto-exit after a certain period of time, which means that on the next startup we have to pay the verification price again.
2. The same deployed commit may be re-verified up to 3 times as the different D-Bus properties may refer to the same deployment, and we dumbly regenerate its `GVariant` each time.

This results in a noticeable delay in rpm-ostree startup: coreos/fedora-coreos-tracker#761

I also believe this is the root cause for the `ostree.hotfix` FCOS test flaking: coreos/fedora-coreos-tracker#942. My theory is that when this test runs on nodes with contended I/O (e.g. with many other tests running in parallel), GPG verification can get slow enough that the daemon doesn't finish in time to answer the D-Bus call from the client, which then times out. That test creates a new deployment using `ostree admin unlock --hotfix`, which multiplies the cost.

This patch adds caching of verification results as suggested in the tracker issue. This makes rpm-ostree startup *noticeably* faster and should also fix the `ostree.hotfix` flake.

I think though we should still do $something about those keys, ideally at the Fedora level if not in FCOS/FSB/FIoT.

Closes: coreos/fedora-coreos-tracker#761
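To illustrate the caching idea in the commit message, here is a minimal Python sketch (not the actual rpm-ostree patch, which is not shown in this thread). It memoizes the verification result per commit checksum, so that several D-Bus properties pointing at the same deployment trigger only one expensive check. The class name, the `fake_verify` stand-in, the property names, and the checksums are all hypothetical. An in-memory cache like this only addresses the repeated verification within one daemon run; surviving the auto-exit would presumably need the result persisted somewhere, which is not sketched here.

```python
from typing import Callable, Dict


class VerificationCache:
    """Memoize per-commit signature verification results (illustrative only)."""

    def __init__(self, verify: Callable[[str], bool]) -> None:
        self._verify = verify  # the expensive check (would load every key)
        self._results: Dict[str, bool] = {}
        self.misses = 0  # how many times the expensive check actually ran

    def is_verified(self, checksum: str) -> bool:
        # Only run the expensive verification the first time we see a commit.
        if checksum not in self._results:
            self.misses += 1
            self._results[checksum] = self._verify(checksum)
        return self._results[checksum]


def fake_verify(checksum: str) -> bool:
    # Hypothetical stand-in for GPG verification against the keyring.
    return not checksum.startswith("bad")


cache = VerificationCache(fake_verify)

# Hypothetical properties; two of them refer to the same deployed commit.
properties = {"booted": "abc123", "default": "abc123", "rollback": "def456"}
status = {name: cache.is_verified(csum) for name, csum in properties.items()}

print(status, cache.misses)
```

With three properties and two distinct commits, the expensive check runs twice instead of three times.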
The original problem which motivated me filing this is fixed by coreos/rpm-ostree#3406, but as mentioned there, we should probably still discuss if there's something we should do about all those keys.
The fix for this went into |
When `rpm-ostreed.service` starts, there's a noticeable delay and a slight CPU spike caused by rpm-ostree having to load all the GPG keys from `/etc/pki/rpm-gpg` to verify deployment commits (because we use `gpgkeypath=/etc/pki/rpm-gpg/`).

Ideally it'd only import the one key it needs corresponding to the right release, but it's more complicated than that because of major version rebases.
We could probably at least nuke all the super ancient keys in there to start (could do that in a post-processing script though... maybe all of Fedora should do that; yum-based systems don't really suffer from the status quo because yumrepo files always point to a specific key).
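As a rough sketch of what such a post-processing step might look like, the Python below picks out key files whose release number falls outside a window around the current release. It keeps a margin on both sides because of the rebase concern mentioned above. The filename pattern (`RPM-GPG-KEY-fedora-<N>-primary`), the function name, and the window defaults are assumptions for illustration, not an agreed policy; names that don't match the pattern are left alone.

```python
import re
from typing import Iterable, List

# Assumed naming convention, e.g. "RPM-GPG-KEY-fedora-35-primary".
KEY_RE = re.compile(r"^RPM-GPG-KEY-fedora-(\d+)-primary$")


def stale_keys(filenames: Iterable[str], current: int,
               behind: int = 1, ahead: int = 1) -> List[str]:
    """Return key files for releases outside [current - behind, current + ahead]."""
    stale = []
    for name in filenames:
        match = KEY_RE.match(name)
        if match is None:
            continue  # unknown naming scheme: never propose it for removal
        release = int(match.group(1))
        if release < current - behind or release > current + ahead:
            stale.append(name)
    return sorted(stale)


# Hypothetical directory listing.
names = [
    "RPM-GPG-KEY-fedora-7-primary",
    "RPM-GPG-KEY-fedora-34-primary",
    "RPM-GPG-KEY-fedora-35-primary",
    "RPM-GPG-KEY-fedora-36-primary",
    "RPM-GPG-KEY-fedora-35-x86_64",
]

print(stale_keys(names, current=35))
```

The function only reports candidates; actually deleting files (and deciding how wide the window should be) is left to whoever owns the key directory.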
This also applies to other rpm-ostree-based Fedora variants with the same remote configuration.