rpm-ostree startup delays because of GPG key loading #761

Closed
jlebon opened this issue Mar 8, 2021 · 5 comments · Fixed by coreos/rpm-ostree#3406

@jlebon
Member

jlebon commented Mar 8, 2021

When rpm-ostreed.service starts, there's a noticeable delay and a slight CPU spike caused by rpm-ostree having to load all the GPG keys from /etc/pki/rpm-gpg to verify deployment commits (because we use gpgkeypath=/etc/pki/rpm-gpg/):

$ grep gpgkeypath /etc/ostree/remotes.d/fedora.conf
gpgkeypath=/etc/pki/rpm-gpg/
$ time rpm-ostree status
State: idle
...

real	0m2.848s
user	0m0.040s
sys	0m0.074s
$ systemctl stop rpm-ostreed
$ grep gpgkeypath /etc/ostree/remotes.d/fedora.conf
gpgkeypath=/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-primary
$ time rpm-ostree status
State: idle
...

real	0m0.384s
user	0m0.041s
sys	0m0.073s

Ideally it'd only import the one key it needs corresponding to the right release, but it's more complicated than that because of major version rebases.
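To put the two `real` timings from the transcript above side by side, here is a small arithmetic sketch. The helper name `parse_real` is made up for illustration; the only inputs are the two values shown in the `time` output.

```python
def parse_real(value: str) -> float:
    """Convert a `time` value such as '0m2.848s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the two `time rpm-ostree status` runs above.
all_keys = parse_real("0m2.848s")    # gpgkeypath pointing at the whole directory
single_key = parse_real("0m0.384s")  # gpgkeypath pointing at one key file

saved = all_keys - single_key
speedup = all_keys / single_key
print(f"saved {saved:.3f}s per cold start, {speedup:.1f}x faster")
```

So pointing at a single key saves roughly two and a half seconds of wall-clock time on a cold daemon start, while the client's own `user` and `sys` figures stay essentially the same.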

We could probably at least nuke all the super ancient keys in there to start (could do that in a post-processing script though... maybe all of Fedora should do that; yum-based systems don't really suffer from the status quo because yumrepo files always point to a specific key).
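A rough sketch of what such a post-processing step could select, assuming key files follow the `RPM-GPG-KEY-fedora-<N>-primary` naming seen above. The function name `stale_keys`, the `keep_back` parameter and the non-matching file name are hypothetical; this only picks candidates and deletes nothing.

```python
import re

# Assumed naming pattern, modelled on RPM-GPG-KEY-fedora-33-primary.
KEY_RE = re.compile(r"^RPM-GPG-KEY-fedora-(\d+)-primary$")


def stale_keys(names, current_release, keep_back=1):
    """Return key file names older than current_release - keep_back."""
    oldest_kept = current_release - keep_back
    stale = []
    for name in names:
        match = KEY_RE.match(name)
        # Anything that does not match the pattern is left alone.
        if match and int(match.group(1)) < oldest_kept:
            stale.append(name)
    return sorted(stale)


names = [f"RPM-GPG-KEY-fedora-{n}-primary" for n in (7, 20, 32, 33, 34)]
names.append("RPM-GPG-KEY-something-else")  # hypothetical unrelated file
print(stale_keys(names, current_release=33))
```

Keys for newer releases are deliberately kept, since those are the ones a major version rebase would need.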

This also applies to other rpm-ostree-based Fedora variants with the same remote configuration.

@jlebon jlebon added the kind/bug label Mar 8, 2021
@jlebon
Member Author

jlebon commented Mar 8, 2021

Supporting $releasever in gpgkeypath like yumrepos is possible I guess. The tricky bit is that the releasever needs to come from the target commit, which implies that we've downloaded e.g. /etc/os-release at least without verifying anything yet. We'd probably have to fold it into the commit metadata, which is still iffy because the remote commit is dictating which key to use. (To be clear, I don't think either of these options is acceptable.)

Maybe best is to have all commits signed with the N and the N-1 keys.
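As a toy model of the N and N-1 idea (not how ostree or rpm-ostree actually selects keys): assume a client trusts only the key for the release it is currently booted into, and a commit for release N carries signatures from the N and N-1 keys. Both function names are hypothetical.

```python
def signing_releases(commit_release):
    """Releases whose keys sign a commit under the proposed scheme."""
    return {commit_release, commit_release - 1}


def can_verify(booted_release, commit_release):
    """True if the key for the booted release is among the signers."""
    return booted_release in signing_releases(commit_release)


print(can_verify(33, 33))  # regular update within a release
print(can_verify(33, 34))  # major version rebase to the next release
print(can_verify(32, 34))  # rebase that skips a release
```

Under these assumptions the key choice comes from the already-trusted booted system rather than from the target commit, and the cost is that a rebase skipping a release would not verify.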

@cgwalters
Member

Makes sense, though an entirely different fix is to cache the deployment verification status (debate whether to do it in /run or persistently).
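A minimal sketch of caching the verification status across daemon restarts, assuming results are keyed by commit checksum and stored in a JSON file whose location the caller picks (a directory under /run or a persistent one). The class name, file name and `slow_verify` stand-in are all hypothetical; this is not rpm-ostree's implementation, and it ignores invalidation when the key set changes.

```python
import json
import tempfile
from pathlib import Path


class VerificationCache:
    """Remembers per-commit verification results in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        try:
            self.results = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            self.results = {}

    def check(self, checksum, verify):
        # Only call the expensive verifier for commits not seen before.
        # Note that failed results are cached as well in this sketch.
        if checksum not in self.results:
            self.results[checksum] = bool(verify(checksum))
            self.path.write_text(json.dumps(self.results))
        return self.results[checksum]


calls = []


def slow_verify(checksum):
    """Stand-in for loading all keys and checking the signature."""
    calls.append(checksum)
    return True


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "gpg-verify-cache.json"
    first = VerificationCache(cache_file).check("abc123", slow_verify)
    # Simulated daemon restart: a fresh object reading the same file.
    second = VerificationCache(cache_file).check("abc123", slow_verify)

print(len(calls))
```

The /run versus persistent question then mostly comes down to whether the cache should survive a reboot.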

jlebon added a commit to jlebon/rpm-ostree that referenced this issue Feb 4, 2022
In Fedora today, we ship 51 GPG pubkeys in `/etc/pki/rpm-gpg`. These
keys are used to verify RPM packages, but also OSTree commits. But the
sheer number of keys makes actually loading them and verifying
signatures costly. rpm-ostree pays this price at startup when creating
variants for its D-Bus properties describing the deployments.

Multiple things make this even costlier in rpm-ostree:
1. by default we auto-exit after a certain period of time, which means
   that on the next startup we have to pay the verification price again
2. the same deployed commit may be re-verified up to 3 times as the
   different D-Bus properties may refer to the same deployment, and we
   dumbly regenerate its `GVariant` each time

This results in a noticeable delay in rpm-ostree startup:
coreos/fedora-coreos-tracker#761

I also believe this is the root cause for the `ostree.hotfix` FCOS test
flaking: coreos/fedora-coreos-tracker#942. My
theory is that when this test runs on nodes with contended I/O (e.g.
with many other tests running in parallel), GPG verification can get
slow enough that the daemon doesn't finish in time to answer back the
D-Bus call from the client, which then times out. That test creates
a new deployment using `ostree admin unlock --hotfix` which multiplies
the cost.

This patch adds caching of verification results as suggested in the
tracker issue. This makes rpm-ostree startup *noticeably* faster and
should also fix the `ostree.hotfix` flake.

I think though we should still do $something about those keys, ideally
at the Fedora level if not in FCOS/FSB/FIoT.

Closes: coreos/fedora-coreos-tracker#761
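To illustrate point 2 of the commit message, here is a sketch of how memoizing the verification result means a deployment referenced by several properties is only verified once per process. It uses Python's `functools.lru_cache` as a stand-in; the property names and the `deployment_variant` helper are hypothetical and this is not the code from the patch.

```python
from functools import lru_cache

verify_calls = 0


@lru_cache(maxsize=None)
def verify_commit(checksum):
    """Stand-in for loading the keys and checking the commit signature."""
    global verify_calls
    verify_calls += 1
    return True


def deployment_variant(checksum):
    """Build the per-deployment data a property would expose."""
    return {"checksum": checksum, "gpg-verified": verify_commit(checksum)}


# Three hypothetical properties that all refer to the same deployment.
properties = {
    name: deployment_variant("abc123")
    for name in ("property_a", "property_b", "property_c")
}
print(verify_calls)
```

Without the decorator the same checksum would be verified three times, which matches the "up to 3 times" cost described above.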
@jlebon
Member Author

jlebon commented Feb 4, 2022

The original problem which motivated me filing this is fixed by coreos/rpm-ostree#3406, but as mentioned there, we should probably still discuss if there's something we should do about all those keys.

@jlebon jlebon added the jira for syncing to jira label Feb 7, 2022
@jlebon jlebon self-assigned this Feb 7, 2022
jlebon added a commit to jlebon/rpm-ostree that referenced this issue Feb 8, 2022
jlebon added a commit to jlebon/rpm-ostree that referenced this issue Feb 10, 2022
@dustymabe dustymabe added the status/pending-upstream-release label (Fixed upstream. Waiting on an upstream component source code release.) Feb 14, 2022
@dustymabe
Member

The fix for this went into testing stream release 35.20220313.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) and removed the status/pending-upstream-release label (Fixed upstream. Waiting on an upstream component source code release.) Mar 16, 2022
@dustymabe
Member

The fix for this went into stable stream release 35.20220313.3.1.

@dustymabe dustymabe removed the status/pending-stable-release label (Fixed upstream and in testing. Waiting on stable release.) Mar 30, 2022