Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest: add kexec-tools for kdump #418

Merged
merged 1 commit into from
Oct 27, 2020

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Oct 27, 2020

This was discussed in:
coreos/fedora-coreos-tracker#622

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2020
@jlebon
Copy link
Member Author

jlebon commented Oct 27, 2020

Similar deps to those described in coreos/fedora-coreos-config#708:

Added:
  dracut-squash-049-75.git20200422.el8.x86_64
  ethtool-2:5.0-2.el8.x86_64
  kexec-tools-2.0.20-14.el8.x86_64
  lzo-2.08-14.el8.x86_64
  snappy-1.1.7-5.el8.x86_64
  squashfs-tools-4.3-19.el8.x86_64

This brings in lzo, which was already in FCOS but not yet in RHCOS.

/hold
while I sanity-check this locally.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2020
@cgwalters
Copy link
Member

Not opposed to this but I'd like to see at least some minimal investigation of why the heck it depends on dracut-squash for example.

@jlebon
Copy link
Member Author

jlebon commented Oct 27, 2020

Not opposed to this but I'd like to see at least some minimal investigation of why the heck it depends on dracut-squash for example.

Did you see coreos/fedora-coreos-config#708?

@jlebon
Copy link
Member Author

jlebon commented Oct 27, 2020

OK, dropping hold on this!
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2020
@ashcrow
Copy link
Member

ashcrow commented Oct 27, 2020

This change looks good to my eyes BUT it may be something that makes more sense in extensions for OCP/RHCOS. If I remember correctly having kdump on the host but not in use can be a problem for some audits. cc'ing @JAORMX to jog my memory 😄

@JAORMX
Copy link

JAORMX commented Oct 27, 2020

I agree with @ashcrow , seems to me like this would be more appropraite in extensions. For compliance, folks are required to disable kdump in their deployments with the following rationale:

Kernel core dumps may contain the full contents of system memory at the time of the crash. Kernel core dumps consume a considerable amount of disk space and may result in denial of service by exhausting the available space on the target file system partition. Unless the system is used for kernel development or testing, there is little need to run the kdump service.

@cgwalters
Copy link
Member

cgwalters commented Oct 27, 2020

Ahh I hadn't realized there was a compliance angle to this.

Well, currently today RHCOS just inherits the "bootable-rpm-ostree.yaml" bits so adding it to FCOS doesn't preclude doing it via an extension for RHCOS.

That said:

Unless the system is used for kernel development or testing, there is little need to run the kdump service.

Note that kdump.service won't (shouldn't) run by default unless the crashkernel= argument is provided...hm actually I think it really should have ConditionKernelCommandLine=crashkernel.

IOW while it's installed by default it won't be enabled by default on FCOS, and a compliance system can easily check this.

@jlebon
Copy link
Member Author

jlebon commented Oct 27, 2020

Not against having it as an extension in RHCOS. This PR comes out of rough agreement from the team to have it baked in to reduce friction in enabling what we want to consider an important host feature. In pure FCOS, this means one less reboot. In OCP/OKD however, because the host always incurs a reboot, having it as an extension is actually not much more expensive. But diverging from FCOS has downsides as well, esp. around test sharing.

FWIW, I can imagine FCOS eventually removing it from the host and making it an extension in the future once we have "live extensions" (which don't require reboots).

But anyway, as Colin said, this wouldn't be enabled by default (and in fact, this PR adds a preset exactly to revert that).

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ensure this continues to be disabled by default and we can look at extensions later.

@cgwalters
Copy link
Member

cgwalters commented Oct 27, 2020

(and in fact, this PR adds a preset exactly to revert that).

Ahh I had missed that. But then, why does the preset enable it by default? Hmm well I guess that'd break existing users.

Does my suggestion

hm actually I think it really should have ConditionKernelCommandLine=crashkernel

seem right? Then we wouldn't need a preset override, it'd be "enabled by conditionally off".

@jlebon
Copy link
Member Author

jlebon commented Oct 27, 2020

Does my suggestion

hm actually I think it really should have ConditionKernelCommandLine=crashkernel

seem right? Then we wouldn't need a preset override, it'd be "enabled by conditionally off".

I think so. Let's see what the maintainers say: https://src.fedoraproject.org/rpms/kexec-tools/pull-request/5

@miabbott
Copy link
Member

/approve

@cgwalters
Copy link
Member

/approve
Do we feel the compliance angle is resolved?
/assign ashcrow
for the lgtm based on that

@JAORMX
Copy link

JAORMX commented Oct 27, 2020

/approve
Do we feel the compliance angle is resolved?
/assign ashcrow
for the lgtm based on that

Yep. As long as its not enabled by default it should be fine.

@cgwalters
Copy link
Member

Yep. As long as its not enabled by default it should be fine.

👍

So on that note...we could add something to tests/kola/misc-ro/misc-ro.sh

@ashcrow
Copy link
Member

ashcrow commented Oct 27, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 27, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, jlebon, miabbott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,cgwalters,jlebon,miabbott]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 805607f into openshift:master Oct 27, 2020
@jlebon
Copy link
Member Author

jlebon commented Oct 28, 2020

So on that note...we could add something to tests/kola/misc-ro/misc-ro.sh

Let's do FCOS first, then we can copy/paste into RHCOS: coreos/fedora-coreos-config#716

@jlebon jlebon deleted the pr/add-kexec-tools branch April 23, 2023 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants