Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASMTRIAGE-7420 Fix kdump hook boot failure #67

Merged
merged 7 commits into from
Oct 22, 2024
Merged

Conversation

rustydb
Copy link
Contributor

@rustydb rustydb commented Oct 21, 2024

Summary and Scope

Issue Type

  • Bugfix Pull Request

Fixes issues when booting without an overlayFS.

Removes fallback assumptions around FSLabels that were causing code to run in undesirable cases.

Prerequisites

  • I have included documentation in my PR (or it is not required)
  • I tested this on internal system (if yes, please include results or a description of the test)
  • I tested this on a vshasta system (if yes, please include results or a description of the test)

Idempotency

Risks and Mitigations

Successfully tested deploying and rebooting:

  • NCN-Master
  • NCN-Worker
  • NCN-Storage

Successfully triggered a crash on:

  • NCN-Kubernetes
  • NCN-Storage

NOTE: RemoteISO panics are not supported
NOTE2: USB ISO panics are broken, USB devices are not detected in the kdump initrd.

This change removes the default/assumed overlayFS FSLabel. The assumed label was triggering code, like `metal-kdump.sh`, to run when it shouldn't (as the case with CASMTRIAGE-7420, when no overlayFS was given).

This change adds an early exit to `metal-kdump.sh` for boots without overlays.
@rustydb rustydb requested a review from a team as a code owner October 21, 2024 14:20
@rustydb rustydb changed the title CASMTRIAGE-7420 CASMTRIAGE-7420 Fix kdump hook boot failure Oct 21, 2024
Along the same theme as the previous commit, remove the chance for code to run against non-existent drive labels by removing the default/assumed labels.

Tidies up the cases for SQFSRAID; adds a blank case for safely loading `metal-md-lib.sh` where `root` is something else, and prints messages similar to the `oval_drive_scheme` case.

This also fixes a hidden bug on line 87 of `metal-md-lib.sh`, this line should be only overriding `SQFS_DRIVE_SCHEME` if it is equal to `CDLABEL` but the `||` was doing the opposite of that.
My IDE is finding "todo" as a keyword and highlighting it yellow. Spelling it correctly (with no space) prevents it from being recognized as a kw.
@rustydb
Copy link
Contributor Author

rustydb commented Oct 21, 2024

Using surtur for a remoteISO test.
Will use redbull for testing k8s and a usb-pit.

`metal-kdump.sh` was recreating some of the logic in `metal-md-lib.sh` for resolving the overlayFS scheme and authority. Short of sourcing `metal-md-lib.sh`, it would bode well if we parsed the scheme and authority every boot and only once.

By moving the scheme and authority parsing out of `metal-md-lib.sh`, the parsing will occur once and at the beginning of the boot instead of every time `metal-md-lib.sh` is sourced.

Additionally, this removes the `exit` calls from `metal-md-lib.sh`. Libraries shouldn't exit.
`MOUNT_OVERLAYfs_id` should be all lowercase, it was recently awkward cased in a previous, greedy string replace operation.

There is also no point in checking if `metal_overlayfs_id` is empty, it will always be empty until we set it. We want this variable to always be fresh, it should always recalculate when the function is ran later.
@rustydb rustydb merged commit 68361ab into main Oct 22, 2024
5 checks passed
@rustydb rustydb deleted the CASMTRIAGE-7420 branch October 22, 2024 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants