
nomodeset required for booting certain hardware #498

Closed
agracey opened this issue Nov 4, 2022 · 20 comments
Labels
kind/bug Something isn't working kind/enhancement New feature or request

Comments

@agracey

agracey commented Nov 4, 2022

What steps did you take and what happened:

I'm reporting on behalf of a user who is having issues booting Elemental Teal. They were able to work around the issue by pressing e and adding nomodeset to the kernel params.

They are using an AMD Ryzen-based SimplyNUC (https://simplynuc.com/product/llm2v8cy-full/)

What did you expect to happen:

The machine should boot normally without manual intervention.

Environment: (Asking for details and will fill in)

  • Elemental release version (use cat /etc/os-release):
  • Rancher version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
@kkaempf kkaempf added the kind/bug Something isn't working label Nov 7, 2022
@kkaempf
Contributor

kkaempf commented Nov 7, 2022

See rancher/elemental-toolkit#1773

@kkaempf
Contributor

kkaempf commented Nov 7, 2022

I'm reporting on behalf of a user who is having issues booting Elemental Teal.

What happened exactly?

@agracey
Author

agracey commented Nov 7, 2022

Forwarding the email to you

@kkaempf
Contributor

kkaempf commented Nov 8, 2022

There's no essential information in the email except that the machine "locks up".

Thinking about it, I'd reject this request. Adding nomodeset would work around errors instead of fixing them.
I would document this problem (in the FAQ section?) and encourage people to work with us to get their system properly supported.

@ldevulder
Contributor

Just a small comment: nomodeset is only useful if KMS (Kernel Mode Setting) is used, but in the scope of Elemental there is no need for a GUI on the nodes. So maybe it's better to "simply" remove it (if that's not already the case; I didn't check).

@Itxaka
Contributor

Itxaka commented Nov 24, 2022

KMS is not only for the GUI; it's also used for TTY resolution and fast console switching. Essentially, the kernel can set the display mode itself instead of waiting for the X server to do so.

And if I remember correctly, this is done by the graphics card driver automatically when it loads, so it's part of the boot process.

I'm wondering if it may be broken due to how we set up our GRUB config:

set gfxmode=auto
set gfxpayload=keep
insmod all_video
insmod gfxterm
insmod loopback

Maybe we should set the gfxmode to text? No idea, to be honest, but as @ldevulder mentions, it makes no sense in the context of the toolkit. We build server distros with it, not GUIs, so...
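For reference, in GRUB it is the gfxpayload variable, not gfxmode, that controls the mode the Linux kernel starts in: gfxpayload=text boots the kernel in plain text mode, while keep preserves the graphics mode selected through gfxmode. A minimal sketch, assuming the config above is available as a string, that switches the payload to text mode; the helper name force_text_payload is hypothetical, and whether this alone avoids the lockup on this hardware has not been tested (once the amdgpu driver loads with KMS, it sets its own mode regardless).

```python
import re

# Matches a whole "set gfxpayload=<value>" line, keeping its indentation.
GFXPAYLOAD_LINE = re.compile(r"^([ \t]*)set gfxpayload=\S*[ \t]*$", re.MULTILINE)


def force_text_payload(grub_cfg: str) -> str:
    """Return grub_cfg with every gfxpayload assignment changed to text mode."""
    return GFXPAYLOAD_LINE.sub(r"\1set gfxpayload=text", grub_cfg)


if __name__ == "__main__":
    cfg = (
        "set gfxmode=auto\n"
        "set gfxpayload=keep\n"
        "insmod all_video\n"
        "insmod gfxterm\n"
        "insmod loopback\n"
    )
    print(force_text_payload(cfg))
```

Only the gfxpayload line changes; gfxmode=auto is left alone so the GRUB menu itself can still use a graphical terminal.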

@agracey
Author

agracey commented Nov 25, 2022

We build server distros with it, not GUIs so....

I need to show you some of the crazier demos that I'm working on -- I'd like to be building wallboards and point of service systems as well 😃

If we removed nomodeset/KMS, would it drop the drm devices from udev? Would startx break?

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

If we removed nomodeset/KMS, would it drop the drm devices from udev? Would startx break?

If I'm not mistaken, with nomodeset always enabled the kernel won't load the video drivers and will use the EFI resolution during boot. Then, when X starts, it should load the drivers at that point... theoretically.

Still, as it's a Radeon GPU (Vega 7, I think), the KMS support should already be in the kernel, so there should be no issues with it. Although AMD drivers have always been a cluster****, so I have no idea whether it uses the kernel driver, whether it supports KMS only, or whether maybe we are not bundling the drivers in the kernel... needs more investigation. Send me a NUC for testing? :P
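Whether nomodeset is actually in effect on a running node can be checked by reading /proc/cmdline. A small sketch that splits the kernel command line into whitespace-separated parameters and looks for the bare nomodeset flag; the helper name has_nomodeset is hypothetical.

```python
from pathlib import Path


def has_nomodeset(cmdline: str) -> bool:
    """Return True if the bare `nomodeset` flag is on the kernel command line."""
    # Parameters are whitespace-separated; comparing whole tokens avoids
    # false positives from longer parameters that merely contain the word.
    return "nomodeset" in cmdline.split()


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print("nomodeset active:", has_nomodeset(cmdline_file.read_text()))
```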

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

I need to show you some of the crazier demos that I'm working on -- I'd like to be building wallboards and point of service systems as well 😃

But the services would be running on Elemental nodes only, right? And then you connect from an external device for those wallboards (like an RPi)? Or is the idea to have a webview on the same Elemental node?

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

I had a look, and Vega 7 seems to use the amdgpu driver, which is available on Elemental:

[   55.114296] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
[   55.524613] [drm] amdgpu kernel modesetting enabled.
[   55.525036] amdgpu: CRAT table not found
[   55.525709] amdgpu: Virtual CRAT table created for CPU
[   55.526365] amdgpu: Topology: Add CPU node
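The relevant lines can be pulled out of a saved dmesg capture mechanically. A minimal sketch, assuming the log is plain dmesg text like the excerpt above: it reports whether amdgpu announced kernel modesetting and collects any firmware files the kernel failed to load, which the kernel's firmware loader typically logs as "Direct firmware load for <file> failed with error <n>". The function name scan_dmesg and its return shape are hypothetical.

```python
import re

KMS_MARKER = "amdgpu kernel modesetting enabled"
# Warning printed by the kernel firmware loader when a file cannot be found.
FW_FAIL = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def scan_dmesg(text: str) -> tuple[bool, list[str]]:
    """Return (kms_enabled, missing_firmware_files) for a dmesg capture."""
    kms_enabled = False
    missing: list[str] = []
    for line in text.splitlines():
        if KMS_MARKER in line:
            kms_enabled = True
        match = FW_FAIL.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return kms_enabled, missing


if __name__ == "__main__":
    excerpt = "[   55.524613] [drm] amdgpu kernel modesetting enabled."
    print(scan_dmesg(excerpt))  # (True, [])
```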

I don't see the firmware files for it, though, and I'm not sure whether we need them. Maybe we are missing the kernel-firmware-amdgpu package, although we install kernel-firmware-all, which should bring in all firmware packages.

Maybe the package just isn't available for SLE-Micro-Rancher and we need to ask for it to be included.
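To confirm whether specific firmware files are shipped in an image, one can look them up under the firmware search path. A sketch that checks a list of firmware names (for example, those reported as failed in dmesg) against a root directory, assuming the conventional /lib/firmware location and that files may be stored compressed with an .xz or .zst suffix, which the kernel only reads when built with the matching decompression support. The helper name missing_firmware is hypothetical.

```python
from pathlib import Path

# Plain, xz- and zstd-compressed variants of the same firmware file.
SUFFIXES = ("", ".xz", ".zst")


def missing_firmware(names: list[str], root: str = "/lib/firmware") -> list[str]:
    """Return the firmware names that have no file under `root` in any variant."""
    base = Path(root)
    return [
        name
        for name in names
        if not any((base / f"{name}{suffix}").is_file() for suffix in SUFFIXES)
    ]
```

An empty result means every requested file exists in some form; it says nothing about whether the driver will accept that firmware version.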

@agracey it would be awesome to get the dmesg output both with and without nomodeset, if that's possible via a serial console or something similar, because I'm pretty sure the dmesg from the blank-screen boot will tell us what we are missing.
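Once both captures exist, a quick way to spot the difference is to strip the per-boot timestamps and diff the two logs. A sketch using Python's standard difflib, assuming each capture was saved as plain text (for example by redirecting dmesg to a file over the serial console); the function names normalize and dmesg_diff are hypothetical.

```python
import difflib
import re

# Leading "[   55.524613] " timestamp, which differs on every boot.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(text: str) -> list[str]:
    """Drop per-boot timestamps so identical messages compare equal."""
    return [TIMESTAMP.sub("", line) for line in text.splitlines()]


def dmesg_diff(with_nomodeset: str, without_nomodeset: str) -> list[str]:
    """Return unified-diff lines between two dmesg captures."""
    return list(
        difflib.unified_diff(
            normalize(with_nomodeset),
            normalize(without_nomodeset),
            fromfile="nomodeset",
            tofile="default",
            lineterm="",
        )
    )
```

Lines prefixed with "+" then appear only in the default boot, which is where messages from the blank-screen case would show up.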

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

Indeed, the kernel-firmware-amdgpu package does not appear in the installed packages for SLE-Micro-Rancher...

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

But it's in the original SLE15-SP4, so that package is definitely missing.

@agracey
Author

agracey commented Nov 25, 2022

Send me a NUC for testing?

I'm working on that...

Or its the idea to have a webview in the same elemental node?

I'm running X11 in a container as a sidecar 😇

but its on the original SLE15-sp4, so for sure that package is missing

I'll add it to my list for 5.4 unless we need it earlier

More generally, we should document how to change these flags sooner rather than later. That way we aren't locked in to a choice.

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

I'll add it to my list for 5.4 unless we need it earlier

Nice. Maybe we should ask why we cherry-pick the firmware packages instead of just installing kernel-firmware-all. Size?

I'm running X11 in a container as a sidecar 😇

damn...

More generally, we should document how to change these flags sooner rather than later. That way we aren't locked in to a choice.

I think this is already documented in the toolkit.

They should be able to set this by running grub2-editenv /oem/grubenv set extra_cmdline=nomodeset, which should apply that cmdline to the kernel on all entries (active, passive, recovery).
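The same step can be scripted, for example from a provisioning hook. A minimal Python sketch, assuming grub2-editenv is on PATH and /oem/grubenv is the writable environment file mentioned above: it builds exactly the command from this comment, and parses the output of grub2-editenv's list subcommand (one NAME=VALUE per line) to read the value back. The helper names and the dry_run switch are hypothetical.

```python
import subprocess

GRUBENV = "/oem/grubenv"


def set_extra_cmdline(value: str, grubenv: str = GRUBENV, dry_run: bool = False) -> list[str]:
    """Store `value` as extra_cmdline in the GRUB environment block."""
    cmd = ["grub2-editenv", grubenv, "set", f"extra_cmdline={value}"]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


def parse_grubenv_list(output: str) -> dict[str, str]:
    """Parse `grub2-editenv <file> list` output into a dict."""
    env: dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without an assignment
            env[name] = value
    return env
```

Note that set replaces any existing extra_cmdline value, so parameters already stored there must be included in the new value to keep them.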

@agracey
Author

agracey commented Nov 25, 2022

And that command would be run as part of the cloud-config? (Sorry, I feel like the answer is obvious but I want to make sure I'm not missing a part of the puzzle here)

@Itxaka
Contributor

Itxaka commented Nov 25, 2022

Once that is accepted and the ISO rebuilt, they may need to use a Dev image to build their ISOs with the injected registration, i.e. when calling elemental-iso-add-registration they might need to prefix it with REPO=Dev so it downloads the Dev image with the missing firmware.

That would be simpler than adding nomodeset.
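Prefixing REPO=Dev in the shell only sets an environment variable for that one invocation. A sketch of the equivalent from Python, assuming elemental-iso-add-registration is on PATH; its own arguments are not described in this thread, so they are passed through unchanged. The helper names are hypothetical, and the thread later suggests Staging rather than Dev as the safer repository.

```python
import os
import subprocess


def build_invocation(args: list[str], repo: str = "Dev") -> tuple[list[str], dict[str, str]]:
    """Return the command and environment equivalent to `REPO=<repo> elemental-iso-add-registration ...`."""
    env = {**os.environ, "REPO": repo}  # inherit everything else unchanged
    return ["elemental-iso-add-registration", *args], env


def run_registration(args: list[str], repo: str = "Dev") -> None:
    cmd, env = build_invocation(args, repo)
    subprocess.run(cmd, env=env, check=True)
```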

@kkaempf kkaempf added the kind/enhancement New feature or request label Nov 29, 2022
@Itxaka
Contributor

Itxaka commented Dec 9, 2022

@agracey I think the firmware package was added to Elemental before cutting the latest release. Could you have them try again with the latest stable ISO?

If stable doesn't work, could you ask them to use REPO=Dev while injecting the ISO so they get the latest Dev build and test that?

Also, it would be nice if the original reporter could comment on the issue directly so there is direct communication about this, as it's probably solved already :)

@kkaempf
Contributor

kkaempf commented Dec 9, 2022

Careful, Dev has the kubebuilder operator. Staging is probably a better choice. 😉

Which reminds me that we should add kernel-firmware-all instead of kernel-firmware-amdgpu.

@Itxaka
Contributor

Itxaka commented Dec 9, 2022

Good point, I forgot that we merged that! Indeed, Staging should contain the old operator AND the firmware fix. Stable probably does too, but I'm not sure about the dates.

@kkaempf
Contributor

kkaempf commented Dec 13, 2022

The image includes kernel-firmware-all now. Closing.

@kkaempf kkaempf closed this as completed Dec 13, 2022
Projects
Archived in project
Development

No branches or pull requests

4 participants