IPI on bare metal fails for OKD SCOS 4.12 maybe because of broken DNS settings on masters #7
Comments
I see the same in regards to I have created an issue for this here: #9
I see the same behavior - failed systemd-sysusers and subsequent inexplicable DNS errors - on a bare metal (actual, not virtualized) cluster update from 4.12.0-0.okd-scos-2023-03-23-213604 to 4.12.0-0.okd-scos-2023-04-14-052931. Had to roll back the update as my MCPs wouldn't progress and I couldn't force them to the new image/rendered config. If there is any specific log that would be useful, let me know.
FYI
The sysusers issue is likely not related here. Have you checked for SELinux failures on the nodes? We have seen issues in OKD/FCOS where NetworkManager wasn't able to run its dispatcher scripts, resulting in similar symptoms.
Went down the rabbit hole and tested some older releases. With
Unlike the [0]
[1]
[2]
[3]
|
With For me, it looks like |
OKD/SCOS had issues with IPI on older releases [0]. [0] okd-project/okd-scos#7
When doing IPI on bare metal servers with OKD SCOS 4.12, the deployment fails with:
All three control plane nodes have invalid DNS settings. Instead of using the DNS nameserver(s) provided by the local DHCP server, all masters have random (?) non-local DNS nameservers listed in /etc/resolv.conf:
The nameservers from network 192.168.158.0/24 listed above are the IP addresses of those nodes retrieved via DHCP. But 16.182.227.198, 128.154.207.62 and 192.23.149.53 are wrong(?); the local DNS nameserver provided by the DHCP server is 192.168.158.26.
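To make the mismatch easier to spot on each node, the nameserver entries can be sorted into "expected", "local but unexpected" and "non-local". The following is only a minimal sketch: the network and the expected nameserver come from this report, while the function names and the `SAMPLE` text are hypothetical and are not the actual file contents from the affected masters.

```python
import ipaddress

# Values taken from this report.
LOCAL_NETWORK = ipaddress.ip_network("192.168.158.0/24")
EXPECTED_NAMESERVER = ipaddress.ip_address("192.168.158.26")

# Hypothetical sample, NOT the real /etc/resolv.conf from the masters.
# Only the three suspicious addresses are taken from the report.
SAMPLE = """\
# illustrative content only
nameserver 16.182.227.198
nameserver 128.154.207.62
nameserver 192.23.149.53
"""


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses found in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';' and never match "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(ipaddress.ip_address(fields[1]))
    return servers


def classify(servers):
    """Group nameservers by how they relate to the local DHCP setup."""
    result = {"expected": [], "local_unexpected": [], "non_local": []}
    for server in servers:
        if server == EXPECTED_NAMESERVER:
            result["expected"].append(str(server))
        elif server in LOCAL_NETWORK:
            result["local_unexpected"].append(str(server))
        else:
            result["non_local"].append(str(server))
    return result


if __name__ == "__main__":
    print(classify(parse_nameservers(SAMPLE)))
```

On a node, the same functions could be fed the text of /etc/resolv.conf instead of `SAMPLE`; any entry under `non_local` would match the symptom described above.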
Resolving registry.ci.openshift.org fails due to those broken nameservers:
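A name resolution check like the one described here can be sketched with the Python standard library. This is an assumption-laden illustration, not the command used in this report: `can_resolve` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return (True, sorted addresses) if hostname resolves, else (False, error text)."""
    try:
        infos = resolver(hostname, 443)
    except socket.gaierror as exc:
        # Raised when the configured nameservers cannot answer the query.
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return True, sorted({info[4][0] for info in infos})


# Example call on an affected node (result depends on that node's DNS setup):
# ok, detail = can_resolve("registry.ci.openshift.org")
```

The function uses whatever resolver configuration the system has, so on a node with the nameservers described above it would be expected to return `False` together with the resolver's error message.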
Also systemd-sysusers.service fails on all master nodes (but that might be a red herring):
The same environment and install-config.yaml works fine with OCP RHCOS 4.12 though.
Version
Platform
registry.ci.openshift.org/origin/release-scos:scos-4.12
How reproducible
With Docker Compose and enough RAM, you can reproduce this bug using Ansible hosts lvrt-lcl-session-srv-4* from Ansible collection jm1.cloudy. This will deploy an installer-provisioned OKD cluster based on SCOS in a Docker container and use QEMU/KVM based virtual machines to simulate bare-metal servers.
The example install-config.yaml used for IPI works fine with OpenShift 4.12. This README.md has instructions on what
to change to deploy OpenShift instead of OKD.
This bug is 100% reproducible.
Log bundle
log-bundle-20230222210622.tar.gz