This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

NO local DNS server - Install fails #161

Closed
miktaylor3 opened this issue Jul 23, 2021 · 3 comments
Labels
bug Something isn't working

@miktaylor3

Summary of Problem: Installation fails without a local DNS server

This is related to the factory install failing with a "can't open API at https://api.edge.rdc100.lan:6443" error.

I was able to reproduce this in the lab in Reston by turning off the local DNS server and doing an install with only external DNS servers configured (208.67.222.222 and 208.67.220.220). The installation failed at the same point as we saw in the factory.

Also note that the IP address "10.0.2.3:53" in the error below is not something I have in my lab or anything I configured; I'm not sure where it is coming from.

STEP 8: WAIT FOR THE OPENSHIFT INSTALLATION TO COMPLETE
write upgrade status to tmp file localhost>localhost
wait for bootstrap to complete
failed: localhost: non-zero return code
there was an error during install
failed: localhost: level=info msg=Waiting up to 20m0s for the Kubernetes API at https://api.edge.rdc100.lan:6443...
level=error msg=Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.edge.rdc100.lan:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.edge.rdc100.lan on 10.0.2.3:53: no such host
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=fatal msg=failed waiting for Kubernetes API: Get "https://api.edge.rdc100.lan:6443/version?timeout=32s": dial tcp: lookup api.edge.rdc100.lan on 10.0.2.3:53: no such host
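
When triaging output like the above, it can help to pull out which hostname failed to resolve and which resolver was actually queried. The following is a minimal sketch, assuming the error keeps the "lookup HOST on IP:PORT: no such host" shape seen in this log and that the resolver is an IPv4 address; the names `LookupFailure` and `parse_lookup_failure` are hypothetical helpers, not part of any installer tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches the resolver error shape seen in the installer output above:
#   lookup <host> on <resolver-ipv4>:<port>: no such host
LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[\d.]+):(?P<port>\d+): no such host"
)


class LookupFailure(NamedTuple):
    host: str      # name that could not be resolved
    resolver: str  # DNS server that was queried
    port: int      # port on that DNS server


def parse_lookup_failure(line: str) -> Optional[LookupFailure]:
    """Return the failed host and queried resolver, or None if the line has no such error."""
    match = LOOKUP_ERROR.search(line)
    if match is None:
        return None
    return LookupFailure(match["host"], match["resolver"], int(match["port"]))


if __name__ == "__main__":
    sample = (
        'level=fatal msg=failed waiting for Kubernetes API: '
        'Get "https://api.edge.rdc100.lan:6443/version?timeout=32s": '
        "dial tcp: lookup api.edge.rdc100.lan on 10.0.2.3:53: no such host"
    )
    print(parse_lookup_failure(sample))
```

Comparing the extracted resolver with the DNS servers that were configured for the install shows quickly whether the lookup went where it was expected to go.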

@miktaylor3
Author

Please hold. I'm doing some more testing; it seems this may be related to the external DNS we were using. The install worked when using 1.1.1.1 as the external DNS.

@miktaylor3
Author

I've done several installs over the last couple of days and reproduced this in the Reston lab. Whenever the system points to an external DNS and is forced to resolve addresses on the private network (192.168.8.xx), it seems to have problems connecting to the cluster address and the install fails.

With a fresh install on the same system and the same config.sh, changing only the DNS to a local server on the external network with the suggested DNS entries, the system works fine.

I've even tried just changing the DNS after an install without re-installing. In general, once it fails, it tends to stay on the private network and does not work, even after I modify the DNS field to point to something external. I generally need to re-install the whole thing (RHEL plus OCS) to reset it.
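
The situation described here, where only public resolvers are configured while the cluster names live on a private network, can be flagged before starting an install. This is a rough sketch using Python's standard `ipaddress` module; the helper names are hypothetical, and 192.168.8.10 is an illustrative address standing in for the elided 192.168.8.xx.

```python
import ipaddress
from typing import Iterable, List


def is_private(address: str) -> bool:
    """True if the address falls in a private (non-routable) range."""
    return ipaddress.ip_address(address).is_private


def public_resolvers(resolvers: Iterable[str]) -> List[str]:
    """Return the configured resolvers that sit on the public Internet."""
    return [r for r in resolvers if not is_private(r)]


def only_public_dns(resolvers: Iterable[str]) -> bool:
    """True when every configured resolver is public, i.e. no local DNS is in use."""
    resolvers = list(resolvers)
    return bool(resolvers) and len(public_resolvers(resolvers)) == len(resolvers)


if __name__ == "__main__":
    configured = ["208.67.222.222", "208.67.220.220"]  # resolvers from this report
    cluster_address = "192.168.8.10"  # hypothetical address on the private network
    if only_public_dns(configured) and is_private(cluster_address):
        print("Only public DNS configured, but the cluster address is private.")
```

A check like this only describes the configuration; it does not explain why resolution fails, which is what the fix later in the thread addresses.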

@rmkraus rmkraus added the bug Something isn't working label Jul 30, 2021
rmkraus added a commit that referenced this issue Jul 30, 2021
The router role was not correctly restarting the faroswan interface when
required.

Fix for #161
@rmkraus
Member

rmkraus commented Jul 30, 2021

The fix for this issue has been pushed and will be published in the next version (soon).

In the meantime, run sudo nmcli con up faroswan, which will load the correct DNS settings.
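
For anyone scripting the workaround, the sketch below runs the same nmcli command and then lists the nameservers now in effect. It assumes /etc/resolv.conf reflects the DNS settings NetworkManager applies, which may not hold on systems using a local stub resolver; the function names are hypothetical.

```python
import subprocess
from typing import List


def reactivate_command(connection: str = "faroswan") -> List[str]:
    """Build the workaround command from this thread: sudo nmcli con up <connection>."""
    return ["sudo", "nmcli", "con", "up", connection]


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Extract the nameserver addresses from resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def apply_workaround(connection: str = "faroswan",
                     resolv_conf: str = "/etc/resolv.conf") -> List[str]:
    """Bring the connection up again, then report the nameservers in use."""
    subprocess.run(reactivate_command(connection), check=True)
    with open(resolv_conf) as handle:
        return parse_nameservers(handle.read())


if __name__ == "__main__":
    print(apply_workaround())
```

If the printed list still shows unexpected servers after the connection comes back up, the DNS settings were not reloaded.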

@rmkraus rmkraus added this to the 4.7.4 milestone Aug 9, 2021
@rmkraus rmkraus closed this as completed Aug 9, 2021