
DNS resolution fails if default search domain has a wildcard match #17316

Open
ikus060 opened this issue Nov 14, 2017 · 13 comments
Labels
component/networking, kind/bug, lifecycle/frozen, priority/P2

Comments

@ikus060

ikus060 commented Nov 14, 2017

Name resolution from inside the pod appears to be broken because of multiple factors.

Version
# oc version
oc v3.7.0-rc.0+e92d5c5
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.7.0-rc.0+e92d5c5
kubernetes v1.7.6+a08f5eeb62
Steps To Reproduce

It looks like the /etc/resolv.conf file generated by OpenShift does not work in every scenario.

First, to show that resolution works with a plain configuration:

# cat /etc/resolv.conf
nameserver 8.8.8.8
search patrikdufresne.com

# nslookup -debug dl-cdn.alpinelinux.org
Server:		8.8.8.8
Address:	8.8.8.8#53

------------
    QUESTIONS:
	dl-cdn.alpinelinux.org, type = A, class = IN
    ANSWERS:
    ->  dl-cdn.alpinelinux.org
	canonical name = global.prod.fastly.net.
	ttl = 59
    ->  global.prod.fastly.net
	internet address = 151.101.0.249
	ttl = 19
    ->  global.prod.fastly.net
	internet address = 151.101.64.249
	ttl = 19
    ->  global.prod.fastly.net
	internet address = 151.101.128.249
	ttl = 19
    ->  global.prod.fastly.net
	internet address = 151.101.192.249
	ttl = 19
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
dl-cdn.alpinelinux.org	canonical name = global.prod.fastly.net.
Name:	global.prod.fastly.net
Address: 151.101.0.249
Name:	global.prod.fastly.net
Address: 151.101.64.249
Name:	global.prod.fastly.net
Address: 151.101.128.249
Name:	global.prod.fastly.net
Address: 151.101.192.249

This is the /etc/resolv.conf generated in the pod; resolution does not work:

# cat /etc/resolv.conf 
nameserver 8.8.8.8
search default.svc.cluster.local svc.cluster.local cluster.local patrikdufresne.com
options ndots:5

# nslookup -debug dl-cdn.alpinelinux.org
Server:		8.8.8.8
Address:	8.8.8.8#53

------------
    QUESTIONS:
	dl-cdn.alpinelinux.org.default.svc.cluster.local, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  .
	origin = a.root-servers.net
	mail addr = nstld.verisign-grs.com
	serial = 2017111401
	refresh = 1800
	retry = 900
	expire = 604800
	minimum = 86400
	ttl = 86385
    ADDITIONAL RECORDS:
------------
** server can't find dl-cdn.alpinelinux.org.default.svc.cluster.local: NXDOMAIN
Server:		8.8.8.8
Address:	8.8.8.8#53

------------
    QUESTIONS:
	dl-cdn.alpinelinux.org.svc.cluster.local, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  .
	origin = a.root-servers.net
	mail addr = nstld.verisign-grs.com
	serial = 2017111401
	refresh = 1800
	retry = 900
	expire = 604800
	minimum = 86400
	ttl = 86394
    ADDITIONAL RECORDS:
------------
** server can't find dl-cdn.alpinelinux.org.svc.cluster.local: NXDOMAIN
Server:		8.8.8.8
Address:	8.8.8.8#53

------------
    QUESTIONS:
	dl-cdn.alpinelinux.org.cluster.local, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  .
	origin = a.root-servers.net
	mail addr = nstld.verisign-grs.com
	serial = 2017111401
	refresh = 1800
	retry = 900
	expire = 604800
	minimum = 86400
	ttl = 86378
    ADDITIONAL RECORDS:
------------
** server can't find dl-cdn.alpinelinux.org.cluster.local: NXDOMAIN
Server:		8.8.8.8
Address:	8.8.8.8#53

------------
    QUESTIONS:
	dl-cdn.alpinelinux.org.patrikdufresne.com, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  patrikdufresne.com
	origin = ns2.no-ip.com
	mail addr = hostmaster.no-ip.com
	serial = 2010091255
	refresh = 10800
	retry = 1800
	expire = 604800
	minimum = 1800
	ttl = 1799
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
*** Can't find dl-cdn.alpinelinux.org: No answer

If I remove my domain name patrikdufresne.com, resolution works:

# cat /etc/resolv.conf 
nameserver 8.8.8.8
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
root@tymara:/home/ikus060# nslookup dl-cdn.alpinelinux.org
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
dl-cdn.alpinelinux.org	canonical name = global.prod.fastly.net.
Name:	global.prod.fastly.net
Address: 151.101.0.249
Name:	global.prod.fastly.net
Address: 151.101.64.249
Name:	global.prod.fastly.net
Address: 151.101.128.249
Name:	global.prod.fastly.net
Address: 151.101.192.249

It also works if I remove ndots:5:

# cat /etc/resolv.conf 
nameserver 8.8.8.8
search default.svc.cluster.local svc.cluster.local cluster.local patrikdufresne.com
root@tymara:/home/ikus060# nslookup dl-cdn.alpinelinux.org
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
dl-cdn.alpinelinux.org	canonical name = global.prod.fastly.net.
Name:	global.prod.fastly.net
Address: 151.101.0.249
Name:	global.prod.fastly.net
Address: 151.101.64.249
Name:	global.prod.fastly.net
Address: 151.101.128.249
Name:	global.prod.fastly.net
Address: 151.101.192.249
@pweil- added the component/networking, kind/bug, and priority/P2 labels on Nov 15, 2017
@johnfosborneiii

I ran into this exact same issue with a fresh installation of OCP 3.7 on a RHEL 7.4 VM.

The outbound networking worked from the VM. The outbound networking also worked when I ran a container out of band from Kubernetes (using docker run). When OCP ran the container, the outbound networking broke, but it could be fixed by removing either options ndots:5 or "search josborne.com". I couldn't figure out where "search josborne.com" was even coming from, because I didn't set that anywhere in the Ansible advanced installation. I changed my /etc/hostname file from openshift.josborne.com to openshift and rebooted. At that point "search josborne.com" was removed from the pod /etc/resolv.conf and everything started working. Is this user error or a bug? I've installed every release of OCP from scratch using an FQDN in my /etc/hostname file, and it first broke in either 3.6 or 3.7, so I think something has changed in the platform.

@danwinship changed the title from "DNS resolution is failing" to "DNS resolution fails if default search domain has a wildcard match" on Feb 13, 2018
@danwinship
Contributor

Right, so the problem is that if the domain listed in the search line does wildcard matching, then because of ndots:5, basically all hostnames end up being treated as subdomains of the default domain. E.g., *.josborne.com appears to resolve to a particular AWS hostname, so if you look up, say, github.com, it ends up matching as github.com.josborne.com, which resolves to the AWS IP.
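
To make this concrete: "github.com" has one dot, which is fewer than ndots:5, so the libc resolver appends each search suffix before ever trying the literal name. Under the pod search list from the report above, a lookup such as getent hosts github.com issues these queries in order (a sketch, assuming the *.josborne.com wildcard):

github.com.default.svc.cluster.local.   -> NXDOMAIN, try next suffix
github.com.svc.cluster.local.           -> NXDOMAIN, try next suffix
github.com.cluster.local.               -> NXDOMAIN, try next suffix
github.com.josborne.com.                -> answered by the wildcard record (wrong address)
github.com.                             -> never queried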

I guess the search field in the pod resolv.conf is set automatically from the node hostname?

What we really want is to make service name lookups behave like ndots:5, but make other lookups not do that. We can't make the libc resolver do that, but in cases where we're running a DNS server inside the cluster, we could do the ndots-like special-casing inside that server, and then we could give the pods a resolv.conf without ndots.
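
One existing example of this approach is CoreDNS's autopath plugin, which completes the search path on the server side so pods no longer need ndots:5 to resolve short service names. A minimal Corefile sketch, assuming the cluster DNS is CoreDNS (which is not the case for the OpenShift 3.x releases discussed here):

.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified            # autopath needs to map the client IP to a pod namespace
    }
    autopath @kubernetes         # follow the search path server-side
    forward . /etc/resolv.conf   # everything else goes to the upstream resolver
    cache 30
}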

The other possibility would be to stop including the node's domain in the pod resolv.conf's search field, but that would break any existing pods that were depending on the current behavior, so we'd need some sort of compatibility option.
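
As a per-pod mitigation on Kubernetes versions that support dnsConfig (1.10+), a pod can also opt out of the generated search list entirely; a sketch, with placeholder values for the nameserver, pod name, and image:

apiVersion: v1
kind: Pod
metadata:
  name: dns-opt-out                    # hypothetical example pod
spec:
  dnsPolicy: "None"                    # ignore the node-derived resolv.conf
  dnsConfig:
    nameservers:
      - 10.96.0.10                     # cluster DNS service IP (placeholder; use your cluster's value)
    searches:
      - default.svc.cluster.local      # keep only the cluster suffixes,
      - svc.cluster.local              # dropping the node's own domain
      - cluster.local
    options:
      - name: ndots
        value: "1"                     # external names are tried as-is first
  containers:
    - name: shell
      image: busybox                   # placeholder image
      command: ["sleep", "3600"]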

@ikus060
Author

ikus060 commented Feb 13, 2018

Since the way to install OpenShift is with the Ansible playbook, I would add extra validation in Ansible to make sure the provided DNS domain behaves as expected. If not, the playbook should fail and warn the user.
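
Such a validation could be as simple as probing a random label under the configured domain before the install proceeds; a sketch, not an existing playbook task (host comes from bind-utils, and the probe label is made up):

# If a random label under the search domain resolves, the domain has a
# wildcard record and will shadow external lookups once ndots:5 is in play.
if host "wildcard-probe-$RANDOM.patrikdufresne.com" > /dev/null 2>&1; then
    echo "FAIL: wildcard DNS detected on the default search domain"
    exit 1
fi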

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on May 14, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jun 13, 2018
@gbraad
Contributor

gbraad commented Jul 8, 2018

This is still an issue.
/remove-lifecycle rotten

@openshift-ci-robot removed the lifecycle/rotten label on Jul 8, 2018
@gbraad
Contributor

gbraad commented Jul 8, 2018

For Minishift this is an issue with some hypervisors that force a search entry through the DHCP offer. E.g., Hyper-V on the "default switch" uses search mshome.net, which can cause lookups to github.com during S2I to fail.

@gbraad
Contributor

gbraad commented Jul 9, 2018

Note: the options ndots:5 setting has been part of Kubernetes since about 2015 => kubernetes/kubernetes@23caf44#diff-0db82891d463ba14dd59da9c77f4776eR66 (ref: kubernetes/kubernetes#10266)

@xpflying

Same issue with an Ansible install of OpenShift 3.10.

@shadowlord017

Same for me:
ndots:5 makes the resolver append the domains from the search line before checking the original address.
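
For reference, the resolver rule behind this (resolv.conf(5)): a name with fewer dots than the ndots value is tried with each search suffix appended before being tried as an absolute name, and a trailing dot marks the name as absolute so the search list is skipped. Inside a pod with the configuration from this report:

# nslookup dl-cdn.alpinelinux.org     <- 2 dots < 5: search suffixes tried first, wildcard can hijack the lookup
# nslookup dl-cdn.alpinelinux.org.    <- trailing dot: absolute name, search list bypassed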

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Jan 30, 2019
@danwinship
Contributor

/remove-lifecycle stale
/lifecycle frozen

@openshift-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label on Jan 30, 2019
@sponte

sponte commented Oct 26, 2020

Hello, is there a workaround for this? I seem to be facing the same issue with k8s 1.19 and CoreDNS: my external domain, which is part of the DNS search path, has a wildcard match.
