Commit 9f5f2c4

Incorporated review comments
1 parent bcf0ac7 commit 9f5f2c4

1 file changed

+13
-9
lines changed

keps/sig-network/20190415-Autopath API for clusterDNS.md

Lines changed: 13 additions & 9 deletions
@@ -14,7 +14,7 @@ approvers:
1414
- "@bowei"
1515
- "@johnbelamaric"
1616
creation-date: 2019-04-15
17-
last-updated: 2019-04-15
17+
last-updated: 2019-10-28
1818
status: provisional
1919
---
2020
# DNS Autopath in PodSpec
@@ -38,12 +38,13 @@ status: provisional
3838
This proposal aims to minimize the number of parallel DNS queries generated by a pod by moving searchpath expansion logic to the DNS server side. This introduces a new dnsPolicy that can be configured on a per-pod basis. The metadata required to complete the searchpath expansion will be sent to the DNS Server via an EDNS0 option. This will be inserted into the request by the Nodelocal DNSCache.
3939

4040
## Motivation
41-
DNS Search Path expansion on pods using ClusterFirst DNS mode can lead to DNS latency issues and race conditions due to several parallel dns queries from the same pod.
41+
DNS search path expansion on pods using ClusterFirst DNS mode can lead to DNS latency issues and race conditions, because a single pod can issue several DNS queries in parallel (musl sends the queries in parallel). Even in pods using glibc, which sends these requests serially, the reduced load on the client resolver and the lower client latency are strong motivations to move this logic to the server side.
4242
The search path currently includes:
4343

4444
1. "$NS.svc.$SUFFIX"
4545
2. "svc.$SUFFIX"
46-
3. "$SUFFIX"
46+
3. "$SUFFIX"
47+
4. Host level suffixes, which might be 2 or 3 in number.
4748

4849
Where $NS stands for the namespace that the pod belongs to, $SUFFIX is the Kubernetes cluster suffix.
4950

@@ -55,7 +56,7 @@ These search paths are set to make sure:
5556

5657
These searchpaths are included in pods' /etc/resolv.conf by kubelet and are enforced by setting ndots to 5. This means any hostname lookups with fewer than 5 dots will be expanded using all the search paths listed.
5758

58-
When a pod issues a query to look up the hostname "service123", it is expanded to 4 queries - one for the original hostname and 3 with each of the searchpaths appended. Some resolvers issue both A and AAAA queries, so this can be a total of 8 or more queries for every single DNS lookup. When these queries are issued in parallel, they end up at the node with the same source tuple and need to be DNAT'ed, increasing the chance of a [netfilter race condition](https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts).
59+
When a pod issues a query to look up the hostname "service123", it is expanded to 6 queries - one for the original hostname and one with each of the searchpaths appended. Some resolvers issue both A and AAAA queries, so this can be a total of 12 or more queries for every single DNS lookup. When these queries are issued in parallel, they end up at the node with the same source tuple and need to be DNAT'ed, increasing the chance of a [netfilter race condition](https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts).
5960
Even if only one of the several queries fails, the DNS lookup on the client side will fail after a 5s timeout.
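
To make the query count concrete, here is a minimal Python sketch of client-side search path expansion. It is a simplified model of resolver behavior, not the actual glibc or musl logic. The function name `expand_queries` and the host-level suffixes `corp.example` and `internal.example` are hypothetical and chosen only for illustration.

```python
def expand_queries(name, namespace, cluster_suffix, host_suffixes, ndots=5):
    """Return the candidate names a client resolver may try for `name`.

    Simplified model: names with fewer than `ndots` dots are tried with
    every search path first and as-is last; names ending in "." are
    treated as fully qualified and are not expanded.
    """
    search_paths = [
        f"{namespace}.svc.{cluster_suffix}",
        f"svc.{cluster_suffix}",
        cluster_suffix,
        *host_suffixes,  # host-level suffixes inherited from the node
    ]
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{path}" for path in search_paths]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


candidates = expand_queries(
    "service123",
    namespace="default",
    cluster_suffix="cluster.local",
    host_suffixes=["corp.example", "internal.example"],  # hypothetical
)
print(len(candidates))      # 6 candidate names
print(len(candidates) * 2)  # 12 queries when both A and AAAA are sent
```

With three cluster search paths and two host-level suffixes, one lookup yields six candidate names, which matches the 6 and 12 figures quoted in the paragraph.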
6061

6162
### Goals
@@ -72,11 +73,11 @@ This proposal introduces a new dnsPolicy "clusterFirstWithAutopath". This can be
7273

7374
` dnsPolicy: clusterFirstWithAutopath`
7475

75-
Using this mode will set the searchpath on the pod as `search.$NS.$SUFFIX.k`, where $NS is the namespace of the pod and $SUFFIX is the cluster suffix. k is a one-letter suffix to identify the domain name to be a query that needs search expansion. Since there are [no single-letter TLD so far](http://data.iana.org/TLD/tlds-alpha-by-domain.txt), this suffix will help identify queries needing search expansion.
76+
Using this mode will set the searchpath on the pod as `search.$NS.$SUFFIX.ap.k8s.io`, where $NS is the namespace of the pod and $SUFFIX is the cluster suffix. ap.k8s.io is the suffix to identify the domain name to be a query that needs search expansion.
7677

7778
This approach reduces the number of DNS queries on the client side to at most 2 (A and AAAA). The searchpath expansion logic moves to the server side. The server (clusterDNS, CoreDNS by default) will require additional metadata in order to complete searchpath expansion.
7879

79-
The metadata can be attached as an EDNS0 option. We can define a new EDNS0 option - SearchPaths, option code: 15 that is currently [unassigned.](https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-11)
80+
The metadata can be attached as an EDNS0 option. We can define a new EDNS0 option - SearchPaths, option code: 15 that is currently [unassigned.](https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-11) We would need to reserve this or use an option number from the experimental range - 65001 to 65534.
8081
The value of this option will be a comma-separated string consisting of all the searchpaths which are to be appended to the main query and looked up. This option can be useful outside of Kubernetes as well.
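
As an illustration of the option described here, the following Python sketch encodes a comma-separated searchpath list in the generic EDNS0 option layout from RFC 6891 (2-byte option code, 2-byte option length, then the option data). The option code 65001 is an assumption taken from the experimental range mentioned in this proposal, and the function names are hypothetical; the proposal does not fix an encoding beyond "a comma-separated string".

```python
import struct

# Assumption: a code from the local/experimental range (65001 to 65534).
SEARCHPATHS_OPTION_CODE = 65001


def encode_searchpaths_option(search_paths, code=SEARCHPATHS_OPTION_CODE):
    """Pack search paths as OPTION-CODE | OPTION-LENGTH | OPTION-DATA."""
    data = ",".join(search_paths).encode("ascii")
    return struct.pack("!HH", code, len(data)) + data


def decode_searchpaths_option(wire):
    """Unpack one option and return (code, list of search paths)."""
    code, length = struct.unpack("!HH", wire[:4])
    data = wire[4:4 + length]
    if len(data) != length:
        raise ValueError("truncated EDNS0 option")
    paths = data.decode("ascii").split(",") if data else []
    return code, paths


paths = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
wire = encode_searchpaths_option(paths)
print(decode_searchpaths_option(wire) == (SEARCHPATHS_OPTION_CODE, paths))  # True
```

In this sketch the node-local cache would append such an option to the OPT record of the forwarded request, and the server would decode it before expanding the query.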
8182

8283
Instead of modifying all client pod images to insert an EDNS0 option in their requests, we will put this logic in [Nodelocal DNSCache](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md), which is a daemonset running a per-node DNS Cache.
@@ -87,14 +88,17 @@ An optimization to consider: We can use the EDNS0 option to include a version nu
8788
The clusterDNS service needs to support this new EDNS0 option and lookup multiple query names by expanding a single incoming query. This was tried on a test setup by [modifying the autopath plugin](https://github.com/coredns/coredns/compare/master...prameshj:auto) in CoreDNS to extract the searchpath from an EDNS0 option, for a proof of concept.
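
The server-side expansion can be sketched as follows. This is not the code from the linked CoreDNS branch; it is a minimal Python illustration that assumes the incoming query name has the form `<name>.search.$NS.$SUFFIX.ap.k8s.io` and that the searchpaths have already been extracted from the EDNS0 option. The function name and the way the original name is recovered are assumptions.

```python
AUTOPATH_MARKER = "ap.k8s.io"  # suffix that marks queries needing expansion


def expand_autopath_query(qname, search_paths):
    """Return the fully qualified names the server would look up in order.

    Queries without the marker suffix are returned unchanged. Assumes the
    namespace and cluster suffix do not themselves contain a "search" label.
    """
    name = qname.rstrip(".")
    if not name.endswith("." + AUTOPATH_MARKER):
        return [qname]
    stem = name[: -len(AUTOPATH_MARKER) - 1]
    # Everything before the last ".search." label is the name the pod asked for.
    base, sep, _scope = stem.rpartition(".search.")
    if not sep:
        return [qname]
    return [f"{base}.{path}." for path in search_paths] + [f"{base}."]


for candidate in expand_autopath_query(
    "service123.search.default.cluster.local.ap.k8s.io.",
    ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"],
):
    print(candidate)
```

The server would try the candidates in order and answer with the first one that resolves, so the client sees a single response per query type.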
8889

8990
### Risks and Mitigations
90-
1) Increases size of DNS requests due to the extra option. This can result in the query getting upgraded to TCP automatically. This is not an issue when using NodeLocal DNSCache which upgrades connections to TCP by default.
91+
1) DNS resolution can break if NodeLocal DNSCache is down or if the pods point to the kube-dns service directly to resolve query names. This is because without the EDNS0 option, the custom searchpath is not resolvable by kube-dns/CoreDNS. Running 2 DNSCache instances would be necessary to keep searchpath expansion working during upgrades.
9192

92-
2) If the EDNS0 option is set and sent to a server that does not support the option, queries will fail. However, this mode is enabled in podSpec by the user and not turned on by default.
93+
2) Increases size of DNS requests due to the extra option. This can result in the query getting upgraded to TCP automatically. This is not an issue when using NodeLocal DNSCache which upgrades connections to TCP by default.
94+
95+
3) If the EDNS0 option is set and sent to a server that does not support the option, queries will fail. However, this mode is enabled in podSpec by the user and not turned on by default.
9396

9497
## Implementation History
9598

9699
* 2019-04-15 - Creation of the KEP
97100

98101
## Alternatives [optional]
99102

100-
Use autopath plugin in CoreDNS and set a single searchpath in podSpec. This approach requires watching all pods to map the pod namespace and IP address of the pod. The namespace of the pod can be determined from the source IP in the DNS request, as a result of this mapping. This additional watch can be resource intensive and also is a solution specific to CoreDNS.
103+
* Use autopath plugin in CoreDNS and set a single searchpath in podSpec. This approach requires watching all pods to map the pod namespace and IP address of the pod. The namespace of the pod can be determined from the source IP in the DNS request, as a result of this mapping. This additional watch can be resource intensive and also is a solution specific to CoreDNS.
104+
* Autopath expansion similar to this proposal, but without NodeLocal DNSCache and without EDNS0. This works by sending the search.$NS.$SUFFIX.ap.k8s.io name directly to the server, which must understand the custom searchpath; this makes the approach entirely Kubernetes-specific. The advantage of this approach is that unavailability of NodeLocal DNSCache will not affect DNS resolution. The implementation of this functionality can be tied to the kubernetes/autopath plugins in CoreDNS.
