keps/sig-network/20190415-Autopath API for clusterDNS.md
+13 -9 (13 additions & 9 deletions)
@@ -14,7 +14,7 @@ approvers:
14
14
- "@bowei"
15
15
- "@johnbelamaric"
16
16
creation-date: 2019-04-15
17
-
last-updated: 2019-04-15
17
+
last-updated: 2019-10-28
18
18
status: provisional
19
19
---
20
20
# DNS Autopath in PodSpec
@@ -38,12 +38,13 @@ status: provisional
38
38
This proposal aims to minimize the number of parallel DNS queries generated by a pod by moving searchpath expansion logic to the DNS server side. This introduces a new dnsPolicy that can be configured on a per-pod basis. The metadata required to complete the searchpath expansion will be sent to the DNS Server via an EDNS0 option. This will be inserted into the request by the Nodelocal DNSCache.
39
39
40
40
## Motivation
41
-
DNS Search Path expansion on pods using ClusterFirst DNS mode can lead to DNS latency issues and race conditions due to several parallel dns queries from the same pod.
41
+
DNS Search Path expansion on pods using ClusterFirst DNS mode can lead to DNS latency issues and race conditions because several DNS queries are sent in parallel from the same pod (musl sends the queries in parallel). Even in pods using glibc, which sends these requests serially, the reduced load on the client resolver and the lower client latency are a strong motivation to move this logic to the server side.
42
42
The search path currently includes:
43
43
44
44
1. "$NS.svc.$SUFFIX"
45
45
2. "svc.$SUFFIX"
46
-
3. "$SUFFIX"
46
+
3. "$SUFFIX"
47
+
4. Host level suffixes, which might be 2 or 3 in number.
47
48
48
49
Where $NS stands for the namespace that the pod belongs to, $SUFFIX is the Kubernetes cluster suffix.
49
50
@@ -55,7 +56,7 @@ These search paths are set to make sure:
55
56
56
57
These searchpaths are included in pods' /etc/resolv.conf by kubelet and are enforced by setting ndots to 5. This means any hostname lookups with fewer than 5 dots will be expanded using all the search paths listed.
57
58
58
-
When pod issues a query to lookup hostname "service123", it is expanded to 4 queries - one for the original hostname and 3 with each of the searchpaths appended. Some resolvers issue both A and AAAA queries, so this can be a total of 8 or more queries for every single DNS lookup. When these queries are issued in parallel, they end up at the node with the same source tuple and need to be DNAT'ed increasing the chance of a [netfilter race condition](https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts).
59
+
When a pod issues a query to look up the hostname "service123", it is expanded to 6 queries - one for the original hostname and one with each of the searchpaths appended. Some resolvers issue both A and AAAA queries, so this can be a total of 12 or more queries for every single DNS lookup. When these queries are issued in parallel, they end up at the node with the same source tuple and need to be DNAT'ed, increasing the chance of a [netfilter race condition](https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts).
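The query count above can be reproduced with a minimal sketch of client-side search expansion. It assumes a cluster suffix of `cluster.local`, the namespace `default`, and two host-level suffixes (`corp.example.com` and `example.com`, both hypothetical); `expand_queries` is an illustrative helper, not part of any resolver library, and real resolvers differ in ordering details.

```python
def expand_queries(name, search_paths, ndots=5):
    """Return the fully qualified names a stub resolver would try for `name`."""
    if name.endswith("."):
        # Already fully qualified: no search expansion.
        return [name]
    candidates = [f"{name}.{path}." for path in search_paths]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + candidates
    # Fewer dots than ndots: try the search list first, then the name as-is.
    return candidates + [absolute]


# Assumed values for illustration only.
search_paths = [
    "default.svc.cluster.local",  # $NS.svc.$SUFFIX
    "svc.cluster.local",          # svc.$SUFFIX
    "cluster.local",              # $SUFFIX
    "corp.example.com",           # hypothetical host-level suffix
    "example.com",                # hypothetical host-level suffix
]

names = expand_queries("service123", search_paths)
# One lookup per name and record type when the resolver asks for A and AAAA.
lookups = [(n, rtype) for n in names for rtype in ("A", "AAAA")]
print(len(names), len(lookups))
```

With five searchpaths this gives 6 names and 12 lookups, matching the figures in the paragraph above.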
59
60
Even if only one of the several queries fails, the DNS lookup on the client side will fail after a 5s timeout.
60
61
61
62
### Goals
@@ -72,11 +73,11 @@ This proposal introduces a new dnsPolicy "clusterFirstWithAutopath". This can be
72
73
73
74
` dnsPolicy: clusterFirstWithAutopath`
74
75
75
-
Using this mode will set the searchpath on the pod as `search.$NS.$SUFFIX.k`, where $NS is the namespace of the pod and $SUFFIX is the cluster suffix. k is a one-letter suffix to identify the domain name to be a query that needs search expansion. Since there are [no single-letter TLD so far](http://data.iana.org/TLD/tlds-alpha-by-domain.txt), this suffix will help identify queries needing search expansion.
76
+
Using this mode will set the searchpath on the pod as `search.$NS.$SUFFIX.ap.k8s.io`, where $NS is the namespace of the pod and $SUFFIX is the cluster suffix. `ap.k8s.io` is the suffix that identifies a query as needing search expansion.
76
77
77
78
This approach minimizes the number of DNS queries at the client side to at most 2 (A and AAAA). The searchpath expansion logic moves to the server side. The server (clusterDNS, CoreDNS by default) will require additional metadata in order to complete searchpath expansion.
78
79
79
-
The metadata can be attached as an EDNS0 option. We can define a new EDNS0 option - SearchPaths, option code: 15 that is currently [unassigned.](https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-11)
80
+
The metadata can be attached as an EDNS0 option. We can define a new EDNS0 option - SearchPaths, option code: 15 that is currently [unassigned.](https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-11) We would need to reserve this or use an option number from the experimental range - 65001 to 65534.
80
81
The value of this option will be a comma-separated string consisting of all the searchpaths which are to be appended to the main query and looked up. This option can be useful outside of Kubernetes as well.
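As a rough illustration of the option described above, the sketch below packs a comma-separated searchpath list into the generic EDNS0 option layout (2-byte option code, 2-byte length, then data). The option code 65001 is an assumption taken from the experimental range mentioned in this proposal; the function names are hypothetical and nothing here reflects an existing implementation.

```python
import struct

# Assumption: an experimental-range code; the proposal also considers code 15.
SEARCHPATHS_OPTION_CODE = 65001


def encode_searchpaths_option(search_paths, code=SEARCHPATHS_OPTION_CODE):
    """Pack searchpaths as OPTION-CODE, OPTION-LENGTH, OPTION-DATA."""
    data = ",".join(search_paths).encode("ascii")
    return struct.pack("!HH", code, len(data)) + data


def decode_searchpaths_option(raw):
    """Inverse of encode_searchpaths_option; returns (code, [searchpaths])."""
    if len(raw) < 4:
        raise ValueError("option header is truncated")
    code, length = struct.unpack("!HH", raw[:4])
    data = raw[4:4 + length]
    if len(data) != length:
        raise ValueError("option data is truncated")
    return code, data.decode("ascii").split(",")


paths = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
wire = encode_searchpaths_option(paths)
code, decoded = decode_searchpaths_option(wire)
```

Each additional searchpath grows the request by its length plus one separator byte, which is the size increase discussed under Risks and Mitigations.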
81
82
82
83
Instead of modifying all client pod images to insert an EDNS0 option in their requests, we will put this logic in [Nodelocal DNSCache](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md), which is a daemonset running a per-node DNS Cache.
@@ -87,14 +88,17 @@ An optimization to consider: We can use the EDNS0 option to include a version nu
87
88
The clusterDNS service needs to support this new EDNS0 option and lookup multiple query names by expanding a single incoming query. This was tried on a test setup by [modifying the autopath plugin](https://github.com/coredns/coredns/compare/master...prameshj:auto) in CoreDNS to extract the searchpath from an EDNS0 option, for a proof of concept.
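A minimal sketch of what the server-side expansion could look like is shown below. It is not the linked CoreDNS change: `expand_on_server` is a hypothetical helper, the cluster suffix `cluster.local` is an assumed value, the namespace is assumed to be a single label, and trying the bare name last is an assumption about ordering.

```python
AUTOPATH_MARKER = "ap.k8s.io"


def expand_on_server(qname, search_paths, cluster_suffix="cluster.local"):
    """Expand one marked query into the names the server should look up.

    Expects names shaped like <host>.search.<ns>.<cluster_suffix>.ap.k8s.io.
    Queries without that shape are returned unchanged.
    """
    labels = qname.rstrip(".").split(".")
    tail = cluster_suffix.split(".") + AUTOPATH_MARKER.split(".")
    marker_len = len(tail) + 2  # plus the "search" and namespace labels
    if (
        len(labels) <= marker_len
        or labels[-len(tail):] != tail
        or labels[-marker_len] != "search"
    ):
        return [qname]
    host = ".".join(labels[:-marker_len])
    # Searchpaths come from the EDNS0 option attached to the request.
    return [f"{host}.{path}." for path in search_paths] + [host + "."]


paths = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
expanded = expand_on_server(
    "service123.search.default.cluster.local.ap.k8s.io.", paths
)
```

The server would answer with the first name in `expanded` that resolves, so the client sees a single response per record type.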
88
89
89
90
### Risks and Mitigations
90
-
1)Increases size of DNS requests due to the extra option. This can result in the query getting upgraded to TCP automatically. This is not an issue when using NodeLocal DNSCache which upgrades connections to TCP by default.
91
+
1) DNS resolution can break if NodeLocal DNSCache is down or if the pods point to the kube-dns service directly to resolve query names. This is because without the EDNS0 option, the custom searchpath is not resolvable by kube-dns/CoreDNS. Running 2 DNSCache instances would be necessary to keep searchpath expansion working during upgrades.
91
92
92
-
2) If the EDNS0 option is set and sent to a server that does not support the option, queries will fail. However, this mode is enabled in podSpec by the user and not turned on by default.
93
+
2) Increases size of DNS requests due to the extra option. This can result in the query getting upgraded to TCP automatically. This is not an issue when using NodeLocal DNSCache which upgrades connections to TCP by default.
94
+
95
+
3) If the EDNS0 option is set and sent to a server that does not support the option, queries will fail. However, this mode is enabled in podSpec by the user and not turned on by default.
93
96
94
97
## Implementation History
95
98
96
99
* 2019-04-15 - Creation of the KEP
97
100
98
101
## Alternatives [optional]
99
102
100
-
Use autopath plugin in CoreDNS and set a single searchpath in podSpec. This approach requires watching all pods to map the pod namespace and ip address of the pod. The namespeace of the pod can be determined from the source ip in the DNS request, as a result of this mapping. This additional watch can be resource intensive and also is a solution specific to CoreDNS.
103
+
* Use the autopath plugin in CoreDNS and set a single searchpath in podSpec. This approach requires watching all pods to map each pod's IP address to its namespace. With this mapping, the namespace of the pod can be determined from the source IP in the DNS request. This additional watch can be resource intensive, and the solution is specific to CoreDNS.
104
+
* Autopath expansion similar to this proposal, but without NodeLocal DNSCache and without EDNS0. This works by sending the search.$NS.$SUFFIX.ap.k8s.io name directly to the server. However, it requires the DNS server to understand the custom searchpath and is entirely Kubernetes-specific. The advantage of this approach is that unavailability of NodeLocal DNSCache will not affect DNS resolution. The implementation of this functionality can be tied to the kubernetes/autopath plugins in CoreDNS.