
KEP for graduating nodelocaldns to beta #1005

Merged 2 commits into kubernetes:master on Apr 29, 2019

Conversation

prameshj (Contributor)

@k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), kind/kep (categorizes KEP tracking issues and PRs modifying the KEP directory), size/L (denotes a PR that changes 100-499 lines, ignoring generated files), and sig/network (categorizes an issue or PR as relevant to SIG Network) labels on Apr 26, 2019.

N.B. Although CoreDNS is now the default DNS server on Kubernetes clusters, this document still uses the name kube-dns since the service name is still the same.

Based on the initial feedback for the NodeLocal DNSCache feature, HA seems to be the common ask.
@thockin (Member) commented on Apr 26, 2019:

I'm skeptical. I 100% agree that it is a little different than kubelet or kube-proxy (this is "data plane" vs "control plane"). That said, this proposal is kind of going to heroic lengths to get HA within a single failure-domain. We have a number of node-agents and they are all, more or less, subject to this problem.

What you propose here only reduces the outage window, and carries its own risks. We know that running iptables can consume a lot of memory (yay iptables), which means we have to set aside a significant amount of RAM for this, or it will actually make the system less stable (OOM).

This also removes our ability to do anything "clever" in the cache, such as the autopath proposal. It requires that the pod <-> cache protocol be the same as cache <-> upstream. This makes me sad, because I like the autopath proposal.

Additionally this doesn't work for IPVS, which is a major problem, and it sounds like we don't have a good answer.

Given all this, can we instead do something like SO_REUSEPORT? If we ensure the cache is SO_REUSEPORT enabled, users who want HA can run 2 copies of the cache (HA, for the low, low price of 2x?), and users who don't care can just ... not?
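
To make the SO_REUSEPORT option concrete, here is a minimal sketch, under assumptions, of binding a UDP listener with SO_REUSEPORT on Linux in Go; this is not CoreDNS or node-local-dns code, and the 169.254.20.10:53 address is only an illustrative local listen address:

```go
// Minimal sketch: open a UDP socket with SO_REUSEPORT so a second cache
// process on the same node can bind the identical ip:port for HA.
// Assumes Linux and golang.org/x/sys/unix; not actual node-local-dns code.
package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

func listenReusePort(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				// With SO_REUSEPORT, the kernel load-balances incoming
				// packets across all sockets bound to this ip:port.
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(context.Background(), "udp", addr)
}

func main() {
	// Illustrative local listen address; not a claim about the default config.
	conn, err := listenReusePort("169.254.20.10:53")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... hand conn to the DNS server loop ...
}
```

Two copies of a daemonset using such a listener would give the "2x" HA mode described above; a single copy behaves exactly as today.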

Member:

Do we need HA for Beta? I propose we push it beyond beta. It's way more valuable to get this into more hands ASAP, IMO.

Member:

Put another way - HA capability sounds great, but we should not block Beta on it.

prameshj (Contributor, author):

Not a hard requirement; I put it as a criterion since it was the most-asked question about the feature. I am OK with decoupling this from the Beta graduation criteria and continuing to work on a solution.

Member:

Do we need HA for Beta?

Does beta mean this feature is enabled by default in new clusters?
Can this be "backported", or will it be available only in newer versions?

prameshj (Contributor, author):

The feature will not be enabled by default on new clusters. It can be backported to older versions.

Member:

As it is not a bugfix, it probably wouldn't make sense to backport. However, given it has almost no dependencies, we can give users a very easy way to configure their cluster to use it.


1) Pod Evicted - We create this daemonset with `priorityClassName: system-node-critical` setting to greatly reduce the likelihood of eviction.
2) Config error - node-local-dns restarts repeatedly due to incorrect config. This will be resolved only when the config error has been fixed. There will be DNS downtime until then, even though the kube-dns pods might be available.
3) OOMKilled - node-local-dns gets OOMKilled due to its own memory usage or some other component using up all memory resources on the node. There is a chance this will cause other disruptions on the node in addition to DNS downtime though.
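
For reference, a minimal sketch of where the priorityClassName setting from item 1 sits, expressed with the k8s.io/api Go types; the names, labels, and image are illustrative placeholders, not the actual node-local-dns manifest:

```go
// Sketch only: a stripped-down DaemonSet showing priorityClassName set to
// system-node-critical so the cache pod is very unlikely to be evicted.
// Names, labels, and the image value are illustrative placeholders.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func nodeLocalDNSDaemonSet() *appsv1.DaemonSet {
	labels := map[string]string{"k8s-app": "node-local-dns"}
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "node-local-dns", Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Failure mode 1: system-node-critical priority greatly
					// reduces the likelihood that this pod is evicted under
					// node resource pressure.
					PriorityClassName: "system-node-critical",
					Containers: []corev1.Container{{
						Name:  "node-cache",
						Image: "<node-local-dns image>", // placeholder
					}},
				},
			},
		},
	}
}

func main() {
	_ = nodeLocalDNSDaemonSet()
}
```
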
Member:

I would like to see some consideration of which object types, and how many of them, local-dns-cache's memory usage is bound to. This is valuable information for cluster operations. It would be best to add some measurements.

@prameshj (Contributor, author) commented on Apr 29, 2019:

Are you asking what components of local-dns-cache take up most of the memory?

I had done some measurements of max memory usage by the node-local-dns pod when running some dnsperf tests - https://github.com/kubernetes/perf-tests/tree/master/dns. This brought up a single pod running dnsperf and measured the memory usage of node-local-dns to be 20 Mi.

I also ran the image you referenced in issue coredns/coredns#2593 (mikkeloscar/go-dnsperf:latest) with the ubuntu yaml at 10k rps. I ran 3 replicas on one node and 2 on another. The node-local-dns pods serving those pods had memory usage of ~30Mi, which is the default limit in the node-local-dns yaml.

@justaugustus (Member):

/assign @thockin @bowei @johnbelamaric

@thockin (Member) left a review:

I am approving now, to get it in, but we should revisit the details of HA.

Specifically we have called out 2 possible HA modes, which warrant testing and documenting (if they work well) and some small affordances:

  1. Flag: Listen on a second IP; don't manage the NOTRACK rule in the cache; allow an external agent to do that.

  2. Flag: enable SO_REUSEPORT; allow a second daemonset per node.
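
To make mode 1 concrete, here is a rough sketch of what the "external agent" piece could look like, using the github.com/coreos/go-iptables bindings; the listen IP is an assumption, and a real agent would need to cover both directions and both TCP and UDP, matching what node-local-dns installs today:

```go
// Rough sketch of HA mode 1's external agent: something other than the cache
// installs the NOTRACK rule in the raw table for the node-local DNS IP, so
// the cache itself no longer manages it. Illustrative only.
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	const localDNSIP = "169.254.20.10" // assumed link-local listen IP

	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	// Skip connection tracking for UDP DNS traffic addressed to the local cache.
	rule := []string{"-d", localDNSIP, "-p", "udp", "--dport", "53", "-j", "NOTRACK"}
	for _, chain := range []string{"PREROUTING", "OUTPUT"} {
		if err := ipt.AppendUnique("raw", chain, rule...); err != nil {
			log.Fatalf("adding NOTRACK rule to raw/%s: %v", chain, err)
		}
	}
}
```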

Thanks!

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Apr 29, 2019.
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prameshj, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 29, 2019.
@k8s-ci-robot merged commit 9891db3 into kubernetes:master on Apr 29, 2019.
@BSWANG commented on Jul 17, 2019:

@prameshj @thockin
In an IPVS-backend cluster, can we use libpcap to capture the kube-dns service's UDP requests for nodelocaldns, drop the packets to the kube-dns endpoints on node output, and then have nodelocaldns reply with the DNS response directly to the pod? The DNS packet flow would look like this diagram:

[packet-flow diagram]
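
To illustrate the capture half of this idea, a rough sketch using the gopacket libpcap bindings; the interface name and kube-dns ClusterIP are assumptions, and dropping the original packets plus replying from nodelocaldns would still need separate handling (e.g. IPVS/iptables rules), which is not shown:

```go
// Rough sketch of the capture side of the proposal above: snoop UDP DNS
// queries destined for the kube-dns ClusterIP with libpcap (via gopacket).
// Interface name and ClusterIP are assumptions; dropping the original
// packets and replying from nodelocaldns are not shown.
package main

import (
	"fmt"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)

func main() {
	handle, err := pcap.OpenLive("eth0", 1600, true, pcap.BlockForever)
	if err != nil {
		panic(err)
	}
	defer handle.Close()

	// Only capture UDP DNS traffic addressed to the (assumed) kube-dns ClusterIP.
	if err := handle.SetBPFFilter("udp and dst host 10.96.0.10 and dst port 53"); err != nil {
		panic(err)
	}

	src := gopacket.NewPacketSource(handle, handle.LinkType())
	for packet := range src.Packets() {
		dnsLayer := packet.Layer(layers.LayerTypeDNS)
		if dnsLayer == nil {
			continue
		}
		dns := dnsLayer.(*layers.DNS)
		for _, q := range dns.Questions {
			// A real implementation would answer from the local cache here.
			fmt.Printf("captured query for %s (id %d)\n", string(q.Name), dns.ID)
		}
	}
}
```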
