CoreDNS fails on minions on multi-node clusters. Can't resolve external DNS from non-master pods. #8055
Comments
@aasmall thank you for bringing this to our attention. I have a few questions: I assume it doesn't happen in normal Docker-runtime, no-CNI scenarios? Multi-node is experimental at the moment, but we have WIP PRs that would remove the need for flannel. |
HEAD should no longer need flannel at all; we should automatically apply a CNI for multinode. |
@sharifelgamal - Thank you. I'll validate in a spell. Busy working on the actual app rn, though I AM having a lot of fun playing with minikube. |
Not sure whether this is related, but I experienced DNS failing on the minions. |
Tried disabling kindnet so I could add my own driver:
Not sure how to disable kindnet. |
I checked connectivity by launching a pod on each node and trying to connect them to each other with nc: connectivity between the workers works, but connectivity to the master does not. I deleted the CoreDNS pods and they restarted on the non-master nodes, and DNS started working. So something is not working with kindnet on the master. |
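A minimal sketch of that check, assuming busybox images and the default minikube node names (both are illustrative, not from the comment):

```sh
# Listener pod pinned to the master node ("minikube" is the default primary node name).
kubectl run listener --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube"}}' \
  --command -- nc -l -p 8080

# Client pod pinned to a worker node.
kubectl run client --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube-m02"}}' \
  --command -- sleep 3600

# Try to reach the listener from the client; -z only tests that the port connects.
LISTENER_IP=$(kubectl get pod listener -o jsonpath='{.status.podIP}')
kubectl exec client -- nc -z -w 2 "$LISTENER_IP" 8080 && echo reachable
```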
There seems to be a difference between the master and the workers. Not sure it's relevant, though:
|
I'm going to take a look into this today. |
I think this is a bug, but at the same time, I think it should be a fairly rare bug to run into, at least with the current state of minikube: minikube will only deploy CoreDNS to the master node by default, even if you scale the deployment to 30 replicas. I do see now that it does not appear to be possible to select CNIs in multi-node (kindnet is applied by default). That will be fixed by #8222, probably by adding a flag like |
I scaled minikube up to 150 DNS replicas in order to get it scaled across the 3 nodes, and had no issues with pods crashing or failing to resolve records. I wonder if we accidentally fixed this by applying a default CNI.
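For reference, the scale-up above is just the following (`k8s-app=kube-dns` is the stock kubeadm label for CoreDNS):

```sh
kubectl -n kube-system scale deployment coredns --replicas=150
# -o wide shows which node each replica was scheduled on.
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
```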
I will revisit this once I'm able to disable kindnet as part of #8222 |
My tests were based on https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution. My env:

Scenario 1:
```sh
minikube start -p dns --cpus=2 --memory=2g --nodes=2 --driver=kvm2 --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
```

Scenario 2:
```sh
minikube start -p dns --cpus=2 --memory=2g --nodes=2 --driver=kvm2 --enable-default-cni=false --network-plugin=cni
```
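The basic checks from that docs page look like this (the dnsutils manifest URL comes from the page itself):

```sh
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml

# Cluster-internal name:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
# External name:
kubectl exec -i -t dnsutils -- nslookup google.com
```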
Conclusion:
|
After testing, I can confirm that resolution of Kubernetes hosts from non-master pods is broken. I was not able to replicate issues with DNS resolution, however. In a nutshell, I believe that the issue of CoreDNS access from non-master nodes is a sign of a broken CNI configuration. I'll continue to investigate. |
Unfortunately the issue is still there:
|
I also still see problems with multi-node clusters and kvm2. This happens on first creation of the cluster, but also on restarts. Here you can see the logs when I restart a 3-node cluster.
The CoreDNS pod is running, but the problem seems to be that it is started too early.
After restarting the CoreDNS pod, there are no more errors visible in the logs and DNS starts working.
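A sketch of that restart, assuming the stock kubeadm CoreDNS deployment and label:

```sh
# Recreate the CoreDNS pods (kubectl >= 1.15):
kubectl -n kube-system rollout restart deployment coredns

# Or delete them and let the deployment controller bring them back:
kubectl -n kube-system delete pods -l k8s-app=kube-dns

# Then check the fresh logs for the startup errors:
kubectl -n kube-system logs -l k8s-app=kube-dns
```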
@tstromberg can we reopen this issue or create a new one for it? |
Same issue here. Why is this issue closed when it is so easy to reproduce? |
This issue seems to have been closed by mistake. If I understood correctly, @tstromberg wrote "Does not fix #.." in his PR, and the issue got closed automatically without the "Does not" part being taken into consideration :) By the way, I can confirm that the issue persists on the latest macOS with minikube v1.17.1 (latest) when I run it like this: DNS resolves fine inside the minikube nodes, but containers fail to resolve. |
After some work on this yesterday, I can confirm there was a bug with kube-proxy starting up, and a merge request to fix it has been submitted; see #10581. Now that it is fixed, I can run |
Okay, I think I've figured something out. I'm going to open a new ticket. This is all based on problems in the iptables. I'll add a link to the new ticket when I get it put together. |
I've figured out my problem. I was trying to use Kubernetes 1.16 with multinode. 1.16 ends up putting the Docker IP address range of 172.17.0.1/16 into the kube-proxy-controlled iptables rules. The 172 range of addresses is not exposed when running with the docker driver. If I upgrade to Kubernetes 1.20.1, the problem goes away, as the iptables rules use the 192.168.49.0 addresses. |
So it appears that the iptables rules inside the nodes are based on the 172 addresses until 1.20, so multinode will not work with any other version without some work. |
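One way to see which address range kube-proxy programmed, sketched under the assumption of the docker driver's default profile (the worker's node name may be m02 or minikube-m02 depending on the minikube version):

```sh
# Dump the service NAT rules kube-proxy wrote on a worker node.
minikube ssh -n m02 "sudo iptables -t nat -L KUBE-SERVICES -n"

# Compare against the node addresses minikube assigned.
kubectl get nodes -o wide
```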
Is that problem specific then to the docker driver or would the kvm2 driver have it too? Wondering if there are multiple problems here or not. |
Just ran into this issue as well, solved by the kubectl patch from the original post.
|
So, I already fixed this and lost some of the logs. But it's pretty straightforward.
N.B. I built from HEAD a couple of days ago.
```sh
curl google.com
curl google.com
```
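Presumably those were run from pods on different nodes; a hedged way to reproduce that (the image and node name are illustrative):

```sh
# curl from a pod pinned to the worker node.
kubectl run curl-m02 --image=curlimages/curl --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"minikube-m02"}}' \
  --command -- curl -sS --max-time 5 google.com
kubectl logs curl-m02
```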
CoreDNS was crashing per kubernetes/kubernetes#75414
Fixed with:
```sh
kubectl patch deployment coredns -n kube-system --patch '
{
  "spec": {
    "template": {
      "spec": {
        "volumes": [
          {"name": "emptydir-tmp", "emptyDir": {}}
        ],
        "containers": [
          {
            "name": "coredns",
            "volumeMounts": [
              {"name": "emptydir-tmp", "mountPath": "/tmp"}
            ]
          }
        ]
      }
    }
  }
}'
```
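The patch gives CoreDNS a writable emptyDir volume at /tmp, per the upstream issue. A quick way to confirm the rollout settled afterwards (a sketch; `k8s-app=kube-dns` is the stock kubeadm label):

```sh
# Wait for the patched deployment to roll out, then list the pods.
kubectl -n kube-system rollout status deployment coredns
kubectl -n kube-system get pods -l k8s-app=kube-dns

# Verify the emptyDir volume landed in the pod template.
kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.volumes}'
```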
Edit: had wrong flannel yaml listed.