Calico + CIDR changes result in non functioning DNS without FQDN #9881

Closed
acabrele opened this issue Dec 8, 2020 · 3 comments · Fixed by #10049
Labels: area/cni, kind/bug, priority/backlog


acabrele commented Dec 8, 2020

Steps to reproduce the issue:

minikube start \
  --driver='virtualbox' \
  --profile=cluster0 \
  --cpus=4 \
  --memory=4096 \
  --cni='calico' \
  --extra-config=kubeadm.pod-network-cidr=10.0.1.0/24 \
  --service-cluster-ip-range='10.0.2.0/24' \
  --dns-domain=cluster0.local

Full output of failed command:

Quick easy test:

curl kube-dns:53
curl: (6) Could not resolve host: kube-dns

If a FQDN is used, I see what I would expect:

curl kube-dns.kube-system.svc.cluster0.local:53
curl: (52) Empty reply from server

kube-dns looks OK here, and all system pods are running:

kubectl --context cluster0 get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.0.2.10    <none>        53/UDP,53/TCP,9153/TCP   55m

But on the node (minikube ssh), resolv.conf for some reason has a .3 address:

cat /etc/resolv.conf | grep nameserver
nameserver 10.0.2.3

I tried to override it with --extra-config=kubelet.cluster-dns, but that did not work.

Full output of minikube start command used, if not already included:
I have also tried this with v1.15

😄 [cluster0] minikube v1.14.2 on Darwin 10.14.6
✨ Using the virtualbox driver based on user configuration
👍 Starting control plane node cluster0 in cluster cluster0
🔥 Creating virtualbox VM (CPUs=4, Memory=4096MB, Disk=20000MB) ...
❗ This VM is having trouble accessing https://k8s.gcr.io
💡 To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
🐳 Preparing Kubernetes v1.19.2 on Docker 19.03.12 ...
▪ kubeadm.pod-network-cidr=10.0.1.0/24
🔗 Configuring Calico (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass

❗ /usr/local/bin/kubectl is version 1.16.2, which may have incompatibilites with Kubernetes 1.19.2.
💡 Want kubectl v1.19.2? Try 'minikube kubectl -- get pods -A'
🏄 Done! kubectl is now configured to use "cluster0" by default

tstromberg added the kind/bug, priority/backlog, and area/cni labels Dec 10, 2020

sadlil (Contributor) commented Dec 13, 2020

@acabrele One quick question: where was the curl run? In a pod, or on the minikube node? I am guessing it was a pod. If so, can you try curl kube-dns.kube-system and see what happens? Also, is the pod in the same namespace as kube-dns (kube-system)?

Additionally, is the behaviour the same with other application services?

acabrele (Author) commented

@sadlil the curl was run from an Alpine-based pod I was running.

I forgot to mention that I had tried kube-dns.kube-system.svc too. curl with anything less than the FQDN did not work.

I have also tried a simple HTTP server deployed to the default namespace with a service (e.g. kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.9/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml)

# curl rebel-base
curl: (6) Could not resolve host: rebel-base
# curl rebel-base.default.svc
curl: (6) Could not resolve host: rebel-base.default.svc
# curl rebel-base.default.svc.cluster0.local
{"Galaxy": "Alderaan", "Cluster": "Cluster-1"}

Only the FQDN works, and it is also very slow.
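
For readers following along, the behaviour above is what resolver search-list expansion would produce if the pod's search domains did not match the cluster's DNS domain. The following is a minimal sketch, not the real resolver: expand_name is a hypothetical helper, the order is simplified glibc-style (musl, used by Alpine, differs in some cases), and the search list passed in the example assumes the pod was handed the default cluster.local domains with ndots:5 rather than cluster0.local. The thread does not confirm the actual contents of the pod's resolv.conf.

```shell
# expand_name NAME NDOTS SEARCH_DOMAIN...
# Prints the candidate names a resolver would try, in order (simplified).
expand_name() {
  en_name=$1
  en_ndots=$2
  shift 2
  # A trailing dot marks the name as absolute: the search list is skipped.
  case $en_name in
    *.) printf '%s\n' "$en_name"; return 0 ;;
  esac
  # Count the dots in the name.
  en_dots=$(printf '%s' "$en_name" | tr -cd '.' | wc -c | tr -d ' ')
  # A name with at least ndots dots is tried as-is first.
  if [ "$en_dots" -ge "$en_ndots" ]; then
    printf '%s.\n' "$en_name"
  fi
  # Then each search domain is appended in turn.
  for en_domain in "$@"; do
    printf '%s.%s.\n' "$en_name" "$en_domain"
  done
  # A name with fewer dots is tried as-is only after the search list.
  if [ "$en_dots" -lt "$en_ndots" ]; then
    printf '%s.\n' "$en_name"
  fi
}

# ASSUMPTION: search list built from the default cluster.local domain.
expand_name rebel-base 5 default.svc.cluster.local svc.cluster.local cluster.local
```

Under that assumption none of the candidates ends in cluster0.local, so the short names cannot resolve. The full name rebel-base.default.svc.cluster0.local has only four dots, fewer than ndots:5, so it would be tried as-is only after every search domain has failed, which may be why the FQDN works but is slow.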

sadlil (Contributor) commented Dec 26, 2020

Found out it's not an issue with Calico or other CNIs; it's caused by a kubelet misconfiguration. See #10049. Please try out the solution from there and let me know if it resolves your issue.
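
The comment above does not spell out how the misconfiguration shows up. One way to check, assuming it is visible in the search line of a pod's /etc/resolv.conf, is sketched below. search_has_domain is a hypothetical helper and pod-resolv.conf is a hypothetical local copy of the file (for example, saved from kubectl exec into a pod); neither comes from this thread.

```shell
# search_has_domain RESOLV_CONF_PATH EXPECTED_DOMAIN
# Succeeds if any "search" entry equals EXPECTED_DOMAIN
# or ends with ".EXPECTED_DOMAIN"; fails otherwise.
search_has_domain() {
  awk -v want="$2" '
    $1 == "search" {
      for (i = 2; i <= NF; i++) {
        n = length($i); m = length(want)
        if ($i == want) found = 1
        # Compare the last m+1 characters against "." want.
        if (n > m && substr($i, n - m) == "." want) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Example (hypothetical file name):
#   search_has_domain pod-resolv.conf cluster0.local \
#     && echo "search list matches --dns-domain" \
#     || echo "search list does not include cluster0.local"
```

If the check fails for the domain given to --dns-domain, short service names would not be expected to resolve from that pod.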
