nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known #3417
Comments
Have you seen #2218 (comment)?
@tamalsaha Yes, I have seen it, but only a workaround for the issue has been posted there, not an actual fix.
We have the same issue: the pod reports the error even though resolving works as expected, and the answer is Name: ext-nfs-svc.default.svc.cluster.local. Using the IP for the NFS connection works, as described above.
I suspect this is because NFS on the host system doesn't currently point to 10.96.0.10 within the guest VM - only within pods, for what appears to be obsolete historical reasons. I could be completely wrong though.
I guess you are right. Defining the IP for ext-nfs-svc.default.svc.cluster.local in the cluster workers' hosts file does solve the problem. Somehow it seems that the NFS mounting does not use the cluster-internal DNS resolution and also does not really use the external IP defined in the service. I'm not sure if this is the expected behaviour, but to me it does not make much sense.
👀
Well, I'm running into the same issue on EKS as well. By defining the NFS server IP directly, it just works. Is this a known issue on EKS as well, or should I rather go to EFS on AWS? :(
Apologies, I'm not a Minikube user, but this is the most apt issue I've found for the problems that I'm having. I'm experiencing these exact problems:
Based on my googling efforts so far, this seems to be a Kubernetes issue where the NFS mount is being set up before the container can reach CoreDNS. Perhaps an initialization-order problem?
The problem is that the components responsible for NFS storage backends do not use the cluster-internal DNS but try to resolve the NFS server with the DNS information available on the worker node itself. One way to make this work would be to add a hosts-file entry on the worker nodes mapping the service name (nfs-server.default.svc.cluster.local) to the NFS server's IP address, but this is just a quick and dirty hack-around. It is odd that this component is not able to use the cluster-internal DNS resolution; that would make much more sense and be more intuitive to use.
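For concreteness, a minimal sketch of that hack-around (the service name comes from this issue; the ClusterIP used below is a placeholder you would look up first):

kubectl get svc nfs-server -n default -o jsonpath='{.spec.clusterIP}'   # find the ClusterIP assigned to the NFS service
echo "10.96.200.5 nfs-server.default.svc.cluster.local" | sudo tee -a /etc/hosts   # run on every worker node, replacing 10.96.200.5 with that IP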
I'm also having this issue on EKS.
I don't think it's an issue related to any specific Kubernetes cloud solution, but a general one.
From what I can tell, the only solution to this would be to have the k8s node have access to k8s's CoreDNS, which is responsible for resolving these names. However, in my experience most k8s nodes use their own DNS, independent of k8s.
@ikkerens I'm pretty sure that would work. Having an Ingress for the kube-dns service which is only reachable from the k8s nodes themselves could achieve this. But, as you said, one would have to change the DNS settings on the nodes.
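A minimal sketch of changing the node DNS settings along those lines (an illustration, not from the original comments; 10.96.0.10 is the conventional cluster DNS IP mentioned earlier, but look it up first):

kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'   # find the cluster DNS Service IP
echo "nameserver 10.96.0.10" | sudo tee -a /etc/resolv.conf               # on each node; the exact file and mechanism depend on the distribution / systemd-resolved setup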
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
I have the same issue on AWS with an NFS server backed by an EBS disk.
I'm running into the same issue. I can get it to work fine in GKE, but it won't work locally.
Same issue on Azure AKS too.
@fhaifler - With these configurations there is no data being shared between the pods. That is, anything inside '/' is not visible inside the '/mnt' folder. Also, I'm not able to mount '/nfs-data-example-folder' into the '/mnt' folder; it throws a permission error.
@ramkrishnan8994 I am not sure I understand the question. Have you managed to make it work even with the domain name for the NFS server (nfs-server.default.svc.cluster.local)? It is still not working for me, even with an updated minikube.
I am not sure what you mean.
I don't know what
This would likely be addressed by resolving #2162 (help wanted).
I ran into the same issue with Azure AKS but not with Google GKE. How come Google has a fix and the other cloud providers don't?
This is a known issue in Kubernetes:
seen in #2162 (comment)
Ideas for workarounds: write /etc/hosts on all nodes (independent of distribution), or configure the nodes to use the cluster DNS.
/etc/hosts manually: manually write the name of the service into /etc/hosts on all nodes.
/etc/hosts partially automated: a DaemonSet with an init container doing the update and rancher/pause as the app container (see the sketch below).
/etc/hosts fully automated: write a controller which listens to all services (or only specially labeled services) and writes /etc/hosts on each host. See links in kubernetes/kubernetes#64623 (comment).
resolv.conf manually: update resolv.conf manually on each node. Depending on the distribution (using systemd, ...), this may differ. Find the nameserver in /etc/resolv.conf of any pod.
resolv.conf partially automated: a DaemonSet with an init container doing the update and rancher/pause as the app container. The init container updates /to_edit/resolv.conf, which is mounted from the host. No restart required.
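A rough sketch of the "partially automated /etc/hosts" idea above; everything here (names, namespace, image tags, and the 10.96.200.5 placeholder IP) is illustrative and not from the original comment:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nfs-hosts-entry
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nfs-hosts-entry
  template:
    metadata:
      labels:
        app: nfs-hosts-entry
    spec:
      initContainers:
        - name: add-hosts-entry
          image: busybox:1.36
          # Append the service name to the node's /etc/hosts only if it is not already present
          command:
            - sh
            - -c
            - grep -q nfs-server.default.svc.cluster.local /host-etc/hosts || echo "10.96.200.5 nfs-server.default.svc.cluster.local" >> /host-etc/hosts
          volumeMounts:
            - name: etc
              mountPath: /host-etc
      containers:
        - name: pause
          image: rancher/pause:3.6
      volumes:
        - name: etc
          hostPath:
            path: /etc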
this has to be hardcoded because of kubernetes/minikube#3417
EDIT: Brian's solution, right below, is the best current solution.
I was able to solve this problem by creating a service with a static clusterIP and then mounting to the IP instead of the service name. No DNS required. This is working nicely on Azure; I haven't tried it elsewhere. In my case, I'm using an HDFS NFS Gateway and chose the following setup:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hdfs
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: Service
metadata:
  name: hdfs-nfs
  labels:
    component: hdfs-nn
spec:
  type: ClusterIP
  clusterIP: 10.0.200.2
  ports:
    - name: portmapper
      port: 111
      protocol: TCP
    - name: nfs
      port: 2049
      protocol: TCP
    - name: mountd
      port: 4242
      protocol: TCP
  selector:
    component: hdfs-nn
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  capacity:
    storage: 3000Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - vers=3
    - proto=tcp
    - nolock
    - noacl
    - sync
  nfs:
    server: 10.0.200.2
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3000Gi
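For completeness, a hypothetical consumer of that claim (not part of the original comment; the image, command, and mount path are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: hdfs-nfs-client
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls /data && sleep 3600"]
      volumeMounts:
        - name: hdfs
          mountPath: /data
  volumes:
    - name: hdfs
      persistentVolumeClaim:
        claimName: hdfs   # the PVC defined above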
Would mounting it inside the container be an option? I.e., the traditional way of installing nfs-client in the container and using the mount command, instead of letting Kubernetes mount it?
@BrianHuf thanks for sharing your solution. Using minikube, this works for us. Unfortunately, without this method we just get the error from the issue title.
I'll leave this open with the workaround for discoverability, and in case we do ever fix it permanently in minikube.
Same here when using csi-driver-nfs.
@willzhang If you are using NFS CSI driver v4.1.0 or v4.0.0, try changing the
For anyone else finding themselves in the same situation who can't use the workaround above: I installed the helm chart from their repo:
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.4.0
I'm running NFS inside my cluster using the itsthenetwork/nfs-server-alpine image:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: storage
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: itsthenetwork/nfs-server-alpine:latest
          ports:
            - name: nfs
              containerPort: 2049
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /nfs
              name: nfs-volume
          env:
            - name: SHARED_DIRECTORY
              value: /nfs
      volumes:
        - name: nfs-volume
          persistentVolumeClaim:
            claimName: nfs-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-service
  namespace: storage
spec:
  ports:
    - name: nfs
      port: 2049
  selector:
    role: nfs-server
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
  namespace: storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 2Gi
Lastly, create the StorageClass, PVC, and Deployment that will mount your NFS share:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nfs
              mountPath: /usr/share/nginx/html
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs-pvc-nginx
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-service.storage.svc.cluster.local
  share: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-nginx
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi
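As a quick sanity check (not part of the original comment; the names are taken from the manifests above), you can confirm that the claim binds and that the NFS share is actually mounted:

kubectl get pvc nfs-pvc-nginx
kubectl get pods -l app=nginx
kubectl exec deploy/nginx-deployment -- df -h /usr/share/nginx/html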
This has returned in csi-driver-nfs v4.7.0. Can this issue be re-opened?
BUG REPORT
Environment:
Minikube version: v0.30.0
What happened:
NFS volume fails to mount due to a DNS error (Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known). This problem does not occur when deployed on GKE.
What you expected to happen:
NFS volume is mounted without an error.
How to reproduce it (as minimally and precisely as possible):
Output of minikube logs (if applicable):
In kubectl describe pod nfs-busybox-... there is an error which indicates a problem with DNS resolution for nfs-server.default.svc.cluster.local.
Anything else do we need to know:
The same problem was already reported for a previous version in #2218, but it was closed due to inactivity of the author and no one seems to have really looked into it. There is a workaround for this, but it has to be repeated every time a minikube VM is created.
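For illustration, a sketch of that kind of per-VM workaround (my assumption, not the exact steps from #2218; the IP below is a placeholder for the service's actual ClusterIP):

kubectl get svc nfs-server -o jsonpath='{.spec.clusterIP}'   # find the ClusterIP of the NFS service
minikube ssh "echo '10.109.50.20 nfs-server.default.svc.cluster.local' | sudo tee -a /etc/hosts"   # map the name inside the minikube VM, replacing the placeholder IP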
When running kubectl exec -ti nfs-busybox-... -- nslookup nfs-server.default.svc.cluster.local, strangely, the service ClusterIP is present in the answer (when using kube-dns, the service ClusterIP part is missing completely).