Hi,
We cannot access the Internet, and Ambassador isn't working. Will these affect the use of TF Serving?
We used kubeadm to set up Kubernetes 1.9.1.
Kubernetes:
  master: iecas-30-6
  slaves: iecas-30-7, iecas-30-8
NFS:
  server: iecas-30-7
  clients: iecas-30-6, iecas-30-8
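(For context, the model directory is exported from iecas-30-7 over NFS. A typical /etc/exports entry for this layout would look like the sketch below; the 192.168.30.0/24 subnet is only an assumption based on the node IP 192.168.30.8 that shows up in the pod description further down.)

# /etc/exports on iecas-30-7 (sketch, not the exact file)
/var/nfs/general  192.168.30.0/24(rw,sync,no_subtree_check)
# reload the export table after editing
sudo exportfs -ra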
Here is the information for the inception-nfs deployment and the kubeflow services:
kubectl get deployment inception-nfs -n kubeflow
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
inception-nfs 1 1 1 1 31m
kubectl get services -n kubeflow
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ambassador ClusterIP 10.99.217.131 <none> 80/TCP 32m
ambassador-admin ClusterIP 10.103.24.16 <none> 8877/TCP 32m
inception-nfs ClusterIP 10.105.9.96 <none> 9000/TCP,8000/TCP 32m
k8s-dashboard ClusterIP 10.111.23.158 <none> 443/TCP 32m
tf-hub-0 ClusterIP None <none> 8000/TCP 32m
tf-hub-lb ClusterIP 10.98.150.141 <none> 80/TCP 32m
tf-job-dashboard ClusterIP 10.110.154.14 <none> 80/TCP 32m
We can see that the EXTERNAL-IP is <none> for every service.
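Since there is no cloud load balancer in this cluster, ClusterIP services will never be assigned an EXTERNAL-IP. If ambassador has to be reachable from outside the cluster, one option (just a sketch, we have not applied it) is to switch the service to NodePort:

kubectl -n kubeflow patch svc ambassador -p '{"spec": {"type": "NodePort"}}'
kubectl -n kubeflow get svc ambassador   # shows the node port that was assigned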
kubectl logs inception-nfs-657769bbd5-w4cv2 -n kubeflow
2018-05-21 18:42:33.129402: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:370] FileSystemStoragePathSource encountered a file-system access error: Could not find base path /mnt/var/nfs/general/inception for servable inception-nfs
The error is "Could not find base path /mnt/var/nfs/general/inception", but the model does exist at /var/nfs/general/inception on the NFS server:
iecas@iecas-30-7: ll /var/nfs/general/
total 16
drwxr-xr-x 4 nobody nogroup 4096 May 22 01:17 ./
drwxr-xr-x 3 root root 4096 Mar 4 2016 ../
-rw-r--r-- 1 nobody nogroup 0 Mar 4 2016 general.test
drwxr-xr-x 3 root root 4096 May 22 01:17 inception/
drwxr-xr-x 2 root root 4096 Mar 4 2016 pip/
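As the pod description below shows, the only mount in the container is the service-account token, so /mnt/var/nfs/general does not exist inside the pod at all. We suspect the deployment needs an nfs volume plus a matching volumeMount, roughly like this sketch (the volume name nfs-model is illustrative, not what the component generated for us):

# fragment of the inception-nfs deployment spec (sketch)
      containers:
      - name: inception-nfs
        volumeMounts:
        - name: nfs-model
          mountPath: /mnt/var/nfs/general
      volumes:
      - name: nfs-model
        nfs:
          server: iecas-30-7
          path: /var/nfs/general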
kubectl describe pod inception-nfs-657769bbd5-w4cv2 -n kubeflow
Name:           inception-nfs-657769bbd5-w4cv2
Namespace:      kubeflow
Node:           iecas-30-8/192.168.30.8
Start Time:     Tue, 22 May 2018 02:03:44 +0800
Labels:         app=inception-nfs
                pod-template-hash=2133256681
Annotations:    <none>
Status:         Running
IP:             10.244.1.14
Controlled By:  ReplicaSet/inception-nfs-657769bbd5
Containers:
  inception-nfs:
    Container ID:  docker://5acfa1a67310929575ab65e89ca482106d088c9cf3ecee4e64710b26d538c930
    Image:         gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec
    Image ID:      docker://sha256:aeb4fbd2c5a15d0714054153556e6e445a1bbb8fcbac7b289467bb328025d9db
    Port:          9000/TCP
    Args:
      /usr/bin/tensorflow_model_server
      --port=9000
      --model_name=inception-nfs
      --model_base_path=/mnt/var/nfs/general/inception
    State:          Running
      Started:      Tue, 22 May 2018 02:07:11 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kw2s8 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-kw2s8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kw2s8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 44m kubelet, iecas-30-8 MountVolume.SetUp succeeded for volume "default-token-kw2s8"
Warning Failed 44m kubelet, iecas-30-8 Failed to pull image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec": rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on [::1]:53: read udp [::1]:51709->[::1]:53: read: connection refused
Warning Failed 43m kubelet, iecas-30-8 Failed to pull image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec": rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on [::1]:53: read udp [::1]:53334->[::1]:53: read: connection refused
Warning Failed 43m kubelet, iecas-30-8 Failed to pull image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec": rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on [::1]:53: read udp [::1]:34904->[::1]:53: read: connection refused
Warning Failed 42m (x4 over 44m) kubelet, iecas-30-8 Error: ErrImagePull
Normal Pulling 42m (x4 over 44m) kubelet, iecas-30-8 pulling image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec"
Warning Failed 42m kubelet, iecas-30-8 Failed to pull image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec": rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on [::1]:53: read udp [::1]:60235->[::1]:53: read: connection refused
Normal BackOff 42m (x6 over 44m) kubelet, iecas-30-8 Back-off pulling image "gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec"
Warning Failed 42m (x6 over 44m) kubelet, iecas-30-8 Error: ImagePullBackOff
Normal Scheduled 41m default-scheduler Successfully assigned inception-nfs-657769bbd5-w4cv2 to iecas-30-8
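Because the nodes have no Internet access, gcr.io cannot even be resolved (the ErrImagePull events above). This particular pod did start in the end, but for images that keep failing, one workaround sketch is to pull the image on a machine that does have access and load it onto each node by hand:

# on a machine with Internet access
docker pull gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec
docker save gcr.io/kubeflow-images-staging/tf-model-server-cpu:v20180327-995786ec -o tf-model-server-cpu.tar
# copy the tarball to every node, then on each node
docker load -i tf-model-server-cpu.tar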
iecas@iecas-30-6: kubectl edit service tf-job-dashboard -n kubeflow
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: tfjobs-ui-mapping
      prefix: /tfjobs/
      rewrite: /tfjobs/
      service: tf-job-dashboard.kubeflow
  creationTimestamp: 2018-05-21T18:05:41Z
  name: tf-job-dashboard
  namespace: kubeflow
  resourceVersion: "1750"
  selfLink: /api/v1/namespaces/kubeflow/services/tf-job-dashboard
  uid: 9320f5b3-5d21-11e8-9f7b-a0423f2e7641
spec:
  clusterIP: 10.110.154.14
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    name: tf-job-dashboard
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
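The getambassador.io/config annotation above maps the /tfjobs/ prefix to this service, so the TFJob dashboard is only reachable through ambassador. Once ambassador stops crashing, a quick way to test the mapping without an external IP would be a port-forward to one of its pods, roughly (assuming the ambassador container listens on port 80, as the service suggests):

kubectl -n kubeflow port-forward <ambassador-pod-name> 8080:80
# then open http://localhost:8080/tfjobs/ on the same machine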
iecas@iecas-30-6:~/Documents/kubeflow/code/my-kubeflow$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
iecas-30-6 Ready master 54m v1.9.1
iecas-30-7 Ready <none> 49m v1.9.1
iecas-30-8 Ready <none> 51m v1.9.1
iecas@iecas-30-6:~/Documents/kubeflow/code/my-kubeflow$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-iecas-30-6 1/1 Running 0 53m
kube-system kube-apiserver-iecas-30-6 1/1 Running 0 53m
kube-system kube-controller-manager-iecas-30-6 1/1 Running 0 53m
kube-system kube-dns-6f4fd4bdf-lbg2w 3/3 Running 0 54m
kube-system kube-flannel-ds-8nzkh 1/1 Running 0 52m
kube-system kube-flannel-ds-f4q5h 1/1 Running 0 51m
kube-system kube-flannel-ds-hg449 1/1 Running 0 50m
kube-system kube-proxy-dfgtr 1/1 Running 0 51m
kube-system kube-proxy-nfqtb 1/1 Running 0 50m
kube-system kube-proxy-xdx2t 1/1 Running 0 54m
kube-system kube-scheduler-iecas-30-6 1/1 Running 0 53m
kube-system nvidia-device-plugin-daemonset-h87m9 1/1 Running 0 50m
kube-system nvidia-device-plugin-daemonset-mpvzg 1/1 Running 0 50m
kubeflow ambassador-64dcb6694f-qnvvk 1/2 CrashLoopBackOff 11 38m
kubeflow ambassador-6dffffbc5c-9vb59 1/2 CrashLoopBackOff 11 37m
kubeflow ambassador-6dffffbc5c-qh2qj 1/2 CrashLoopBackOff 5 6m
kubeflow ambassador-6dffffbc5c-w2gk9 1/2 CrashLoopBackOff 11 37m
kubeflow inception-nfs-657769bbd5-w4cv2 1/1 Running 0 37m
kubeflow spartakus-volunteer-66564f9679-s4gjn 1/1 Running 0 37m
kubeflow tf-hub-0 1/1 Running 0 37m
kubeflow tf-job-dashboard-7d48f6456c-hd6n8 1/1 Running 0 38m
kubeflow tf-job-operator-68cd79c8b5-rpxlp 1/1 Running 0 38m
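To dig into why ambassador keeps crashing, we can look at its containers; the container names can be read from a describe, for example (pod name taken from the list above):

kubectl -n kubeflow describe pod ambassador-6dffffbc5c-9vb59
# then, for each container name shown in the describe output
kubectl -n kubeflow logs ambassador-6dffffbc5c-9vb59 -c <container-name>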
Thanks!