This repository was archived by the owner on Nov 1, 2022. It is now read-only.
fluxctl cannot communicate with fluxd after pod rescheduling due to ambiguous pod name #2276
Closed
Description
Describe the bug
When the node running the flux container fails, Kubernetes will eventually reschedule the pod onto a different node.
Until the failed node comes back online and the old flux pod is cleaned up, there will be two pods, one with state Running and one with state Terminating:
➔ kubectl get pods -l name=flux
NAME                   READY   STATUS        RESTARTS   AGE
flux-66966f499-hrplt   1/1     Terminating   0          143m
flux-66966f499-rpql2   1/1     Running       0          13m
While this situation exists, fluxctl cannot be used due to an ambiguous pod specification:
➔ fluxctl list-workloads
Error: Could not create a dialer: Could not get pod name: Ambiguous pod: found more than one pod for selector: labels "name in (flux,fluxd,weave-flux-agent)"
Run 'fluxctl list-workloads --help' for usage.
To Reproduce
- Set up a two-node Kubernetes cluster with flux installed following the "Getting started" documentation.
- Pause/stop/terminate the node on which the flux daemon is scheduled. The output of kubectl get nodes should then look something like:
NAME                       STATUS     ROLES   AGE    VERSION
aks-agentpool-57623730-0   NotReady   agent   4h8m   v1.13.7
aks-agentpool-57623730-1   Ready      agent   4h8m   v1.13.7
- Wait for Kubernetes to reschedule the flux pod onto the other node, at which point two pods should show up in the output of kubectl get pods -l name=flux:
➔ kubectl get pods -l name=flux
NAME                   READY   STATUS        RESTARTS   AGE
flux-66966f499-hrplt   1/1     Terminating   0          143m
flux-66966f499-rpql2   1/1     Running       0          13m
- Invoking fluxctl commands, for example fluxctl list-workloads, will now return an error.
Expected behavior
fluxctl should target the running pod and continue to work as usual.
Additional context
- Flux version: 1.13.2
- Kubernetes version: 1.13.7