This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Kubernetes v1.12.x doesn't restore pod-checkpointer #1001

Closed
2 tasks done
dghubble opened this issue Oct 10, 2018 · 11 comments

Comments

@dghubble
Contributor

Kubernetes v1.12.x doesn't currently work with the pod-checkpointer. In my exploration so far, bootstrapping a v1.12.1 cluster succeeds (after working around one known issue), and the pod-checkpointer checkpoints itself to /etc/kubernetes/manifests and moves the apiserver checkpoint to inactive. Normal so far.

For sanity's sake, the following also work:

  • Deleting the checkpointed manifest: it gets restored by the running pod-checkpointer
  • Deleting the checkpoint pod: it gets recreated (I believe from the manifest)

In the past, a "checkpoint" meant there was a second pod running in a typical cluster:

kube-system   pod-checkpointer-2kflw                             (from DaemonSet)
kube-system   pod-checkpointer-2kflw-some-controller-node-name   (Pod with "checkpoint-of")
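The second pod's name is the DaemonSet pod's name with the node name appended; in the API it is the Kubelet's mirror pod for the checkpointed static manifest. A minimal Go sketch of that naming, assuming the Kubelet's convention of naming a static pod `<manifest pod name>-<lowercased node name>`; `staticPodName` is a hypothetical helper, not a Kubelet or bootkube API:

```go
package main

import (
	"fmt"
	"strings"
)

// staticPodName is a hypothetical helper that mirrors, by assumption, how the
// Kubelet names a pod it runs from a static manifest: the manifest's pod name,
// a dash, and the lowercased node name.
func staticPodName(manifestName, nodeName string) string {
	return fmt.Sprintf("%s-%s", manifestName, strings.ToLower(nodeName))
}

func main() {
	// The checkpoint manifest reuses the DaemonSet pod's name, so the
	// Kubelet-run copy shows up under a second, node-suffixed name.
	fmt.Println(staticPodName("pod-checkpointer-2kflw", "some-controller-node-name"))
	// prints "pod-checkpointer-2kflw-some-controller-node-name"
}
```

The same pattern appears in the Kubelet log later in this thread (pod-checkpointer-l9cgg-172.17.4.101).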

Starting in v1.12, only pod-checkpointer-2kflw exists. With verbosity turned up, the Kubelet on the controller continuously reports:

Static pod "1b54a6b84faeeb51d981ca0b8930e18d" (pod-checkpointer-2kflw-some-controller-node-name/kube-system) does not have a corresponding mirror pod; skipping

This becomes a serious issue when power cycling the cluster. The Kubelet starts, reads static manifests from /etc/kubernetes/manifests (containing the checkpointed pod-checkpointer), and logs that it's skipping creating the pod-checkpointer. As a result, the cluster does not come back up.

Static pod "1b54a6b84faeeb51d981ca0b8930e18d" (pod-checkpointer-2kflw-some-controller-node-name/kube-system) does not have a corresponding mirror pod; skipping

I'm still hunting for the upstream commit that may have altered handling for static/mirror pods.

@rphillips
Contributor

Hi Dalton! The following issue and PR might be relevant to this issue: kubernetes/kubernetes#69346 and kubernetes/kubernetes#69566.

Are you using a Kubelet older than v1.11?

@dghubble
Contributor Author

The Kubelet matches the control plane version in my clusters.

@dghubble
Contributor Author

dghubble commented Oct 10, 2018

Iterating through the v1.12 pre-releases, it seems this started happening between v1.12.0-beta.2 and v1.12.0-rc.1 (comparison). v1.12.0-beta.2 doesn't bootstrap (due to various Kubernetes bugs), but it gets far enough to show that the pod-checkpointer's checkpoint pod gets created (i.e. there are two pods). In v1.12.0-rc.1, the second pod is not created and the Kubelet shows the error message I posted above, that the pod "does not have a corresponding mirror pod".

Related:
https://github.com/poseidon/terraform-render-bootkube/branches
https://github.com/poseidon/typhoon/branches

That's as far as I've made it so far. It might be beneficial to post a PR to bootkube that bumps to v1.12.1, to confirm others can repro the original issue. After that, I suspect the cause is somewhere within those 88 upstream commits.

@dghubble
Copy link
Contributor Author

#1003 may be a better way to repro. No need to involve CoreDNS changes and re-vendoring (like #1002) when we want to discover the breakage.

@rphillips
Copy link
Contributor

Agreed... I am getting:

predicate.go:133] Predicate failed on Pod: pod-checkpointer-l9cgg-172.17.4.101_kube-system(231cd4bc9c8f63b3131c1ec25716fe91), for reason: Predicate MatchNodeSelector failed

which looks like upstream issue kubernetes/kubernetes#65153.

@rphillips
Copy link
Contributor

Removing the nodeselector statements from both the checkpointer and apiserver checkpoint files restores the pods correctly.

@dghubble
Copy link
Contributor Author

dghubble commented Oct 11, 2018

I see that as well if I delete the DaemonSet pod-checkpointer: the checkpointed pod can't schedule. It's a great tip; it's easier to see what's going on from a running cluster (rather than after power cycling). Comparing actual checkpointed pod manifests between a v1.11.3 cluster and a v1.12.1 cluster, I see a difference.

# Kubernetes v1.11.3
$ cat kube-system-pod-checkpointer-2kflw.json | jq . | grep affinity
# (nothing here)

# Kubernetes v1.12.1
$ cat kube-system-pod-checkpointer-bxk2m.json | jq . | grep affinity
# (there is an affinity block)
    "affinity": {
      "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": null
            }
          ]
        }
      }
    },

Maybe related to kubernetes/kubernetes#68173, which was not in v1.12.0-beta.2 and first appeared in v1.12.0-rc.1, though I'm not sure how affinity applies during early bootstrapping.
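Why an empty-looking affinity can block the pod: the Kubernetes API documents that the nodeSelectorTerms of a required node affinity are ORed, that the requirements inside one term are ANDed, and that a null or empty term matches no objects. A required affinity whose only term has "matchExpressions": null would then match no node at all, which is consistent with the MatchNodeSelector predicate failure reported above. A minimal Go sketch of those semantics, using hypothetical trimmed-down types (only matchExpressions with the In operator; matchFields and the other operators are left out) rather than the real k8s.io/api types:

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the core/v1 node affinity types.
type NodeSelectorRequirement struct {
	Key      string
	Operator string // only "In" is modeled here
	Values   []string
}

type NodeSelectorTerm struct {
	MatchExpressions []NodeSelectorRequirement // matchFields omitted
}

// termMatches ANDs the requirements of a single term. A null or empty term
// matches no node, following the documented API semantics.
func termMatches(term NodeSelectorTerm, nodeLabels map[string]string) bool {
	if len(term.MatchExpressions) == 0 {
		return false
	}
	for _, req := range term.MatchExpressions {
		if req.Operator != "In" {
			return false // unmodeled operator: treated as no match in this sketch
		}
		value, ok := nodeLabels[req.Key]
		if !ok || !contains(req.Values, value) {
			return false
		}
	}
	return true
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// requiredAffinityMatches ORs the terms of a required node affinity.
func requiredAffinityMatches(terms []NodeSelectorTerm, nodeLabels map[string]string) bool {
	for _, term := range terms {
		if termMatches(term, nodeLabels) {
			return true
		}
	}
	return false
}

func main() {
	node := map[string]string{"kubernetes.io/hostname": "some-controller-node-name"}

	// Shape of the v1.12.1 checkpoint: one term with null matchExpressions.
	checkpoint := []NodeSelectorTerm{{MatchExpressions: nil}}
	fmt.Println(requiredAffinityMatches(checkpoint, node)) // false

	// A term with a real requirement that this node satisfies.
	pinned := []NodeSelectorTerm{{MatchExpressions: []NodeSelectorRequirement{
		{Key: "kubernetes.io/hostname", Operator: "In", Values: []string{"some-controller-node-name"}},
	}}}
	fmt.Println(requiredAffinityMatches(pinned, node)) // true
}
```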

@dghubble
Contributor Author

dghubble commented Oct 11, 2018

I suppose it's unusual for checkpoints to have a node selector or affinity at all, since they're pods on disk and should always run on that node. But the checkpoint manifests in v1.11.3 also had a nodeSelector, and cluster power cycles worked without issue. So I don't understand the report in kubernetes/kubernetes#65153.

I tried an experiment similar to yours: launching a v1.12.1 cluster, power cycling it, and then modifying the pod-checkpointer and apiserver checkpoint files to remove the affinity section. The cluster recovered. (The affinity section wasn't in checkpoint files in earlier versions.)

Of course, as soon as the cluster recovers, the pod-checkpointer overwrites the checkpoint file to include an affinity again. So only one of the two pods is running, and I'd expect the same issue on the next power cycle.

Perhaps pod-checkpointer should strip the affinity from the manifest before writing to disk?
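Stripping the affinity before the manifest is written to disk could look like the following minimal Go sketch. stripAffinity is a hypothetical helper, not the code that landed in pod-checkpointer; it assumes the checkpoint is a JSON pod object with the affinity under spec.affinity, where the Pod API places it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripAffinity is a hypothetical helper: it decodes a pod manifest, deletes
// spec.affinity if present, and re-encodes the result so it could then be
// written to the static manifest directory.
func stripAffinity(manifest []byte) ([]byte, error) {
	var pod map[string]interface{}
	if err := json.Unmarshal(manifest, &pod); err != nil {
		return nil, err
	}
	if spec, ok := pod["spec"].(map[string]interface{}); ok {
		delete(spec, "affinity") // no-op when there is no affinity
	}
	return json.Marshal(pod)
}

func main() {
	in := []byte(`{"metadata":{"name":"pod-checkpointer-bxk2m"},"spec":{"affinity":{"nodeAffinity":{}},"containers":[]}}`)
	out, err := stripAffinity(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"metadata":{"name":"pod-checkpointer-bxk2m"},"spec":{"containers":[]}}
}
```

Decoding into a generic map keeps the helper independent of any particular vendored Kubernetes API version, at the cost of decoding JSON numbers as float64, which can lose precision for integers beyond 2^53.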

@rphillips
Copy link
Contributor

I am thinking the affinity should be set to nil if matchExpressions == nil.
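One reading of that suggestion, as a minimal Go sketch over hypothetical trimmed-down versions of the core/v1 affinity types (not the real k8s.io/api structs, and without matchFields, pod affinity, or preferred terms): clear the affinity only when every required node selector term is empty, and leave it alone otherwise.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the core/v1 affinity types.
type NodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

type NodeSelectorTerm struct {
	MatchExpressions []NodeSelectorRequirement
}

type NodeSelector struct {
	NodeSelectorTerms []NodeSelectorTerm
}

type NodeAffinity struct {
	RequiredDuringSchedulingIgnoredDuringExecution *NodeSelector
}

type Affinity struct {
	NodeAffinity *NodeAffinity
}

type PodSpec struct {
	Affinity *Affinity
}

// clearEmptyAffinity sets spec.Affinity to nil when the required node
// affinity consists only of terms without matchExpressions (including the
// case of no terms at all). Any term with a real requirement keeps the
// affinity unchanged.
func clearEmptyAffinity(spec *PodSpec) {
	if spec.Affinity == nil || spec.Affinity.NodeAffinity == nil {
		return
	}
	required := spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution
	if required == nil {
		return
	}
	for _, term := range required.NodeSelectorTerms {
		if len(term.MatchExpressions) > 0 {
			return
		}
	}
	spec.Affinity = nil
}

func main() {
	// The shape seen in v1.12.1 checkpoints: one term, null matchExpressions.
	spec := PodSpec{Affinity: &Affinity{NodeAffinity: &NodeAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: &NodeSelector{
			NodeSelectorTerms: []NodeSelectorTerm{{MatchExpressions: nil}},
		},
	}}}
	clearEmptyAffinity(&spec)
	fmt.Println(spec.Affinity == nil) // true
}
```

The check uses len(...) > 0 rather than comparing against nil, so an explicit empty list is treated the same as null; under the API semantics both match no node.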

@dghubble
Copy link
Contributor Author

Sounds reasonable to me.

I wonder if pod-checkpointer even supports checkpointing pods that have an affinity at all (pod-checkpointer and apiserver don't have one). Maybe we should also document that, to use pod-checkpointer, a pod manifest needs the checkpointer.alpha.coreos.com/checkpoint=true annotation and should not have any affinity.
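The documentation rule proposed here can be expressed as a check over a pod manifest. A minimal Go sketch, assuming a JSON manifest with the annotation under metadata.annotations and the affinity under spec.affinity; checkpointEligible is a hypothetical helper, not part of pod-checkpointer:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpointAnnotation is the annotation named in this thread.
const checkpointAnnotation = "checkpointer.alpha.coreos.com/checkpoint"

// podManifest decodes only the fields this hypothetical check needs.
type podManifest struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
	Spec struct {
		Affinity json.RawMessage `json:"affinity"`
	} `json:"spec"`
}

// checkpointEligible reports whether a pod carries checkpoint=true and
// declares no affinity (an absent or null affinity both count as none).
func checkpointEligible(manifest []byte) (bool, error) {
	var p podManifest
	if err := json.Unmarshal(manifest, &p); err != nil {
		return false, err
	}
	if p.Metadata.Annotations[checkpointAnnotation] != "true" {
		return false, nil
	}
	hasAffinity := len(p.Spec.Affinity) > 0 && string(p.Spec.Affinity) != "null"
	return !hasAffinity, nil
}

func main() {
	manifest := []byte(`{"metadata":{"annotations":{"checkpointer.alpha.coreos.com/checkpoint":"true"}},"spec":{"containers":[]}}`)
	ok, err := checkpointEligible(manifest)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // true
}
```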

dghubble added a commit to poseidon/terraform-render-bootstrap that referenced this issue Oct 14, 2018
* Mount an empty dir for the controller-manager to work around
kubernetes/kubernetes#68973
* Use a patched pod-checkpointer that strips affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling and checkpointed pods should not have an
affinity, they're run directly by the Kubelet on the local node
* kubernetes-retired/bootkube#1001
* kubernetes/kubernetes#68173
dghubble added a commit to poseidon/typhoon that referenced this issue Oct 14, 2018
dghubble added a commit to poseidon/typhoon that referenced this issue Oct 14, 2018
dghubble added a commit to poseidon/terraform-render-bootstrap that referenced this issue Oct 17, 2018
* Mount an empty dir for the controller-manager to work around
kubernetes/kubernetes#68973
* Update coreos/pod-checkpointer to strip affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling and checkpointed pods should not have an
affinity, they're run directly by the Kubelet on the local node
* kubernetes-retired/bootkube#1001
* kubernetes/kubernetes#68173
dghubble added a commit to poseidon/terraform-render-bootstrap that referenced this issue Oct 17, 2018
* Mount an empty dir for the controller-manager to work around
kubernetes/kubernetes#68973
* Update coreos/pod-checkpointer to strip affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling and checkpointed pods should not have an
affinity, they're run directly by the Kubelet on the local node
* kubernetes-retired/bootkube#1001
* kubernetes/kubernetes#68173
dghubble added a commit to poseidon/typhoon that referenced this issue Oct 17, 2018
dghubble-robot pushed a commit to poseidon/terraform-onprem-kubernetes that referenced this issue Oct 17, 2018
dghubble-robot pushed a commit to poseidon/terraform-digitalocean-kubernetes that referenced this issue Oct 17, 2018
dghubble-robot pushed a commit to poseidon/terraform-aws-kubernetes that referenced this issue Oct 17, 2018
dghubble-robot pushed a commit to poseidon/terraform-google-kubernetes that referenced this issue Oct 17, 2018
@dghubble
Copy link
Contributor Author

dghubble commented Oct 17, 2018

The issue with the pod-checkpointer was closed by #1009. Thanks @rphillips! The new image is quay.io/coreos/pod-checkpointer:018007e77ccd61e8e59b7e15d7fc5e318a5a2682.

It can be used with v1.12 or earlier versions too; it's not really tied to v1.12. I'm closing this, since actually upgrading to v1.12 is separate and is continuing in #1003.

dghubble-robot pushed a commit to poseidon/terraform-azure-kubernetes that referenced this issue May 25, 2020