Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

checkpointer: ignore Affinity within podspec #1004

Conversation

rphillips
Copy link
Contributor

Kubernetes 1.12.x introduced new logic for Affinity [1]. In addition to
new logic, the Pod contains a default affinity. The new default affinity
gets serialized into the checkpoint file, and the 1.12.x kubelet does
not restore the pod due to the affinity.

This PR removes the affinity from the spec and documents that affinity's
are not supported.

"affinity": {
      "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": null
            }
          ]
        }
      }
    },

/cc @aaronlevy @dghubble
/fixes #1001

[1] kubernetes/kubernetes#68173
[2] https://github.com/kubernetes/kubernetes/blob/e39b510726113581c6f6a9c2db1753d794aa9cce/pkg/controller/daemon/util/daemonset_util.go#L183-L196

Kubernetes 1.12.x introduced new logic for Affinity [1]. In addition to
new logic, the Pod contains a default affinity. The new default affinity
gets serialized into the checkpoint file, and the 1.12.x kubelet does
not restore the pod due to the affinity.

This PR removes the affinity from the spec and documents that affinity's
are not supported.

```
"affinity": {
      "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": null
            }
          ]
        }
      }
    },

```

[1] kubernetes/kubernetes#68173
[2] https://github.com/kubernetes/kubernetes/blob/e39b510726113581c6f6a9c2db1753d794aa9cce/pkg/controller/daemon/util/daemonset_util.go#L183-L196
@k8s-ci-robot
Copy link
Contributor

@rphillips: GitHub didn't allow me to request PR reviews from the following users: dghubble.

Note that only kubernetes-incubator members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Kubernetes 1.12.x introduced new logic for Affinity [1]. In addition to
new logic, the Pod contains a default affinity. The new default affinity
gets serialized into the checkpoint file, and the 1.12.x kubelet does
not restore the pod due to the affinity.

This PR removes the affinity from the spec and documents that affinity's
are not supported.

"affinity": {
     "nodeAffinity": {
       "requiredDuringSchedulingIgnoredDuringExecution": {
         "nodeSelectorTerms": [
           {
             "matchExpressions": null
           }
         ]
       }
     }
   },

/cc @aaronlevy @dghubble
/fixes #1001

[1] kubernetes/kubernetes#68173
[2] https://github.com/kubernetes/kubernetes/blob/e39b510726113581c6f6a9c2db1753d794aa9cce/pkg/controller/daemon/util/daemonset_util.go#L183-L196

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 11, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: rphillips

If they are not already assigned, you can assign the PR to them by writing /assign @rphillips in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 11, 2018
@aaronlevy
Copy link
Contributor

This looks fine to me - but don't want to lgtm until we have tests hooked up (Also would be good to get prow to respect the tests passing or not so an lgtm doesn't auto merge something that didn't pass - see: #1000 (comment))

@dghubble
Copy link
Contributor

I tested this out by building a pod-checkpointer image and using it as the checkpointer in a v1.12.1 cluster. It solves this issue for me. In steady-state, there are two pod-checkpoint pods running like before (one from DaemonSet, one from checkpointed pod) and power cycling the cluster is tolerated.

Thanks! 🙌

@rphillips
Copy link
Contributor Author

I'm working on the builders today to get PR testing working again.

@rphillips
Copy link
Contributor Author

coreosbot run e2e

@rphillips
Copy link
Contributor Author

coreosbot run e2e calico

1 similar comment
@rphillips
Copy link
Contributor Author

coreosbot run e2e calico

@rphillips
Copy link
Contributor Author

closing in favor of a PR with tests. #1007

@rphillips rphillips closed this Oct 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubernetes v1.12.x doesn't restore pod-checkpointer
4 participants