
Job resources are not automatically cleaned up, preventing version upgrades #7118

Closed
marshall007 opened this issue May 8, 2021 · 19 comments
Labels
area/helm Issues or PRs related to helm charts help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.


@marshall007

marshall007 commented May 8, 2021

NGINX Ingress controller version: v0.45.0

Kubernetes version (use kubectl version): v1.19.0

Environment:

  • Cloud provider or hardware configuration: GKE / EKS
  • Install tools: kustomize and kpt live apply

What happened:

First deployment of ingress-nginx from the static manifests is successful. When attempting to upgrade to any newer version of the static manifests (in our case v0.44.0 -> v0.45.0), you will see errors indicating that the Job specs are immutable:

The Job "ingress-nginx-admission-create" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"ingress-nginx-admission-create", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app.kubernetes.io/component":"admission-webhook", "app.kubernetes.io/instance":"ingress-nginx", "app.kubernetes.io/managed-by":"Helm", "app.kubernetes.io/name":"ingress-nginx", "app.kubernetes.io/version":"0.45.0", "controller-uid":"5d2d96c5-6c0c-434d-8bb6-c90124b5fbbf", "helm.sh/chart":"ingress-nginx-3.27.0", "job-name":"ingress-nginx-admission-create"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:"create", Image:"docker.io/jettech/kube-webhook-certgen:v1.5.1", Command:[]string(nil), Args:[]string{"create", "--host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc", "--namespace=$(POD_NAMESPACE)", "--secret-name=ingress-nginx-admission"}, WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar{core.EnvVar{Name:"POD_NAMESPACE", Value:"", ValueFrom:(*core.EnvVarSource)(0xc02df04f20)}}, Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"OnFailure", TerminationGracePeriodSeconds:(*int64)(0xc003fb51c0), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"ingress-nginx-admission", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc03cd46800), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable

I think this is simply because labels like app.kubernetes.io/version and helm.sh/chart change between versions, and they are set on the Job's immutable pod template (spec.template.metadata.labels).
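
For anyone who wants to confirm this on a live cluster, something like the following (illustrative only) prints the labels baked into the existing Job's immutable pod template:

# show the labels stamped into the Job's pod template
kubectl get job -n ingress-nginx ingress-nginx-admission-create -o jsonpath='{.spec.template.metadata.labels}'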

We use kpt pkg commands to pull the manifests maintained in deploy/static/provider/cloud into our own git repo, apply kustomizations on top of that, and deploy with kpt live commands.

What you expected to happen:

I believe helm users do not experience this problem because of the helm.sh/hook-delete-policy annotations specified on the various Job resources. These annotations are obviously not respected when deploying the static manifests with other tooling.

I would expect users deploying from static manifests to have a similar experience when upgrading between versions.

Rather than relying on proprietary helm hooks to clean up the Job resources, perhaps we can use spec.ttlSecondsAfterFinished so that these resources are pruned in a portable fashion.
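
Until something like that lands, one illustrative stopgap (not something from the project docs) is to delete the completed admission Jobs by hand before applying the newer manifests:

# remove the finished admission Jobs so the new manifests apply cleanly
kubectl delete job -n ingress-nginx ingress-nginx-admission-create ingress-nginx-admission-patch --ignore-not-found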

How to reproduce it:

This is not precisely what we are doing, but it should be reproducible like so:

# fetch the package
kpt pkg get github.com/kubernetes/ingress-nginx/deploy/static/provider/cloud?ref=controller-v0.44.0 ingress-nginx

# setup
pushd ingress-nginx
kpt live init .
kustomize edit add resource inventory-template.yaml
popd

# deploy v0.44.0
kustomize build ingress-nginx | kpt live apply --reconcile-timeout=15m

# upgrade package to v0.45.0
kpt pkg update ingress-nginx@controller-v0.45.0 --strategy force-delete-replace

# attempt to deploy v0.45.0
kustomize build ingress-nginx | kpt live apply

/kind bug

@marshall007 marshall007 added the kind/bug Categorizes issue or PR as related to a bug. label May 8, 2021
@marshall007
Author

I see that my suggestion of using spec.ttlSecondsAfterFinished is already implemented in the helm chart:

{{- if .Capabilities.APIVersions.Has "batch/v1alpha1" }}
# Alpha feature since k8s 1.12
ttlSecondsAfterFinished: 0
{{- end }}

However, this does not end up in the generated static manifests:

spec:
  template:
    metadata:
      name: ingress-nginx-admission-patch

In order for this to work, we'd need to start passing --api-versions to the helm template calls in the gen script, set to something reasonable:

cat << EOF | helm template $RELEASE_NAME ${DIR}/charts/ingress-nginx --namespace $NAMESPACE --values - | $DIR/hack/add-namespace.py $NAMESPACE > ${OUTPUT_FILE}
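
For illustration, a sketch of the same call with the flag added (batch/v1alpha1 chosen only because that is what the chart template currently checks for; the exact value is an open question):

cat << EOF | helm template $RELEASE_NAME ${DIR}/charts/ingress-nginx --namespace $NAMESPACE --api-versions batch/v1alpha1 --values - | $DIR/hack/add-namespace.py $NAMESPACE > ${OUTPUT_FILE}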

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 9, 2021
@NeckBeardPrince

Not stale

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 9, 2021
@NeckBeardPrince

Can we get some eyes on this?

@iamNoah1
Contributor

/lifecycle active
/area helm
/priority important-longterm

@NeckBeardPrince see this comment from @rikatz: #7128 (comment)

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. area/helm Issues or PRs related to helm charts priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Nov 17, 2021
@blu3r4y

blu3r4y commented Jan 13, 2022

As a temporary workaround, I added the following steps after I deploy the controller:

kubectl patch -n ingress-nginx job ingress-nginx-admission-create -p '{"spec":{"ttlSecondsAfterFinished":0}}'
kubectl patch -n ingress-nginx job ingress-nginx-admission-patch -p '{"spec":{"ttlSecondsAfterFinished":0}}'

If you are using PowerShell on Windows, don't forget to escape the quotes differently:

kubectl patch -n ingress-nginx job ingress-nginx-admission-create -p '{\"spec\":{\"ttlSecondsAfterFinished\":0}}'
kubectl patch -n ingress-nginx job ingress-nginx-admission-patch -p '{\"spec\":{\"ttlSecondsAfterFinished\":0}}'

@iamNoah1
Contributor

/triage accepted
/help

@k8s-ci-robot
Contributor

@iamNoah1:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Apr 12, 2022
@numbsafari

In our config we are using kustomize. For this component we apply a patch, e.g.:

patches:
- target:
    kind: Job
  patch: |-
    - op: add
      path: /spec/ttlSecondsAfterFinished
      value: 0
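
A quick, illustrative check that the patch lands on both admission Jobs in the rendered output:

# should print 2 if both admission Jobs picked up the field
kustomize build . | grep -c ttlSecondsAfterFinished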

@robert-heinzmann-logmein

robert-heinzmann-logmein commented Jul 21, 2022

Looks like newer versions of k8s no longer ship the batch/v1alpha1 API, and therefore the capabilities check no longer works. The following patch should do the trick:

@@ -17,7 +17,7 @@
     {{- toYaml . | nindent 4 }}
     {{- end }}
 spec:
-{{- if .Capabilities.APIVersions.Has "batch/v1alpha1" }}
+{{- if or (.Capabilities.APIVersions.Has "batch/v1alpha1") (.Capabilities.APIVersions.Has "batch/v1beta1") (.Capabilities.APIVersions.Has "batch/v1") }}
   # Alpha feature since k8s 1.12, beta since 1.21, GA in 1.23
   ttlSecondsAfterFinished: 0
 {{- end }}
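
An illustrative way to sanity-check the change locally (assuming it is run from the repo root):

# render with batch/v1 advertised as available and confirm the field shows up
helm template ingress-nginx ./charts/ingress-nginx --api-versions batch/v1 | grep ttlSecondsAfterFinished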

@rikatz
Contributor

rikatz commented Jul 21, 2022

@robert-heinzmann-logmein wanna send the PR? Then just tag me and I can approve it.

Also, we don't support k8s < v1.20 anymore, so fully dropping this API check should be good.

@robert-heinzmann-logmein

Not sure, how about #7128? Following the discussions, I think as long as you support 1.20 (where this is still alpha), the toggle cannot be removed, right?

@rikatz
Contributor

rikatz commented Jul 21, 2022

Oh it is beta only on v1.21? :/

Yeah, we should wait until v1.25 and drop support for v1.20.

@rikatz
Contributor

rikatz commented Jul 21, 2022

Also apparently that PR is stale, so if you are willing to open a new one lmk.

@robert-heinzmann-logmein

It seems 1.25 is out, so this looks unblocked.

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 18, 2024
niflostancu added a commit to niflostancu/k8s-service-templates that referenced this issue Mar 6, 2024
@longwuyuan
Contributor

Enough upgrades can safely be assumed to have been performed since the last activity on this issue, and the controller version and Kubernetes version referred to here are either no longer supported or too old to support.

Since nobody is reporting upgrade failures due to pending jobs, I will close the issue for now. If a user posts data proving that upgrades fail due to pending jobs, on a kind cluster or a minikube cluster, then we can reopen this issue.

/close

@k8s-ci-robot
Contributor

@longwuyuan: Closing this issue.

In response to this:

Enough upgrades can safely be assumed to have been performed since the last activity on this issue, and the controller version and Kubernetes version referred to here are either no longer supported or too old to support.

Since nobody is reporting upgrade failures due to pending jobs, I will close the issue for now. If a user posts data proving that upgrades fail due to pending jobs, on a kind cluster or a minikube cluster, then we can reopen this issue.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
