-
Notifications
You must be signed in to change notification settings - Fork 487
Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating #2059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
aimichelle
merged 3 commits into
pixie-io:main
from
ddelnano:ddelnano/fix-helm-uninstall-olm-finalizer
Dec 18, 2024
Merged
Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating #2059
aimichelle
merged 3 commits into
pixie-io:main
from
ddelnano:ddelnano/fix-helm-uninstall-olm-finalizer
Dec 18, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Member
Author
|
@pixie-io/maintainers this is ready for review when you have the chance. |
aimichelle
reviewed
Dec 16, 2024
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Member
Author
|
@aimichelle once this is ready from your perspective, can you trigger an operator build from this branch again? I want to make sure I test the final change before it's merged to main. |
Member
Author
|
@aimichelle the 0.1.7-pre-z0.0 operator build works, so this is ready to be merged. |
aimichelle
approved these changes
Dec 18, 2024
3 tasks
ddelnano
added a commit
that referenced
this pull request
Dec 18, 2024
…ls (#2063) Summary: Prevent csv-finalizer Job from being included in operator release yamls #2059 introduced a new Job that fixed helm's uninstall issues caused by OLM's recent csv-finalizer addition. This properly addressed the helm issues in #1917, however, it broke the `px` cli install process since the Job wasn't excluded from the operator release yamls. This results in `px-operator` namespace termination as the cli is trying to deploy the vizier since the Job runs unconditionally. This change also renames the `deleter_role.yaml` file since it seems to be accidentally included in the operator release yamls. Please see testing done for how this was determined to be extraneous. Relevant Issues: #1917 Type of change: /kind bug Test Plan: Verified the following - [x] `helm template` includes the `csv-finalizer` job ``` # Create dummy Chart.yaml to appease helm $ helm template --set deployOLM=true k8s/operator/helm/ | grep 'csv-deleter' # Source: pixie/templates/csv-deleter.yaml name: csv-deleter ``` - [x] `bazel build k8s/operator:operator_templates` no longer includes the `csv-finalizer` job or the `deleter_role.yaml` ``` $ tar -tf bazel-bin/k8s/operator/operator_templates.tar yamls/ yamls/crds/ yamls/crds/olm_crd.yaml yamls/crds/vizier_crd.yaml yamls/templates/ yamls/templates/00_olm.yaml yamls/templates/01_px_olm.yaml yamls/templates/02_catalog.yaml yamls/templates/03_subscription.yaml yamls/templates/04_vizier.yaml ``` - [x] Verified deleter role is excluded from `px deploy`'s extracted yaml. [This](https://github.com/pixie-io/pixie/blob/9effb349be7a42f8b45ca8fce6cbfdac619349ac/src/utils/shared/artifacts/yamls.go#L165-L170) code excludes anything that isn't a "crd" file or is isn't numerically prefixed, which means the deleter role isn't included for `px` cli deploys ``` $ px deploy --operator_version=0.1.7-pre-z1.0 -e . --deploy_key=<deploy_key> $ tree pixie_yamls/ pixie_yamls/ ├── 00_olm_crd.yaml ├── 01_vizier_crd.yaml ├── 02_olm.yaml ├── 03_px_olm.yaml ├── 04_catalog.yaml ├── 05_subscription.yaml └── 06_vizier.yaml 1 directory, 7 files ``` --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano
added a commit
that referenced
this pull request
Dec 19, 2024
Summary: Ensure helm release build consistently handles deployOLM toggle Splitting the csv-deleter Job into its own file in #2063 introduced a bug where the `px-operator` namespace becomes stuck on `helm uninstall` again. This is because helm's `deployOLM` configuration key is patched for the [00_olm.yaml file](https://github.com/pixie-io/pixie/blob/5c5e9dcb259c6548c9ab2c4b31c80d31f5f90124/ci/operator_helm_build_release.sh#L60) during the release build. When I initially tested in #2059, the csv-deleter Job existed in the file and received this special treatment, but after #2063 it no longer did. In order to address this, this change introduces a Go template comment placeholder that `sed` will look to make it clear that helm operates with a different conditional check. It also runs this sed command across all template files and not just the `00_olm.yaml` file. This comment will be stripped out during the `px` cli's yaml templating and will leave it's `deployOLM` conditional check intact. Relevant Issues: #1917 Type of change: /kind bug Test Plan: Ran the sed command and verified that it works ``` # Verify bash variables are set so sed command can be copy and pasted from this PR $ echo $repo_path ./ $ echo $helm_tmpl_checks {{- $olmCRDFound := false }} \ {{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} \ {{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} \ {{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} \ {{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}\n{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} # Run copy and pasted sed command $ find "${repo_path}/k8s/operator/helm/templates" -type f -exec sed -i "/HELM_DEPLOY_OLM_PLACEHOLDER/c\\${helm_tmpl_checks}" {} \; $ git diff diff --git a/k8s/operator/helm/templates/00_olm.yaml b/k8s/operator/helm/templates/00_olm.yaml index 381c856..cd4ccafb2 100644 --- a/k8s/operator/helm/templates/00_olm.yaml +++ b/k8s/operator/helm/templates/00_olm.yaml @@ -1,4 +1,9 @@ -{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}} +{{- $olmCRDFound := false }} +{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} +{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} +{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} +{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }} +{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} {{ if not (eq .Values.olmNamespace .Release.Namespace) }} --- apiVersion: v1 diff --git a/k8s/operator/helm/templates/csv-deleter.yaml b/k8s/operator/helm/templates/csv-deleter.yaml index f785a63..5c0efb20e 100644 --- a/k8s/operator/helm/templates/csv-deleter.yaml +++ b/k8s/operator/helm/templates/csv-deleter.yaml @@ -1,4 +1,9 @@ -{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}} +{{- $olmCRDFound := false }} +{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} +{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} +{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} +{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }} +{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} --- apiVersion: batch/v1 kind: Job ``` Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano
added a commit
to ddelnano/pixie
that referenced
this pull request
Aug 6, 2025
… stuck terminating (pixie-io#2059) Summary: Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating The helm install process followed by a helm uninstall does not fully clean up all pixie resources in the v0.1.7 operator release. The OLM project [added](operator-framework/operator-lifecycle-manager@f94a5ed) a csv-cleanup finalizer in [v0.27.0](https://github.com/operator-framework/operator-lifecycle-manager/releases/tag/v0.27.0) that causes the px-operator to get stuck in a terminating state if the `olm` and `px-operator` namespaces are deleted at the same time. In order to address this, a new Job is introduced within the olm namespace that triggers the deletion of the olm operator namespace (px-operator) from a `pre-delete` hook. This bug is not present when OLM is installed outside of the helm since the finalizer has time to run. Therefore this job only needs to run if `deployOLM` is set (helm is managing OLM). The other alternative I considered was writing another one off utility similar to the `vizier_deleter` Job. This would have the benefit of having a small surface area and wouldn't rely on third party images. Let me know if you have opinions/thoughts on that option or any other alternatives. Relevant Issues: pixie-io#1917 Type of change: /kind bug Test Plan: Verified that the operator dev helm chart from this branch uninstalls properly ``` $ helm install pixie pixie-dev-operator/pixie-operator-chart --version 0.1.7-pre-ddelnano-fix-helm-uninstall-olm-finalizer.0 --set cloudAddr=<cloud_addr> --set deployKey=<deploy_key> --set clusterName='helm-uninstall-test' --namespace pl --create-namespace NAME: pixie LAST DEPLOYED: Wed Dec 11 03:13:42 2024 NAMESPACE: pl STATUS: deployed REVISION: 1 TEST SUITE: None $ helm -n pl uninstall pixie release "pixie" uninstalled $ kubectl get namespaces | grep 'px-operator\|olm\|pl' pl Active 6m31s $ kubectl -n pl get all No resources found in pl namespace. ``` - [x] Verified deployOLM controls if Job is present with `helm template` ``` $ helm template --set deployOLM=true k8s/operator/helm/ | grep -A 5 'Job' kind: Job metadata: name: csv-deleter namespace: olm annotations: "helm.sh/hook": pre-delete -- kind: Job metadata: name: vizier-deleter annotations: "helm.sh/hook": pre-delete "helm.sh/hook-delete-policy": hook-succeeded $ helm template --set deployOLM=false k8s/operator/helm/ | grep -A 5 'Job' kind: Job metadata: name: vizier-deleter annotations: "helm.sh/hook": pre-delete "helm.sh/hook-delete-policy": hook-succeeded ``` Changelog Message: Fix bug with the v0.1.7 operator helm chart that would cause a stuck `px-operator` namespace on uninstall --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com> GitOrigin-RevId: 9effb34
ddelnano
added a commit
to ddelnano/pixie
that referenced
this pull request
Aug 6, 2025
…ls (pixie-io#2063) Summary: Prevent csv-finalizer Job from being included in operator release yamls pixie-io#2059 introduced a new Job that fixed helm's uninstall issues caused by OLM's recent csv-finalizer addition. This properly addressed the helm issues in pixie-io#1917, however, it broke the `px` cli install process since the Job wasn't excluded from the operator release yamls. This results in `px-operator` namespace termination as the cli is trying to deploy the vizier since the Job runs unconditionally. This change also renames the `deleter_role.yaml` file since it seems to be accidentally included in the operator release yamls. Please see testing done for how this was determined to be extraneous. Relevant Issues: pixie-io#1917 Type of change: /kind bug Test Plan: Verified the following - [x] `helm template` includes the `csv-finalizer` job ``` # Create dummy Chart.yaml to appease helm $ helm template --set deployOLM=true k8s/operator/helm/ | grep 'csv-deleter' # Source: pixie/templates/csv-deleter.yaml name: csv-deleter ``` - [x] `bazel build k8s/operator:operator_templates` no longer includes the `csv-finalizer` job or the `deleter_role.yaml` ``` $ tar -tf bazel-bin/k8s/operator/operator_templates.tar yamls/ yamls/crds/ yamls/crds/olm_crd.yaml yamls/crds/vizier_crd.yaml yamls/templates/ yamls/templates/00_olm.yaml yamls/templates/01_px_olm.yaml yamls/templates/02_catalog.yaml yamls/templates/03_subscription.yaml yamls/templates/04_vizier.yaml ``` - [x] Verified deleter role is excluded from `px deploy`'s extracted yaml. [This](https://github.com/pixie-io/pixie/blob/9effb349be7a42f8b45ca8fce6cbfdac619349ac/src/utils/shared/artifacts/yamls.go#L165-L170) code excludes anything that isn't a "crd" file or is isn't numerically prefixed, which means the deleter role isn't included for `px` cli deploys ``` $ px deploy --operator_version=0.1.7-pre-z1.0 -e . --deploy_key=<deploy_key> $ tree pixie_yamls/ pixie_yamls/ ├── 00_olm_crd.yaml ├── 01_vizier_crd.yaml ├── 02_olm.yaml ├── 03_px_olm.yaml ├── 04_catalog.yaml ├── 05_subscription.yaml └── 06_vizier.yaml 1 directory, 7 files ``` --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com> GitOrigin-RevId: 5c5e9dc
ddelnano
added a commit
to ddelnano/pixie
that referenced
this pull request
Aug 6, 2025
…e-io#2066) Summary: Ensure helm release build consistently handles deployOLM toggle Splitting the csv-deleter Job into its own file in pixie-io#2063 introduced a bug where the `px-operator` namespace becomes stuck on `helm uninstall` again. This is because helm's `deployOLM` configuration key is patched for the [00_olm.yaml file](https://github.com/pixie-io/pixie/blob/5c5e9dcb259c6548c9ab2c4b31c80d31f5f90124/ci/operator_helm_build_release.sh#L60) during the release build. When I initially tested in pixie-io#2059, the csv-deleter Job existed in the file and received this special treatment, but after pixie-io#2063 it no longer did. In order to address this, this change introduces a Go template comment placeholder that `sed` will look to make it clear that helm operates with a different conditional check. It also runs this sed command across all template files and not just the `00_olm.yaml` file. This comment will be stripped out during the `px` cli's yaml templating and will leave it's `deployOLM` conditional check intact. Relevant Issues: pixie-io#1917 Type of change: /kind bug Test Plan: Ran the sed command and verified that it works ``` # Verify bash variables are set so sed command can be copy and pasted from this PR $ echo $repo_path ./ $ echo $helm_tmpl_checks {{- $olmCRDFound := false }} \ {{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} \ {{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} \ {{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} \ {{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}\n{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} # Run copy and pasted sed command $ find "${repo_path}/k8s/operator/helm/templates" -type f -exec sed -i "/HELM_DEPLOY_OLM_PLACEHOLDER/c\\${helm_tmpl_checks}" {} \; $ git diff diff --git a/k8s/operator/helm/templates/00_olm.yaml b/k8s/operator/helm/templates/00_olm.yaml index 381c856..cd4ccafb2 100644 --- a/k8s/operator/helm/templates/00_olm.yaml +++ b/k8s/operator/helm/templates/00_olm.yaml @@ -1,4 +1,9 @@ -{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}} +{{- $olmCRDFound := false }} +{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} +{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} +{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} +{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }} +{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} {{ if not (eq .Values.olmNamespace .Release.Namespace) }} --- apiVersion: v1 diff --git a/k8s/operator/helm/templates/csv-deleter.yaml b/k8s/operator/helm/templates/csv-deleter.yaml index f785a63..5c0efb20e 100644 --- a/k8s/operator/helm/templates/csv-deleter.yaml +++ b/k8s/operator/helm/templates/csv-deleter.yaml @@ -1,4 +1,9 @@ -{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}} +{{- $olmCRDFound := false }} +{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} +{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} +{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} +{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }} +{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }} --- apiVersion: batch/v1 kind: Job ``` Signed-off-by: Dom Del Nano <ddelnano@gmail.com> GitOrigin-RevId: ae80b43
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary: Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating
The helm install process followed by a helm uninstall does not fully clean up all pixie resources in the v0.1.7 operator release. The OLM project added a csv-cleanup finalizer in v0.27.0 that causes the px-operator to get stuck in a terminating state if the
olmandpx-operatornamespaces are deleted at the same time.In order to address this, a new Job is introduced within the olm namespace that triggers the deletion of the olm operator namespace (px-operator) from a
pre-deletehook. This bug is not present when OLM is installed outside of the helm since the finalizer has time to run. Therefore this job only needs to run ifdeployOLMis set (helm is managing OLM).The other alternative I considered was writing another one off utility similar to the
vizier_deleterJob. This would have the benefit of having a small surface area and wouldn't rely on third party images. Let me know if you have opinions/thoughts on that option or any other alternatives.Relevant Issues: #1917
Type of change: /kind bug
Test Plan: Verified that the operator dev helm chart from this branch uninstalls properly
helm templateChangelog Message: Fix bug with the v0.1.7 operator helm chart that would cause a stuck
px-operatornamespace on uninstall