Skip to content

Conversation

@ddelnano
Copy link
Member

@ddelnano ddelnano commented Dec 11, 2024

Summary: Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating

The helm install process followed by a helm uninstall does not fully clean up all pixie resources in the v0.1.7 operator release. The OLM project added a csv-cleanup finalizer in v0.27.0 that causes the px-operator to get stuck in a terminating state if the olm and px-operator namespaces are deleted at the same time.

In order to address this, a new Job is introduced within the olm namespace that triggers the deletion of the olm operator namespace (px-operator) from a pre-delete hook. This bug is not present when OLM is installed outside of the helm since the finalizer has time to run. Therefore this job only needs to run if deployOLM is set (helm is managing OLM).

The other alternative I considered was writing another one off utility similar to the vizier_deleter Job. This would have the benefit of having a small surface area and wouldn't rely on third party images. Let me know if you have opinions/thoughts on that option or any other alternatives.

Relevant Issues: #1917

Type of change: /kind bug

Test Plan: Verified that the operator dev helm chart from this branch uninstalls properly

$ helm install pixie  pixie-dev-operator/pixie-operator-chart  --version 0.1.7-pre-ddelnano-fix-helm-uninstall-olm-finalizer.0 --set cloudAddr=<cloud_addr> --set deployKey=<deploy_key> --set clusterName='helm-uninstall-test' --namespace pl --create-namespace
NAME: pixie
LAST DEPLOYED: Wed Dec 11 03:13:42 2024
NAMESPACE: pl
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ helm -n pl uninstall pixie
release "pixie" uninstalled

$ kubectl get namespaces | grep 'px-operator\|olm\|pl'
pl                   Active   6m31s
$ kubectl -n pl get all
No resources found in pl namespace.
  • Verified deployOLM controls if Job is present with helm template
$ helm template --set deployOLM=true k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: csv-deleter
  namespace: olm
  annotations:
    "helm.sh/hook": pre-delete
--
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
$ helm template --set deployOLM=false k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded

Changelog Message: Fix bug with the v0.1.7 operator helm chart that would cause a stuck px-operator namespace on uninstall

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
@ddelnano ddelnano requested a review from a team as a code owner December 11, 2024 03:48
@ddelnano
Copy link
Member Author

@pixie-io/maintainers this is ready for review when you have the chance.

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
@ddelnano
Copy link
Member Author

@aimichelle once this is ready from your perspective, can you trigger an operator build from this branch again? I want to make sure I test the final change before it's merged to main.

@ddelnano
Copy link
Member Author

ddelnano commented Dec 17, 2024

@aimichelle the 0.1.7-pre-z0.0 operator build works, so this is ready to be merged.

@aimichelle aimichelle merged commit 9effb34 into pixie-io:main Dec 18, 2024
33 of 35 checks passed
ddelnano added a commit that referenced this pull request Dec 18, 2024
…ls (#2063)

Summary: Prevent csv-finalizer Job from being included in operator
release yamls

#2059 introduced a new Job that fixed helm's uninstall issues caused by
OLM's recent csv-finalizer addition. This properly addressed the helm
issues in #1917, however, it
broke the `px` cli install process since the Job wasn't excluded from
the operator release yamls. This results in `px-operator` namespace
termination as the cli is trying to deploy the vizier since the Job runs
unconditionally.

This change also renames the `deleter_role.yaml` file since it seems to
be accidentally included in the operator release yamls. Please see
testing done for how this was determined to be extraneous.

Relevant Issues: #1917

Type of change: /kind bug

Test Plan: Verified the following
- [x] `helm template` includes the `csv-finalizer` job
```
# Create dummy Chart.yaml to appease helm
$ helm template --set deployOLM=true  k8s/operator/helm/ | grep 'csv-deleter'
# Source: pixie/templates/csv-deleter.yaml
  name: csv-deleter
```
- [x] `bazel build k8s/operator:operator_templates` no longer includes
the `csv-finalizer` job or the `deleter_role.yaml`
```
$ tar -tf bazel-bin/k8s/operator/operator_templates.tar
yamls/
yamls/crds/
yamls/crds/olm_crd.yaml
yamls/crds/vizier_crd.yaml
yamls/templates/
yamls/templates/00_olm.yaml
yamls/templates/01_px_olm.yaml
yamls/templates/02_catalog.yaml
yamls/templates/03_subscription.yaml
yamls/templates/04_vizier.yaml
```
- [x] Verified deleter role is excluded from `px deploy`'s extracted
yaml.
[This](https://github.com/pixie-io/pixie/blob/9effb349be7a42f8b45ca8fce6cbfdac619349ac/src/utils/shared/artifacts/yamls.go#L165-L170)
code excludes anything that isn't a "crd" file or is isn't numerically
prefixed, which means the deleter role isn't included for `px` cli
deploys
```
$ px deploy --operator_version=0.1.7-pre-z1.0 -e . --deploy_key=<deploy_key>
$ tree pixie_yamls/
pixie_yamls/
├── 00_olm_crd.yaml
├── 01_vizier_crd.yaml
├── 02_olm.yaml
├── 03_px_olm.yaml
├── 04_catalog.yaml
├── 05_subscription.yaml
└── 06_vizier.yaml

1 directory, 7 files
```

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano added a commit that referenced this pull request Dec 19, 2024
Summary: Ensure helm release build consistently handles deployOLM toggle

Splitting the csv-deleter Job into its own file in #2063 introduced a
bug where the `px-operator` namespace becomes stuck on `helm uninstall`
again. This is because helm's `deployOLM` configuration key is patched
for the [00_olm.yaml
file](https://github.com/pixie-io/pixie/blob/5c5e9dcb259c6548c9ab2c4b31c80d31f5f90124/ci/operator_helm_build_release.sh#L60)
during the release build. When I initially tested in
#2059, the csv-deleter Job existed
in the file and received this special treatment, but after #2063 it no
longer did.

In order to address this, this change introduces a Go template comment
placeholder that `sed` will look to make it clear that helm operates
with a different conditional check. It also runs this sed command across
all template files and not just the `00_olm.yaml` file. This comment
will be stripped out during the `px` cli's yaml templating and will
leave it's `deployOLM` conditional check intact.

Relevant Issues: #1917

Type of change: /kind bug

Test Plan: Ran the sed command and verified that it works
```
# Verify bash variables are set so sed command can be copy and pasted from this PR
$ echo $repo_path
./
$ echo $helm_tmpl_checks
{{- $olmCRDFound := false }} \ {{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} \ {{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} \ {{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} \ {{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}\n{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}

# Run copy and pasted sed command
$ find "${repo_path}/k8s/operator/helm/templates" -type f -exec sed -i "/HELM_DEPLOY_OLM_PLACEHOLDER/c\\${helm_tmpl_checks}" {} \;

$ git diff
diff --git a/k8s/operator/helm/templates/00_olm.yaml b/k8s/operator/helm/templates/00_olm.yaml
index 381c856..cd4ccafb2 100644
--- a/k8s/operator/helm/templates/00_olm.yaml
+++ b/k8s/operator/helm/templates/00_olm.yaml
@@ -1,4 +1,9 @@
-{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}}
+{{- $olmCRDFound := false }}
+{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }}
+{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}}
+{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}}
+{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}
+{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}
 {{ if not (eq .Values.olmNamespace .Release.Namespace) }}
 ---
 apiVersion: v1
diff --git a/k8s/operator/helm/templates/csv-deleter.yaml b/k8s/operator/helm/templates/csv-deleter.yaml
index f785a63..5c0efb20e 100644
--- a/k8s/operator/helm/templates/csv-deleter.yaml
+++ b/k8s/operator/helm/templates/csv-deleter.yaml
@@ -1,4 +1,9 @@
-{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}}
+{{- $olmCRDFound := false }}
+{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }}
+{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}}
+{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}}
+{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}
+{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}
 ---
 apiVersion: batch/v1
 kind: Job

```

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano added a commit to ddelnano/pixie that referenced this pull request Aug 6, 2025
… stuck terminating (pixie-io#2059)

Summary: Ensure OLM finalizer runs to prevent px-operator namespace from
being stuck terminating

The helm install process followed by a helm uninstall does not fully
clean up all pixie resources in the v0.1.7 operator release. The OLM
project
[added](operator-framework/operator-lifecycle-manager@f94a5ed)
a csv-cleanup finalizer in
[v0.27.0](https://github.com/operator-framework/operator-lifecycle-manager/releases/tag/v0.27.0)
that causes the px-operator to get stuck in a terminating state if the
`olm` and `px-operator` namespaces are deleted at the same time.

In order to address this, a new Job is introduced within the olm
namespace that triggers the deletion of the olm operator namespace
(px-operator) from a `pre-delete` hook. This bug is not present when OLM
is installed outside of the helm since the finalizer has time to run.
Therefore this job only needs to run if `deployOLM` is set (helm is
managing OLM).

The other alternative I considered was writing another one off utility
similar to the `vizier_deleter` Job. This would have the benefit of
having a small surface area and wouldn't rely on third party images. Let
me know if you have opinions/thoughts on that option or any other
alternatives.

Relevant Issues: pixie-io#1917

Type of change: /kind bug

Test Plan: Verified that the operator dev helm chart from this branch
uninstalls properly
```
$ helm install pixie  pixie-dev-operator/pixie-operator-chart  --version 0.1.7-pre-ddelnano-fix-helm-uninstall-olm-finalizer.0 --set cloudAddr=<cloud_addr> --set deployKey=<deploy_key> --set clusterName='helm-uninstall-test' --namespace pl --create-namespace
NAME: pixie
LAST DEPLOYED: Wed Dec 11 03:13:42 2024
NAMESPACE: pl
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ helm -n pl uninstall pixie
release "pixie" uninstalled

$ kubectl get namespaces | grep 'px-operator\|olm\|pl'
pl                   Active   6m31s
$ kubectl -n pl get all
No resources found in pl namespace.
```
- [x] Verified deployOLM controls if Job is present with `helm template`
```
$ helm template --set deployOLM=true k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: csv-deleter
  namespace: olm
  annotations:
    "helm.sh/hook": pre-delete
--
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
$ helm template --set deployOLM=false k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
```

Changelog Message: Fix bug with the v0.1.7 operator helm chart that
would cause a stuck `px-operator` namespace on uninstall

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
GitOrigin-RevId: 9effb34
ddelnano added a commit to ddelnano/pixie that referenced this pull request Aug 6, 2025
…ls (pixie-io#2063)

Summary: Prevent csv-finalizer Job from being included in operator
release yamls

pixie-io#2059 introduced a new Job that fixed helm's uninstall issues caused by
OLM's recent csv-finalizer addition. This properly addressed the helm
issues in pixie-io#1917, however, it
broke the `px` cli install process since the Job wasn't excluded from
the operator release yamls. This results in `px-operator` namespace
termination as the cli is trying to deploy the vizier since the Job runs
unconditionally.

This change also renames the `deleter_role.yaml` file since it seems to
be accidentally included in the operator release yamls. Please see
testing done for how this was determined to be extraneous.

Relevant Issues: pixie-io#1917

Type of change: /kind bug

Test Plan: Verified the following
- [x] `helm template` includes the `csv-finalizer` job
```
# Create dummy Chart.yaml to appease helm
$ helm template --set deployOLM=true  k8s/operator/helm/ | grep 'csv-deleter'
# Source: pixie/templates/csv-deleter.yaml
  name: csv-deleter
```
- [x] `bazel build k8s/operator:operator_templates` no longer includes
the `csv-finalizer` job or the `deleter_role.yaml`
```
$ tar -tf bazel-bin/k8s/operator/operator_templates.tar
yamls/
yamls/crds/
yamls/crds/olm_crd.yaml
yamls/crds/vizier_crd.yaml
yamls/templates/
yamls/templates/00_olm.yaml
yamls/templates/01_px_olm.yaml
yamls/templates/02_catalog.yaml
yamls/templates/03_subscription.yaml
yamls/templates/04_vizier.yaml
```
- [x] Verified deleter role is excluded from `px deploy`'s extracted
yaml.
[This](https://github.com/pixie-io/pixie/blob/9effb349be7a42f8b45ca8fce6cbfdac619349ac/src/utils/shared/artifacts/yamls.go#L165-L170)
code excludes anything that isn't a "crd" file or is isn't numerically
prefixed, which means the deleter role isn't included for `px` cli
deploys
```
$ px deploy --operator_version=0.1.7-pre-z1.0 -e . --deploy_key=<deploy_key>
$ tree pixie_yamls/
pixie_yamls/
├── 00_olm_crd.yaml
├── 01_vizier_crd.yaml
├── 02_olm.yaml
├── 03_px_olm.yaml
├── 04_catalog.yaml
├── 05_subscription.yaml
└── 06_vizier.yaml

1 directory, 7 files
```

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
GitOrigin-RevId: 5c5e9dc
ddelnano added a commit to ddelnano/pixie that referenced this pull request Aug 6, 2025
…e-io#2066)

Summary: Ensure helm release build consistently handles deployOLM toggle

Splitting the csv-deleter Job into its own file in pixie-io#2063 introduced a
bug where the `px-operator` namespace becomes stuck on `helm uninstall`
again. This is because helm's `deployOLM` configuration key is patched
for the [00_olm.yaml
file](https://github.com/pixie-io/pixie/blob/5c5e9dcb259c6548c9ab2c4b31c80d31f5f90124/ci/operator_helm_build_release.sh#L60)
during the release build. When I initially tested in
pixie-io#2059, the csv-deleter Job existed
in the file and received this special treatment, but after pixie-io#2063 it no
longer did.

In order to address this, this change introduces a Go template comment
placeholder that `sed` will look to make it clear that helm operates
with a different conditional check. It also runs this sed command across
all template files and not just the `00_olm.yaml` file. This comment
will be stripped out during the `px` cli's yaml templating and will
leave it's `deployOLM` conditional check intact.

Relevant Issues: pixie-io#1917

Type of change: /kind bug

Test Plan: Ran the sed command and verified that it works
```
# Verify bash variables are set so sed command can be copy and pasted from this PR
$ echo $repo_path
./
$ echo $helm_tmpl_checks
{{- $olmCRDFound := false }} \ {{- $nsLookup := len (lookup "v1" "Namespace" "" "") }} \ {{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}} \ {{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}} \ {{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}\n{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}

# Run copy and pasted sed command
$ find "${repo_path}/k8s/operator/helm/templates" -type f -exec sed -i "/HELM_DEPLOY_OLM_PLACEHOLDER/c\\${helm_tmpl_checks}" {} \;

$ git diff
diff --git a/k8s/operator/helm/templates/00_olm.yaml b/k8s/operator/helm/templates/00_olm.yaml
index 381c856..cd4ccafb2 100644
--- a/k8s/operator/helm/templates/00_olm.yaml
+++ b/k8s/operator/helm/templates/00_olm.yaml
@@ -1,4 +1,9 @@
-{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}}
+{{- $olmCRDFound := false }}
+{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }}
+{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}}
+{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}}
+{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}
+{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}
 {{ if not (eq .Values.olmNamespace .Release.Namespace) }}
 ---
 apiVersion: v1
diff --git a/k8s/operator/helm/templates/csv-deleter.yaml b/k8s/operator/helm/templates/csv-deleter.yaml
index f785a63..5c0efb20e 100644
--- a/k8s/operator/helm/templates/csv-deleter.yaml
+++ b/k8s/operator/helm/templates/csv-deleter.yaml
@@ -1,4 +1,9 @@
-{{if .Values.deployOLM}}{{- /* HELM_DEPLOY_OLM_PLACEHOLDER */ -}}
+{{- $olmCRDFound := false }}
+{{- $nsLookup := len (lookup "v1" "Namespace" "" "") }}
+{{- range $index, $crdLookup := (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "").items -}}{{ if eq $crdLookup.metadata.name "operators.operators.coreos.com"}}{{ $olmCRDFound = true }}{{ end }}{{end}}
+{{ if and (not $olmCRDFound) (not (eq $nsLookup 0))}}{{ fail "CRDs missing! Please deploy CRDs from https://github.com/pixie-io/pixie/tree/main/k8s/operator/helm/crds to continue with deploy." }}{{end}}
+{{- $lookupLen := 0 -}}{{- $opLookup := (lookup "operators.coreos.com/v1" "OperatorGroup" "" "").items -}}{{if $opLookup }}{{ $lookupLen = len $opLookup }}{{ end }}
+{{ if (or (eq (.Values.deployOLM | toString) "true") (and (not (eq (.Values.deployOLM | toString) "false")) (eq $lookupLen 0))) }}
 ---
 apiVersion: batch/v1
 kind: Job

```

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
GitOrigin-RevId: ae80b43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants