Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_query function #1429

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Nov 6, 2024

BY running make kind-deploy against a kind cluster or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced:

...
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3

This PR fixes an issue with kubectl wait when used to check the caBundle field in mutatingwebhookconfigurations and validatingwebhookconfigurations.

Solution

This PR introduces the kubectl_wait_for_query function, which replaces kubectl wait for this specific use case. The function repeatedly checks the caBundle field by using kubectl get in a loop, ensuring that the caBundle is populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations of kubectl wait.

After the fix:

$ make kind-deploy
/Users/camiladeomacedo/go/bin/controller-gen-v0.16.1 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/base/crd/bases output:rbac:artifacts:config=config/base/rbac
/Users/camiladeomacedo/go/bin/kustomize-v4.5.7 build config/overlays/cert-manager > operator-controller.yaml
envsubst '$CATALOGD_VERSION,$CERT_MGR_VERSION,$INSTALL_DEFAULT_CATALOGS,$MANIFEST' < scripts/install.tpl.sh | bash -s
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
mutatingwebhookconfigurations/cert-manager-webhook has {.webhooks[0].clientConfig.caBundle}.
validatingwebhookconfigurations/cert-manager-webhook has {.webhooks[0].clientConfig.caBundle}.
namespace/olmv1-system created
customresourcedefinition.apiextensions.k8s.io/clustercatalogs.olm.operatorframework.io created
serviceaccount/catalogd-controller-manager created
role.rbac.authorization.k8s.io/catalogd-leader-election-role created
clusterrole.rbac.authorization.k8s.io/catalogd-manager-role created
clusterrole.rbac.authorization.k8s.io/catalogd-metrics-reader created
clusterrole.rbac.authorization.k8s.io/catalogd-proxy-role created
rolebinding.rbac.authorization.k8s.io/catalogd-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-proxy-rolebinding created
service/catalogd-service created
deployment.apps/catalogd-controller-manager created
certificate.cert-manager.io/olmv1-ca created
certificate.cert-manager.io/catalogd-service-cert created
clusterissuer.cert-manager.io/olmv1-ca created
issuer.cert-manager.io/self-sign-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/catalogd-mutating-webhook-configuration created
Waiting for deployment "catalogd-controller-manager" rollout to finish: 0 of 1 updated replicas are available...
Waiting for deployment "catalogd-controller-manager" rollout to finish: 0 of 1 updated replicas are available...
deployment "catalogd-controller-manager" successfully rolled out
deployment.apps/catalogd-controller-manager condition met
clustercatalog.olm.operatorframework.io/operatorhubio created
clustercatalog.olm.operatorframework.io/operatorhubio condition met
namespace/olmv1-system configured
customresourcedefinition.apiextensions.k8s.io/clusterextensions.olm.operatorframework.io created
serviceaccount/operator-controller-controller-manager created
role.rbac.authorization.k8s.io/operator-controller-leader-election-role created
role.rbac.authorization.k8s.io/operator-controller-manager-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-clusterextension-editor-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-clusterextension-viewer-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-extension-editor-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-extension-viewer-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-manager-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-metrics-reader created
clusterrole.rbac.authorization.k8s.io/operator-controller-proxy-role created
rolebinding.rbac.authorization.k8s.io/operator-controller-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/operator-controller-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-controller-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-controller-proxy-rolebinding created
service/operator-controller-controller-manager-metrics-service created
deployment.apps/operator-controller-controller-manager created
certificate.cert-manager.io/olmv1-ca configured
certificate.cert-manager.io/olmv1-cert created
clusterissuer.cert-manager.io/olmv1-ca unchanged
issuer.cert-manager.io/self-sign-issuer unchanged
deployment.apps/operator-controller-controller-manager condition met

@camilamacedo86 camilamacedo86 requested a review from a team as a code owner November 6, 2024 19:15
Copy link

netlify bot commented Nov 6, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit b118c39
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/672be063b53fa100088e65c9
😎 Deploy Preview https://deploy-preview-1429--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.29%. Comparing base (6bda277) to head (bdbef17).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1429      +/-   ##
==========================================
- Coverage   73.42%   73.29%   -0.14%     
==========================================
  Files          42       42              
  Lines        3063     3063              
==========================================
- Hits         2249     2245       -4     
- Misses        641      644       +3     
- Partials      173      174       +1     
Flag Coverage Δ
e2e 55.40% <ø> (+0.06%) ⬆️
unit 52.43% <ø> (-0.14%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@camilamacedo86 camilamacedo86 force-pushed the fix-install-script branch 2 times, most recently from efd97f8 to e7036fc Compare November 6, 2024 19:41
@camilamacedo86 camilamacedo86 changed the title 🐛 fix: add kubectl_wait_for_caBundle to handle caBundle readiness checks 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_caBundle function Nov 6, 2024
@camilamacedo86 camilamacedo86 changed the title 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_caBundle function 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_query function Nov 6, 2024
Comment on lines 44 to 70
function kubectl_wait_for_query() {
manifest=$1
query=$2
timeout=$3
start_time=$(date +%s)
while true; do
val=$(kubectl get "${manifest}" -o jsonpath="${query}" 2>/dev/null || echo "")
if [[ -n "${val}" ]]; then
echo "${manifest} has ${query}."
break
fi
if [[ $(( $(date +%s) - start_time )) -ge ${timeout} ]]; then
echo "Timed out waiting for ${manifest} to have ${query}."
exit 1
fi
sleep 5
done
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this thing be replaced with something along the lines of:

kubectl wait validatingwebhookconfigurations/cert-manager-webhook --for=jsonpath='{.webhooks[0].clientConfig.caBundle}' --timeout=60s

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is the current approach, but unfortunately, it doesn’t work for our use case. ;-) The kubectl wait command doesn’t handle this situation well.

You can easily reproduce the issue by:

My Assumption here is: The issue seems to arise because kubectl wait expects JSONPath conditions related to .status fields (such as readyReplicas), which are applicable for deployments but not for webhook configurations. Since mutatingwebhookconfigurations and validatingwebhookconfigurations lack these .status fields, kubectl wait may fail when it tries to monitor non-existent replica-related conditions. Additionally, it’s possible this issue is linked to cert-manager’s recommendation to use 3 replicas in production environments since we see error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3, which may indirectly affect how readiness is handled.

echo "Timed out waiting for ${manifest} to have ${query}."
exit 1
fi
sleep 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change assuming it helps things be less flaky with timing in practice, but

what's the 5 magic number? I think 5s?

  1. I think this should be a variable with a good name: poll_interval_in_seconds?;
  2. But then you are delaying the polling for 5s? Could be a fair amount of extra wait for no benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied your suggestion and set the polling interval to 10 seconds. With a 60-second timeout, it wouldn’t make sense to use a longer interval.

…ectl_wait_for_query function

By running `make kind-deploy` against a kind cluster  or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced:

```sh
...
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3
```

This PR fixes an issue with kubectl wait when used to check the caBundle field in `mutatingwebhookconfigurations` and `validatingwebhookconfigurations`.

This PR introduces the `kubectl_wait_for_query` function, which replaces `kubectl wait` for this specific use case. The function repeatedly checks the `caBundle` field by using `kubectl get` in a loop, ensuring that the `caBundle` is populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations of `kubectl wait`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants