Race condition deleting resource in terminated namespace #1134

Open
mjallday opened this issue Oct 31, 2024 · 0 comments
Labels
question Further information is requested

Comments

@mjallday

Keywords

finalizer, termination

Problem

We have a resource that is preventing its namespace from being deleted. The resource itself is not being deleted because its delete handler reports that it hasn't been created yet.

I found this error in my system:

5m47s       Warning   ReconciliationFailed         kustomization/gitops-flux2-sync                                 timeout waiting for: [Namespace/vault-cell-21 status: 'Terminating', Namespace/vault-cell-22 status: 'Terminating']

When I looked closer, the namespace was failing to create because a resource in it hadn't been deleted.

So I went to look at the kopf resource, and it looks like its state is mixed up.

k -n vault-cell-21 describe healthcheck.vgs.io/dev-sandbox-vault-cell-21-knox-healthcheck

...
Status:
  Kopf:
    Progress:
      Delete:
        Delayed:  2024-10-31T16:11:15.048267+00:00
        Failure:  false
        Message:  Healthcheck is not created yet.
        Purpose:  delete
        Retries:  11105
        Started:  2024-10-25T20:59:39.989982+00:00
        Success:  false
Events:           <none>

It seems the resource cannot be deleted because it's "not created yet".

A snippet from my kopf app's logs:

[2024-10-31 16:12:45,966] kopf.objects         [DEBUG   ] [vault-cell-22/dev-sandbox-vault-cell-22-larky-compute-healthcheck] Sleeping for 44.885867 seconds for the delayed handlers.
[2024-10-31 16:12:45,967] kopf._core.engines.p [WARNING ] Failed to post an event. Ignoring and continuing. Code: 403. Message: events "kopf-event-" is forbidden: unable to create new content in namespace vault-cell-22 because it is being terminated. Details: {'name': 'kopf-event-', 'kind': 'events', 'causes': [{'reason': 'NamespaceTerminating', 'message': 'namespace vault-cell-22 is being terminated', 'field': 'metadata.namespace'}]}Event: type='Error', reason='Logging', message="Handler 'delete' failed temporarily: Healthcheck is not created yet.".

and

[2024-10-31 16:12:46,060] kopf.objects         [DEBUG   ] [vault-test/dev-sandbox-vault-test-forward-http-proxy-healthcheck] Deletion is in progress: {'apiVersion': 'vgs.io/v1', 'kind': 'HealthCheck', 'metadata': {'annotations': {'kopf.zalando.org/delete': '{"started":"2024-10-21T17:10:22.624530+00:00","delayed":"2024-10-31T16:13:30.947812+00:00","purpose":"delete","retries":19072,"success":false,"failure":false,"message":"Healthcheck is not created yet."}', 'meta.helm.sh/release-name': 'test-904a431d', 'meta.helm.sh/release-

How do I avoid this? If the object hasn't been created and a delete comes through, I'd expect it to just be removed without getting stuck. The create cannot happen because the namespace is being terminated, and the namespace deletion cannot complete because it still contains objects that need to be deleted. It's a race condition.

Is there a pattern I can follow, or can the kopf framework assist in any way here? My controller is very simple: it just has on_create, on_update, and on_delete methods. Can I short-circuit the on_create to exit quickly in this case, or something similar?
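
To make the question concrete, here is a minimal sketch of the kind of short circuit I have in mind, written as plain Python so it runs without kopf. It assumes (hypothetically) that the create handler is a function named on_create that returns {"id": ...}, and that kopf stores that return value under status["on_create"]. The helper names external_id and handle_delete and the delete_external callback are made up for illustration and are not part of kopf.

```python
from typing import Any, Callable, Mapping, Optional


def external_id(status: Mapping[str, Any],
                create_handler: str = "on_create") -> Optional[str]:
    """Return the id recorded by the create handler, or None if it never ran.

    Assumption: the create handler returned {"id": ...} and that result was
    stored under status[<handler name>]. Both names are hypothetical.
    """
    result = status.get(create_handler) or {}
    return result.get("id")


def handle_delete(status: Mapping[str, Any],
                  delete_external: Callable[[str], None]) -> str:
    """Decide what the delete handler should do for one resource."""
    ext_id = external_id(status)
    if ext_id is None:
        # Nothing was ever created, so there is nothing to clean up.
        # Return normally instead of raising a temporary error.
        return "skipped"
    # The external healthcheck exists, so remove it before finishing.
    delete_external(ext_id)
    return "deleted"


if __name__ == "__main__":
    removed = []
    # Resource that was never created: only kopf's own progress is present.
    print(handle_delete({"kopf": {"progress": {}}}, removed.append))
    # Resource that was created earlier with a recorded id.
    print(handle_delete({"on_create": {"id": "hc-1"}}, removed.append))
    print(removed)
```

In a real kopf app, something like handle_delete would be called from the function decorated with @kopf.on.delete, passing the handler's status argument. Returning normally, rather than raising kopf.TemporaryError, lets the handler count as succeeded so that kopf can remove its finalizer and the namespace can finish terminating.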
