feat: add reconciliation retries for CRs #423

Merged 6 commits on May 22, 2024

@mjnagel mjnagel commented May 22, 2024

Description

Adds a retry count to the Package CR status, plus logic to increment and handle retries. Package reconciliation is currently attempted 5 times before failing.
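
For readers skimming this later, here is a minimal sketch of what "a retry count in status plus logic to increment and handle it" could look like. This is not the code from this PR: the `PackageStatus` shape, the `Phase` values, the `retryAttempt` field name, and the `nextStatus` helper are all hypothetical names chosen for illustration. The only detail taken from the description is the limit of 5 attempts.

```typescript
// Hypothetical status shape for illustration; not the actual Package CR types.
type Phase = "Pending" | "Retrying" | "Ready" | "Failed";

interface PackageStatus {
  phase: Phase;
  retryAttempt: number; // number of failed reconcile attempts so far
}

// "5x" comes from the PR description; the constant name is an assumption.
const MAX_ATTEMPTS = 5;

// Pure transition: given the current status and the outcome of one
// reconcile attempt, compute the status to write back to the CR.
function nextStatus(status: PackageStatus, succeeded: boolean): PackageStatus {
  if (succeeded) {
    // Success resets the counter so a later failure starts from zero.
    return { phase: "Ready", retryAttempt: 0 };
  }
  const attempt = status.retryAttempt + 1;
  if (attempt >= MAX_ATTEMPTS) {
    // The fifth failed attempt is terminal.
    return { phase: "Failed", retryAttempt: attempt };
  }
  // Otherwise record the failure and signal that another attempt should run.
  return { phase: "Retrying", retryAttempt: attempt };
}

// Example: four failures followed by a success.
let status: PackageStatus = { phase: "Pending", retryAttempt: 0 };
for (const ok of [false, false, false, false, true]) {
  status = nextStatus(status, ok);
}
console.log(status.phase, status.retryAttempt);
```

Keeping the transition as a pure function makes the retry policy easy to unit test without a cluster. How the next attempt is actually triggered (for example, by the status update producing a new watch event) is left out of this sketch.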

Type of change

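  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Other (security config, docs update, etc)

Checklist before merging

  • [x] Test, docs, adr added or updated as needed
  • [x] Contributor Guide Steps followed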

@mjnagel mjnagel self-assigned this May 22, 2024
@mjnagel mjnagel marked this pull request as ready for review May 22, 2024 21:50
@mjnagel mjnagel merged commit 424b57b into main May 22, 2024
7 checks passed
@mjnagel mjnagel deleted the reconcile-retry-addition branch May 22, 2024 21:58
mjnagel pushed a commit that referenced this pull request May 23, 2024
🤖 I have created a release *beep* *boop*
---


## [0.22.0](v0.21.1...v0.22.0) (2024-05-22)

### Features

* add `expose` service entry for internal cluster traffic ([#356](#356)) ([1bde4cc](1bde4cc))
* add reconciliation retries for CRs ([#423](#423)) ([424b57b](424b57b))
* uds common renovate config ([#391](#391)) ([035786c](035786c))
* uds core docs ([#414](#414)) ([a35ca7b](a35ca7b))

### Bug Fixes

* mismatched exemption/policy for DropAllCapabilities ([#384](#384)) ([d8ec278](d8ec278))
* pepr mutation annotation overwrite ([#385](#385)) ([6e56b2a](6e56b2a))
* renovate config grouping, test-infra ([#411](#411)) ([05fd407](05fd407))
* renovate pepr comment ([#410](#410)) ([a825388](a825388))

### Miscellaneous

* **deps:** update keycloak ([#390](#390)) ([3e82c4e](3e82c4e))
* **deps:** update keycloak to v24.0.4 ([#397](#397)) ([c0420ea](c0420ea))
* **deps:** update keycloak to v24.0.4 ([#402](#402)) ([e454576](e454576))
* **deps:** update neuvector to v9.4 ([#381](#381)) ([20d4170](20d4170))
* **deps:** update pepr to 0.31.0 ([#360](#360)) ([fbd61ea](fbd61ea))
* **deps:** update prometheus-stack ([#348](#348)) ([49cb11a](49cb11a))
* **deps:** update prometheus-stack ([#392](#392)) ([2e656f5](2e656f5))
* **deps:** update uds to v0.10.4 ([#228](#228)) ([1750b23](1750b23))
* **deps:** update uds-k3d to v0.6.0 ([#398](#398)) ([288f009](288f009))
* **deps:** update velero ([#350](#350)) ([e7cb33e](e7cb33e))
* **deps:** update zarf to v0.33.2 ([#394](#394)) ([201a37b](201a37b))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

mjnagel commented Jul 1, 2024

Posting additional context on this change retroactively, since it was rather significant and resulted in a few issues.

Retries were introduced here to account for a specific error we ran into during Pepr upgrades and pod cycling. With the introduction of service monitor generation in the operator, we have a flow where the watcher pod generates a service monitor that the admission pods then mutate. Across upgrades we encountered intermittent failures due to webhook timeouts: the watcher would fail to apply the service monitors, erroring out reconciliation of a Package on something that should be retryable (in a normal helm/zarf flow, multiple apply attempts would be made).

Rather than introduce a targeted retry for just the service monitor behavior, we decided that a generic 5x retry on all Packages could also resolve other intermittent issues (e.g. intermittent networking problems). This was reviewed synchronously and tested against a few scenarios where retries did resolve issues. For history's sake, linking the bugs introduced here:

rjferguson21 pushed a commit that referenced this pull request Jul 11, 2024
## Description

Adds re-tries to Package CR status + logic to increment and handle
retries. Currently will attempt package reconcile 5x before failing.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request) followed
rjferguson21 pushed a commit that referenced this pull request Jul 11, 2024