OCPBUGS-47489: pkg/gcp/destroy: add waits to prevent leaks during heavy load #9384

Merged

@patrickdillon patrickdillon commented Jan 22, 2025

We're slowly leaking backend services. This PR adds waits to make sure the operation is done before proceeding. This refactors the destroy code to add a central operation handling function to facilitate this change.

The GCP destroy code repeated a lot of boilerplate operation handling.
This refactors all of that into a single function for increased
maintainability.
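
To illustrate what a central operation handling function might look like, here is a minimal sketch. It is not the code from this PR: the `operation` type, the `errNotFound` sentinel, and the `handleOperation` name are all hypothetical stand-ins, and the real destroy code works against the GCP client types instead.

```go
package main

import (
	"errors"
	"fmt"
)

// statusDone is the status string a finished operation reports.
const statusDone = "DONE"

// errNotFound is a hypothetical sentinel for "resource does not exist".
var errNotFound = errors.New("not found")

// operation is a hypothetical, simplified stand-in for a cloud API operation.
type operation struct {
	Name   string
	Status string
	Error  string // non-empty when the operation itself failed
}

// handleOperation interprets the result of a delete call in one place.
// It returns true when the deletion is complete.
func handleOperation(op *operation, err error, resource string) (bool, error) {
	switch {
	case errors.Is(err, errNotFound):
		// Already gone: treat as a successful deletion.
		return true, nil
	case err != nil:
		return false, fmt.Errorf("failed to delete %s: %w", resource, err)
	case op == nil:
		return false, fmt.Errorf("no operation returned for %s", resource)
	case op.Error != "":
		return false, fmt.Errorf("operation %s for %s failed: %s", op.Name, resource, op.Error)
	}
	return op.Status == statusDone, nil
}

func main() {
	done, err := handleOperation(&operation{Name: "op-1", Status: "RUNNING"}, nil, "backend service")
	fmt.Println(done, err)

	done, err = handleOperation(nil, errNotFound, "backend service")
	fmt.Println(done, err)
}
```

With a helper along these lines, each per-resource delete function only issues its API call and passes the result on, rather than repeating the same error and status checks.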

openshift-ci bot commented Jan 22, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 22, 2025
@openshift-ci-robot

@patrickdillon: This pull request references Jira Issue OCPBUGS-47489, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

We're slowly leaking backend services. This PR adds waits to make sure the operation is done before proceeding.

In a draft state, as this is just some early refactoring. Next is to add the wait.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@patrickdillon patrickdillon marked this pull request as ready for review January 22, 2025 17:03
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2025
@openshift-ci openshift-ci bot requested review from barbacbd and jhixson74 January 22, 2025 17:04
@patrickdillon

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 22, 2025
@openshift-ci-robot

@patrickdillon: This pull request references Jira Issue OCPBUGS-47489, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

@openshift-ci openshift-ci bot requested a review from gpei January 22, 2025 17:29
In OCPBUGS-47489, we see that some resources, particularly global
backend services, are being leaked during the destroy process.
Analysis of the creation timestamps for the leaked resources shows
that the timestamps are clustered together, suggesting the leaks
may occur during periods of heavy load.

During periods of heavy load, a deletion may take longer to
process. This commit addresses the issue by adding a wait to every
resource deletion, which gives each destroy call ample time to complete.
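
The wait described above can be sketched as a polling loop that re-fetches the operation until it reports "DONE" or the context expires. This is an illustration under stated assumptions, not the PR's implementation: `operation`, `getFunc`, `waitForOperation`, `fakeGetter`, and `demoWait` are hypothetical names, and the poll interval and timeout are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// statusDone is the status string a finished operation reports.
const statusDone = "DONE"

// operation is a hypothetical, simplified stand-in for a cloud API operation.
type operation struct {
	Name   string
	Status string
}

// getFunc fetches the latest state of an operation (hypothetical signature).
type getFunc func(ctx context.Context, name string) (*operation, error)

// waitForOperation polls until the operation is done or ctx is cancelled.
func waitForOperation(ctx context.Context, get getFunc, name string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		op, err := get(ctx, name)
		if err != nil {
			return fmt.Errorf("failed to get operation %s: %w", name, err)
		}
		if op.Status == statusDone {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("gave up waiting for operation %s: %w", name, ctx.Err())
		case <-ticker.C:
			// Poll again on the next loop iteration.
		}
	}
}

// fakeGetter reports RUNNING until it has been called doneAfter times.
func fakeGetter(doneAfter int) (getFunc, *int) {
	calls := 0
	get := func(_ context.Context, name string) (*operation, error) {
		calls++
		status := "RUNNING"
		if calls >= doneAfter {
			status = statusDone
		}
		return &operation{Name: name, Status: status}, nil
	}
	return get, &calls
}

// demoWait runs the wait against the fake and returns the number of polls.
func demoWait(doneAfter int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	get, calls := fakeGetter(doneAfter)
	err := waitForOperation(ctx, get, "delete-backend-service", 10*time.Millisecond)
	return *calls, err
}

func main() {
	calls, err := demoWait(3)
	fmt.Println(calls, err)
}
```

Bounding the wait with a context keeps a slow or stuck deletion from blocking the destroy process indefinitely, while still giving the operation time to finish under load.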
@patrickdillon patrickdillon force-pushed the ocpbugs-47489-global-leak branch from 961b6a9 to bd81ab8 January 22, 2025 17:52
@patrickdillon patrickdillon changed the title from "OCPBUGS-47489: pkg/destroy/gcp: refactor operation handling" to "OCPBUGS-47489: pkg/gcp/destroy: add waits to prevent leaks during heavy load" Jan 22, 2025
@openshift-ci-robot

@patrickdillon: An error was encountered adding this pull request to the external tracker bugs for bug OCPBUGS-47489 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. failed to update remote link: failed to update link: request failed. Please analyze the request body for more details. Status code: 403: {"errorMessages":["No Link Issue Permission for issue 'OCPBUGS-47489'."],"errors":{}}

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

@patrickdillon patrickdillon force-pushed the ocpbugs-47489-global-leak branch from bd81ab8 to 82b994d January 22, 2025 17:57
@patrickdillon

Pushed some changes to (hopefully) appease the linter.

@openshift-ci-robot

@patrickdillon: This pull request references Jira Issue OCPBUGS-47489. The bug has been updated to no longer refer to the pull request using the external bug tracker.

@patrickdillon patrickdillon reopened this Jan 22, 2025
@openshift-ci-robot

@patrickdillon: This pull request references Jira Issue OCPBUGS-47489, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

The bug has been updated to refer to the pull request using the external bug tracker.

Linter was complaining about existing code that used the magic
string "DONE". This converts the string to a constant.
@patrickdillon patrickdillon force-pushed the ocpbugs-47489-global-leak branch from 82b994d to c78c0a1 January 22, 2025 20:45

openshift-ci bot commented Jan 23, 2025

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-gcp-secureboot | c78c0a1 | false | /test e2e-gcp-secureboot |
| ci/prow/e2e-gcp-ovn-xpn | c78c0a1 | false | /test e2e-gcp-ovn-xpn |
| ci/prow/e2e-azure-ovn-resourcegroup | c78c0a1 | false | /test e2e-azure-ovn-resourcegroup |
| ci/prow/e2e-vsphere-host-groups-ovn-custom-no-upgrade | c78c0a1 | false | /test e2e-vsphere-host-groups-ovn-custom-no-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@barbacbd barbacbd left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 24, 2025
@barbacbd

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Jan 24, 2025
@patrickdillon

/approve
/cherry-pick release-4.18

@openshift-cherrypick-robot

@patrickdillon: once the present PR merges, I will cherry-pick it on top of release-4.18 in a new PR and assign it to you.

openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit d224d2c into openshift:main Jan 25, 2025
20 of 24 checks passed
@openshift-ci-robot

@patrickdillon: Jira Issue OCPBUGS-47489: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-47489 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@patrickdillon: new pull request created: #9402

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-altinfra
This PR has been included in build ose-installer-altinfra-container-v4.19.0-202501250436.p0.gd224d2c.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-terraform-providers
This PR has been included in build ose-installer-terraform-providers-container-v4.19.0-202501250436.p0.gd224d2c.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.19.0-202501250436.p0.gd224d2c.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.19.0-202501250436.p0.gd224d2c.assembly.stream.el9.
All builds following this will include this PR.

Labels
  • acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.