Skip to content

AUTH-543: Add optional operand deletion condition #1902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

liouk
Copy link
Member

@liouk liouk commented Dec 3, 2024

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True. To avoid breaking changes, I've added new setup functions to be used for wiring this particular case.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Proof PRs:

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 3, 2024
@openshift-ci-robot
Copy link

@liouk: This pull request references Jira Issue OCPBUGS-44937, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2024
@openshift-ci openshift-ci bot requested review from p0lyn0mial and tkashem December 3, 2024 09:55
@liouk liouk force-pushed the workload-deletion-condition branch 3 times, most recently from 2daf3c8 to ee16aa0 Compare December 3, 2024 12:21
@liouk liouk changed the title WIP: OCPBUGS-44937: Add optional operand deletion condition OCPBUGS-44937: Add optional operand deletion condition Dec 3, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2024
@liouk liouk changed the title OCPBUGS-44937: Add optional operand deletion condition NO-JIRA: Add optional operand deletion condition Dec 3, 2024
@openshift-ci-robot openshift-ci-robot removed jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 3, 2024
@openshift-ci-robot
Copy link

@liouk: This pull request explicitly references no jira issue.

In response to this:

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk liouk force-pushed the workload-deletion-condition branch from ee16aa0 to a9ce87e Compare December 3, 2024 14:48
}
}()

if _, err := c.deploymentLister.Deployments(c.targetNamespace).Get(workloadName); err != nil && !apierrors.IsNotFound(err) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the Get() call necessary before attempting the Delete() call? What does the extra call buy us?

Copy link
Member Author

@liouk liouk Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Get() happens on the lister (cache), which means that if the deployment has already been deleted, it'll save us an extra API call to Delete(). This will be happening on every sync.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I totally missed that the Get() would be on the cache - that makes sense :). Thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be beneficial to add some unit tests for the new deletion behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit tests would end up using mocks for most of the stuff that the deletion does; but on second thought it might be beneficial to test the operator status, so I'll add some 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to import and use this change in one of the consumers (e.g. cluster-authentication-operator) and show that the CI is passing there.

@liouk liouk force-pushed the workload-deletion-condition branch from a9ce87e to 520c3d2 Compare December 4, 2024 11:41
Comment on lines 443 to 452
deploymentAvailableCondition = deploymentAvailableCondition.
WithStatus(operatorv1.ConditionFalse).
WithReason("DeletionError")
Copy link

@everettraven everettraven Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include a message in all conditions? I'm not familiar with how these conditions have been constructed in the past, but I have seen logs that say not setting the message in a condition will eventually be fatal:

W1203 04:13:07.671272       1 dynamic_operator_client.go:355] .status.conditions["APIServerDeploymentAvailable"].message is missing; this will eventually be fatal

This was pulled from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.18-ocp-e2e-ovn-remote-libvirt-s390x/1863791759423705088/artifacts/ocp-e2e-ovn-remote-libvirt-s390x/gather-extra/artifacts/pods/openshift-apiserver-operator_openshift-apiserver-operator-dfc988b89-rsh2n_openshift-apiserver-operator.log

Even though it would be a repetitive message in this case, with ^ in mind maybe it is worth it to future-proof this implementation as much as we can for when that does get flipped to fatal?

//
// the "deletionConditionFn" will be used to check whether the workload specified by the
// returned name which is part of targetNamespace must be deleted
func NewControllerWithDeletion(instanceName, operatorNamespace, targetNamespace, targetOperandVersion, operandNamePrefix, conditionsPrefix string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there sufficient benefit of creating a 2nd constructor for this? We could just pass the deletion option as nil in the normal use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applies, I think we do not need any change in the constructor. Let's drive such logic through the delegate.

if conditionMet, workload, err := c.deletionConditionFn(); err != nil {
return err
} else if conditionMet {
return c.deleteWorkload(ctx, workload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the delegate be responsible for the lifecycle of the Deployment? I am not sure if it makes sense to fragment the logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still applies, why do we have to manage the deletion in here?

return c.deleteWorkload(ctx, workload)
}
}

if fulfilled, err := c.delegate.PreconditionFulfilled(ctx); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be better to let the delegate communicate everything as we do with the preconditions?

@@ -295,6 +295,46 @@ func (cs *APIServerControllerSet) WithWorkloadController(
return cs
}

func (cs *APIServerControllerSet) WithWorkloadControllerWithDeletion(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we let delegate controller we do not need another method

@@ -356,6 +424,55 @@ func (c *Controller) updateOperatorStatus(ctx context.Context, previousStatus *o
return nil
}

func (c *Controller) deleteWorkload(ctx context.Context, workloadName string) (err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to split it from the updateOperatorStatus reconciliation logic? It would be nicer to keep the concern in the same place and consider all 4 conditions.

I do not have a problem with calling a util function in updateOperatorStatus if needed though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still applies, if we need to manage the status/conditions during scale down of the workload, it would be better to do it in a single place (updateOperatorStatus)

if _, getErr := c.deploymentLister.Deployments(c.targetNamespace).Get(workloadName); getErr != nil && !apierrors.IsNotFound(getErr) {
deploymentAvailableCondition = deploymentAvailableCondition.
WithStatus(operatorv1.ConditionFalse).
WithReason("DeletionError")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this isn't a deletion error

Comment on lines 467 to 478
deploymentAvailableCondition = deploymentAvailableCondition.
WithStatus(operatorv1.ConditionTrue).
WithReason("AsExpected")
workloadDegradedCondition = workloadDegradedCondition.
WithStatus(operatorv1.ConditionFalse)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value of preserving these conditions (Available=True, Degraded=False) if no workload exists for them?

Or are we interested in measuring the availability of OIDC provider? Is it possible? And even then this probably isn't the right place to indicate it, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to import and use this change in one of the consumers (e.g. cluster-authentication-operator) and show that the CI is passing there.

@liouk liouk changed the title NO-JIRA: Add optional operand deletion condition AUTH-543: Add optional operand deletion condition Dec 5, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 5, 2024

@liouk: This pull request references AUTH-543 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead.

In response to this:

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True. To avoid breaking changes, I've added new setup functions to be used for wiring this particular case.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk
Copy link
Member Author

liouk commented Dec 5, 2024

/jira refresh

@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 5, 2024

@liouk: This pull request references AUTH-543 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.19." or "openshift-4.19.", but it targets "openshift-4.18" instead.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk liouk force-pushed the workload-deletion-condition branch from 520c3d2 to 8367a93 Compare December 5, 2024 17:58
deploymentDegradedCondition := applyoperatorv1.OperatorCondition().
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix))
if removeConditions {
jsonPatch := v1helpers.RemoveConditionsJSONPatch(previousStatus, []string{typeAvailable, typeDegraded, typeProgressing, typeWorkloadDegraded})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we cannot use SSA for removing conditions, but I am not sure if patch is better than v1helpers.UpdateStatus here. Would be good to add an additional opinion on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of the patch here in my opinion is that we're only adding to the patch the specific conditions that we want to remove, so it's more concise -- we won't have to manage the whole status object to perform the update, so maybe it's less prone to mistakes.

What would you think @bertinatto?


deploymentDegradedCondition := applyoperatorv1.OperatorCondition().
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix))
if removeConditions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check if workload == nil to make sure the delegate has deleted (and not recreated) the workload


deploymentDegradedCondition := applyoperatorv1.OperatorCondition().
WithType(fmt.Sprintf("%sDeploymentDegraded", c.conditionsPrefix))
if removeConditions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even the workload should be gone and we should not get any errs, we should at least log them if they passed on by the delegate

deploymentAvailableCondition := applyoperatorv1.OperatorCondition().WithType(typeAvailable)
workloadDegradedCondition := applyoperatorv1.OperatorCondition().WithType(typeWorkloadDegraded)
deploymentDegradedCondition := applyoperatorv1.OperatorCondition().WithType(typeDegraded)
deploymentProgressingCondition := applyoperatorv1.OperatorCondition().WithType(typeProgressing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to still consider preconditions even when the workload will be deleted later. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the delete would happen during the delegate's sync, we shouldn't normally reach the point of removing the conditions if preconditions are failing. But you are right, we should safe-guard against this -- I'll add a check before removing conditions.

if conditionMet, workload, err := c.deletionConditionFn(); err != nil {
return err
} else if conditionMet {
return c.deleteWorkload(ctx, workload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still applies, why do we have to manage the deletion in here?

//
// the "deletionConditionFn" will be used to check whether the workload specified by the
// returned name which is part of targetNamespace must be deleted
func NewControllerWithDeletion(instanceName, operatorNamespace, targetNamespace, targetOperandVersion, operandNamePrefix, conditionsPrefix string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applies, I think we do not need any change in the constructor. Let's drive such logic through the delegate.

@@ -356,6 +424,55 @@ func (c *Controller) updateOperatorStatus(ctx context.Context, previousStatus *o
return nil
}

func (c *Controller) deleteWorkload(ctx context.Context, workloadName string) (err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still applies, if we need to manage the status/conditions during scale down of the workload, it would be better to do it in a single place (updateOperatorStatus)

@liouk liouk force-pushed the workload-deletion-condition branch from 8367a93 to f07c37b Compare December 10, 2024 10:52
liouk added a commit to liouk/cluster-authentication-operator that referenced this pull request Dec 10, 2024
Copy link
Member

@atiratree atiratree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general design looks nice IMO. It just needs a little more polishing.

And a proof PR as menioned by @liouk .

Comment on lines 585 to 588
removeAtIndex := i
if !jsonPatch.IsEmpty() {
removeAtIndex = removeAtIndex - removedCount
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Can't we do just

Suggested change
removeAtIndex := i
if !jsonPatch.IsEmpty() {
removeAtIndex = removeAtIndex - removedCount
}
removeAtIndex := i - removedCount

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- that code pre-existed and I just refactored it into a func.

Comment on lines 614 to 617
removeAtIndex := i
if !jsonPatch.IsEmpty() {
removeAtIndex = removeAtIndex - removedCount
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here

@@ -565,3 +566,63 @@ func IsUpdatingTooLong(operatorStatus *operatorv1.OperatorStatus, progressingCon
progressing := FindOperatorCondition(operatorStatus.Conditions, progressingConditionType)
return progressing != nil && progressing.Status == operatorv1.ConditionTrue && time.Now().After(progressing.LastTransitionTime.Add(progressingConditionTimeout))
}

func RemoveConditionsJSONPatch(operatorStatus *operatorv1.OperatorStatus, conditionTypesToRemove []string, jsonPatch *jsonpatch.PatchSet) *jsonpatch.PatchSet {
if operatorStatus == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if operatorStatus == nil {
if operatorStatus == nil || len(conditionTypesToRemove) == 0 {

// generation, a bool indicating whether the operator status conditions must be removed (e.g. in case the
// delegate intentionally deletes the workload), two strings indicating the name & namespace of the workload
// that must be cleaned up from the operator status (respective to the conditions to be removed) and a list
// errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// errors.
// of errors.

// delegate intentionally deletes the workload), two strings indicating the name & namespace of the workload
// that must be cleaned up from the operator status (respective to the conditions to be removed) and a list
// errors.
Sync(ctx context.Context, controllerContext factory.SyncContext) (*appsv1.Deployment, bool, bool, string, string, []error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be also useful to name the return values since we have that many of them

@@ -259,6 +268,17 @@ func (c *StatusSyncer) syncStatusVersions(clusterOperatorObj *configv1.ClusterOp
}
}

if c.removeEmptyVersions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document the reason behind this part here.

// delegate intentionally deletes the workload), two strings indicating the name & namespace of the workload
// that must be cleaned up from the operator status (respective to the conditions to be removed) and a list
// errors.
Sync(ctx context.Context, controllerContext factory.SyncContext) (*appsv1.Deployment, bool, bool, string, string, []error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should document that it is preferable to activate WithEmptyVersionRemoval when removeConditions/Generations is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, that's not very obvious.

@@ -672,6 +683,46 @@ func TestUpdateOperatorStatus(t *testing.T) {
return areCondidtionsEqual(expectedConditions, actualStatus.Conditions)
},
},
{
name: "the delegate controller deletes its operand workload and we remove the conditions",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but maybe we could also test the generations while we are at it

"reflect"
"regexp"
"strings"
"testing"
"time"

clocktesting "k8s.io/utils/clock/testing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: k8s.io group is also below

Comment on lines 403 to 410
Name: "test",
Namespace: "test",
},
{
Group: "apps2",
Resource: "deployments2",
Name: "test2",
Namespace: "test2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we distinguish the name x namespace to test they do not get mixed up in the invocation?

@liouk liouk force-pushed the workload-deletion-condition branch from beb6d28 to 304af12 Compare June 19, 2025 12:43
@liouk
Copy link
Member Author

liouk commented Jun 19, 2025

Based on the most recent review, I also decided to simplify the generation of multiple json patches by creating a Merge() func instead of injecting the patch to use. This simplifies the usage and error checking around the injected patch, and allows all clients of jsonpatch.PatchSet to leverage the functionality.

Sync(ctx context.Context, controllerContext factory.SyncContext) (
delegateWorkload *appsv1.Deployment,
operatorConfigAtHighestGeneration bool,
removeConditions bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can then generalize that since the delegate (and this controller) doesn't have to mention the lower level operations done by this controller

Suggested change
removeConditions bool,
removeWorkload bool,


workloadDegradedCondition := applyoperatorv1.OperatorCondition().
WithType(fmt.Sprintf("%sWorkloadDegraded", c.conditionsPrefix))
if len(deletedWorkloadName) > 0 && len(deletedWorkloadNamespace) > 0 && removeConditions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(deletedWorkloadName) > 0 && len(deletedWorkloadNamespace) > 0 && removeConditions {
if removeWorkload && len(deletedWorkloadName) > 0 && len(deletedWorkloadNamespace) > 0 {

reads better IMO :)

// that must be cleaned up from the operator status (respective to the conditions to be deleted) and a list
// of errors.
//
// When a workload gets deleted, not only its respective status conditions will removed, but also its
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// When a workload gets deleted, not only its respective status conditions will removed, but also its
// When a workload gets deleted, not only its respective status conditions will be removed, but also its

WithName(workload.Name).
WithLastGeneration(workload.Generation),
)
deploymentAvailableCondition := applyoperatorv1.OperatorCondition().WithType(typeAvailable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, easier to read in two separate block

Sync(ctx context.Context, controllerContext factory.SyncContext) (*appsv1.Deployment, bool, []error)
//
// It returns a reference to the workload, a bool indicating whether the operator config is at the highest
// generation, a bool indicating whether the operator status conditions must be removed (e.g. in case the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// generation, a bool indicating whether the operator status conditions must be removed (e.g. in case the
// generation, a bool indicating whether the workload reference (e.g. conditions, generations) should be removed from operator status (e.g. in case the

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe update the rest a bit to match that?

@liouk liouk force-pushed the workload-deletion-condition branch from 304af12 to 4e9fee1 Compare June 20, 2025 08:46
Comment on lines 53 to 54
deletedWorkloadName string,
deletedWorkloadNamespace string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deletedWorkloadName string,
deletedWorkloadNamespace string,
removedWorkloadName string,
removedWorkloadNamespace string,

for consistency with removeWorkload

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This naming convention was intentional -- I chose deleted for workload name & namespace as it is the right verb for the resource action, while I chose removeWorkload for the operator status references because we'll be removing elements from lists/fields instead of deleting any resources.

If you still think this is more confusing than helpful, I'm happy to change all to use "remove".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose deleted for workload name & namespace as it is the right verb for the resource action,

I feel like this is more of an implementation detail for the delegate to work out. The workload controller only takes care if the workload is/was present. And if it never was, the delete might have never been called.

previousStatus *operatorv1.OperatorStatus,
workload *appsv1.Deployment,
operatorConfigAtHighestGeneration, preconditionsReady, removeWorkload bool,
deletedWorkloadName, deletedWorkloadNamespace string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ elsewhere


workloadDegradedCondition := applyoperatorv1.OperatorCondition().
WithType(fmt.Sprintf("%sWorkloadDegraded", c.conditionsPrefix))
if removeWorkload && len(deletedWorkloadName) > 0 && len(deletedWorkloadNamespace) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm maybe it would be safer to output an error if removeWorkload is true, but name/ns are not present?

@liouk liouk force-pushed the workload-deletion-condition branch from 4e9fee1 to f85b0e4 Compare June 20, 2025 09:30
WithType(fmt.Sprintf("%sWorkloadDegraded", c.conditionsPrefix))
if removeWorkload {
if len(removedWorkloadName) == 0 || len(removedWorkloadNamespace) == 0 {
return kerrors.NewAggregate(append(errs, fmt.Errorf("workload marked as removed but no name and/or namespace provided (name=%s/ns=%s)", removedWorkloadName, removedWorkloadNamespace)))
Copy link
Member

@atiratree atiratree Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return kerrors.NewAggregate(append(errs, fmt.Errorf("workload marked as removed but no name and/or namespace provided (name=%s/ns=%s)", removedWorkloadName, removedWorkloadNamespace)))
return kerrors.NewAggregate(append(errs, fmt.Errorf("workload marked as removed but no name and/or namespace provided (name=%s, ns=%s)", removedWorkloadName, removedWorkloadNamespace)))

This might be confusing when comparing to conventional ns/name keys.

liouk added a commit to liouk/cluster-authentication-operator that referenced this pull request Jun 20, 2025
@liouk liouk force-pushed the workload-deletion-condition branch from f85b0e4 to bd8b874 Compare June 20, 2025 09:46
Also, refactor update status func to separate between updating the status and also
the version and generations.
@liouk liouk force-pushed the workload-deletion-condition branch from bd8b874 to b03ce0a Compare June 20, 2025 09:47
@atiratree
Copy link
Member

/lgtm

#1902 (comment) still applies

liouk added a commit to liouk/cluster-authentication-operator that referenced this pull request Jun 20, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2025
Copy link
Contributor

openshift-ci bot commented Jun 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: atiratree, liouk
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liouk
Copy link
Member Author

liouk commented Jun 20, 2025

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2025
Copy link
Contributor

openshift-ci bot commented Jun 20, 2025

@liouk: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 23, 2025

@liouk: This pull request references AUTH-543 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True. To avoid breaking changes, I've added new setup functions to be used for wiring this particular case.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Proof PRs:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk
Copy link
Member Author

liouk commented Jun 23, 2025

/jira refresh

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 23, 2025

@liouk: This pull request references AUTH-543 which is a valid jira issue.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 23, 2025

@liouk: This pull request references AUTH-543 which is a valid jira issue.

In response to this:

There might be cases (as demonstrated in OCPBUGS-44937) where we might want to gracefully delete the operand workload of the workload controller, and keep the operator status available (instead of unavailable or degraded).

This PR adds an optional way of specifying a deletion condition which will trigger the deletion of the operand gracefully, keeping the operator's status as Available=True. To avoid breaking changes, I've added new setup functions to be used for wiring this particular case.

This is needed in the scope of openshift/cluster-authentication-operator#740.

Proof PRs:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants