Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Failed status field for all ArgoCD Controllers #1164

Merged
merged 2 commits into from
Feb 12, 2024

Conversation

Rizwana777
Copy link
Contributor

@Rizwana777 Rizwana777 commented Jan 16, 2024

What type of PR is this?

ApplicationSet controller status is Pending even if its Failed which will be misleading for the user

What does this PR do / why we need it:

Fixed it by adding Failed status field

JIRA LINK - https://issues.redhat.com/browse/GITOPS-2472

@@ -168,6 +168,8 @@ func (r *ReconcileArgoCD) reconcileStatusApplicationSetController(cr *argoproj.A
if deploy.Spec.Replicas != nil {
if deploy.Status.ReadyReplicas == *deploy.Spec.Replicas {
status = "Running"
} else if deploy.Status.ReadyReplicas < *deploy.Spec.Replicas {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Failed state should only be for the case where the Deployment will likely never succeed, for example in the case of ImagePull error (as mentioned by Jaideep on the bug). Whereas here, it appears, the logic will fail if the number of replicas is less than 0 for any reason (for example, the container is starting).

The Failed state is a 'permanent fail', and should only be set when it is likely that it will never succeed. (Jaideep may have other suggestions for cases to chaeck besides image pull fail)

Copy link
Collaborator

@jaideepr97 jaideepr97 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgwest @Rizwana777
Personally, I am of the opinion we should overhaul the entire Argo CD status reporting mechanism we currently have. It seems rather unhelpful to me to just have it say "running" or "pending" without offering any more insight into why that is the case. I would like us to be a bit more comprehensive in how we determine the state of a component and also include a reason for why a component is not running successfully
(something I plan to look into as part of the redesign)

I think we should ideally be looking at the DeploymentConditions in the deployment status to determine what the latest condition is. It also includes the reason if the deployment is failed/unavailable

Copy link
Collaborator

@jaideepr97 jaideepr97 Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think image pull errors are 1 of many cases where the deployment could fail
another one could be that the pods fail to come up because they don't meet the security constraints set in place by a cluster (like on openshift)

In such cases I would hope that we can rely on the deploymentconditions to tell us what is wrong and we can relay that directly to the user, as that would be better than us trying to hard code specific checks to determine if the deployment has failed for xyz reason or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaideepr97 you mean instead of checking the state for each pod, we should check for the DeploymentConditions right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it @jaideepr97 thank you

@Rizwana777 Rizwana777 force-pushed the issue-gitops-2472 branch 2 times, most recently from d1361f1 to 8c19725 Compare January 31, 2024 09:57
@Rizwana777
Copy link
Contributor Author

@jgwest Updated the changes , PTAL

@Rizwana777 Rizwana777 force-pushed the issue-gitops-2472 branch 2 times, most recently from dc3631c to 773fc99 Compare February 1, 2024 15:00
@Rizwana777
Copy link
Contributor Author

@jgwest Updated the changes as suggested by Jaideep I am looking into deployment condition to mark the status as Failed, PTAL

Signed-off-by: Rizwana777 <rizwananaaz177@gmail.com>
@Rizwana777 Rizwana777 force-pushed the issue-gitops-2472 branch 3 times, most recently from d4f3d40 to 1784885 Compare February 6, 2024 08:45
Signed-off-by: Rizwana777 <rizwananaaz177@gmail.com>
Copy link
Member

@jgwest jgwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Can change the title of the PR to reflect that it is not just for applicationset

@Rizwana777 Rizwana777 changed the title Add Failed status field in ApplicationSet controlle status Add Failed status field for all ArgoCD Controllers Feb 7, 2024
Copy link
Collaborator

@jaideepr97 jaideepr97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Rizwana777 !

@jaideepr97 jaideepr97 merged commit 20dd48b into argoproj-labs:master Feb 12, 2024
7 checks passed
@svghadi svghadi added the backport-to-redesign Changes which need to be backported to operator-redesign branch label Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-redesign Changes which need to be backported to operator-redesign branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants