Skip to content

Fix job status aggregation logic #35000

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

NorthRealm
Copy link
Contributor

@NorthRealm NorthRealm commented Jul 8, 2025

For a run (assume 2 jobs) that has a failed job and a waiting job, the run status should be waiting, as the run is not done yet.

Related #34823

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 8, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Jul 8, 2025
@NorthRealm
Copy link
Contributor Author

NorthRealm commented Jul 8, 2025

Before patch:

1

@lunny
Copy link
Member

lunny commented Jul 8, 2025

I think #34823 (comment) can resolve the problem.

@NorthRealm
Copy link
Contributor Author

NorthRealm commented Jul 8, 2025

I think #34823 (comment) can resolve the problem.

No.. The picture is what I just observed 🤦‍♂️ (before patch)

I personally see no need to introduce another run status

@NorthRealm
Copy link
Contributor Author

@lunny And for a failed job and a running job as in #34823 , running takes precedence over failure, because the run is not done yet. It's the same thought as for waiting status.

@lunny
Copy link
Member

lunny commented Jul 8, 2025

However, the aggregation logic differs between running/waiting and failure states in two key ways:
1. Failure takes precedence: If any step fails, the overall status should be marked as failure, regardless of the state of other steps.
2. Stopping after failure: If a step fails while others are still running, the system should allow stopping the remaining steps. However, once the overall status is marked as failure, it may no longer be possible to issue a stop.

To address this, I see two possible solutions:
1. Introduce a new aggregation status: running with failure. This status would indicate that some steps are still running while at least one has failed. It would only apply at the aggregate level and not be used for individual steps. I personally prefer this one.
2. Separate aggregation statuses for display and control logic: One status would be used for UI display, and another for cancellation logic. The downside is that the UI might show failure even while some steps are still running, making it unclear whether stopping is still an option.

@NorthRealm
Copy link
Contributor Author

NorthRealm commented Jul 8, 2025

  • Yeah surely any failed Job = Run fail. I'm just worrying that current aggregation could possibly break something, and what I just observed before patch (as in the picture) is the case.
  • If a job fails while others are still running/in waiting state, the system should allow stopping/cancelling remaining jobs.
  • There's already an issue in backend where run marked as failure before all jobs finish. Actually surprised that doesn't spam emails. (Send email on Workflow Run Success/Failure #34982)
    Related Rerun job only when run is done #34970

@NorthRealm NorthRealm marked this pull request as draft July 8, 2025 17:42
@NorthRealm NorthRealm closed this Jul 8, 2025
@NorthRealm NorthRealm deleted the patch6-1 branch July 8, 2025 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants