Skip to content

Fix job status aggregation logic #34823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nienjiuntai
Copy link
Contributor

Fix #34821 by checking if all the jobs are done before set status to fail
Image

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jun 23, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Jun 23, 2025
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 23, 2025

You can add (or fix) a test case here: run_job_status_test.go

@TheFox0x7
Copy link
Contributor

specifically modify one here:

{[]Status{StatusFailure, StatusWaiting}, StatusFailure},
{[]Status{StatusFailure, StatusRunning}, StatusFailure},
{[]Status{StatusFailure, StatusBlocked}, StatusFailure},

I'm not sure that this is a right fix, more that github does it weirdly in this case. Unless entire workflow can recover from the failure (which I don't know if it's possible) I think we do this correctly now.
But I guess it kind of depends what signal you're more interested in - is it the pass/fail or are there any jobs still running/pending.

I don't think the fix is complete though as there's still the case of the waiting and blocked. Logically looking if we consider running as not a failure, then waiting and blocked shouldn't be as well as they are before the running state.
It wouldn't make much sense to have overall status be failure when we have failure+blocked but running when the block converts to failure.

- Updated the condition for returning `StatusFailure` to ensure it only occurs when all jobs are done.
@nienjiuntai nienjiuntai force-pushed the fix-job-status-aggregation-logic branch from 182cdd3 to 8726113 Compare June 23, 2025 16:01
@wxiaoguang
Copy link
Contributor

Hmm, I also have the question: should we report the failure as soon as possible?

IMO the old behavior (reporting failure as soon as possible) is better to end users.

If a job fails, then the final status must be also "failed". Reporting failure to users early can tell them "there is something" in first time, no need to wait all jobs finishes.

@wxiaoguang
Copy link
Contributor

You see, the comment is still there ...

image

@lunny
Copy link
Member

lunny commented Jun 23, 2025

Or maybe we can introduce a new status like running with part failure.

@TheFox0x7
Copy link
Contributor

I think that would be the best solution. That way UI can indicate that run has already failed BUT there's something still pending.

@nienjiuntai
Copy link
Contributor Author

You see, the comment is still there ...

image

Yes, I'm aware this comment, but it doesn't match the behavior tested in Github Action
Image

Fail fast property will cause a weired behavior that user cannot cancel a workflow or get current running workflows correctly by filter after a job fail.

@wxiaoguang
Copy link
Contributor

Fail fast property will cause a weired behavior that user cannot cancel a workflow or get current running workflows correctly by filter after a job fail.

It sounds a strong reason.

So I think we can write the details into comment, and use this solution, and backport. Leave more improvements to the future.

@wxiaoguang wxiaoguang added this to the 1.25.0 milestone Jun 23, 2025
@wxiaoguang wxiaoguang added type/bug backport/v1.24 This PR should be backported to Gitea 1.24 labels Jun 23, 2025
Copy link
Member

@lunny lunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#34823 (comment)

This could be another PR

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jun 23, 2025
Signed-off-by: wxiaoguang <wxiaoguang@gmail.com>
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jun 24, 2025
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Signed-off-by: wxiaoguang <wxiaoguang@gmail.com>
@wxiaoguang wxiaoguang merged commit a789a8c into go-gitea:main Jun 24, 2025
26 checks passed
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 24, 2025

I don't think the fix is complete though as there's still the case of the waiting and blocked. Logically looking if we consider running as not a failure, then waiting and blocked shouldn't be as well as they are before the running state. It wouldn't make much sense to have overall status be failure when we have failure+blocked but running when the block converts to failure.

Merged now, let's see whether we should continue changing the priority, then never make the "failure" win other status.

Maybe something related:

  1. Should we "fix the bug" that a "failure" task isn't able to be cancelled or rerun? Then still make the the status "fail fast" (win other status)
  2. Should we really introduce a bunch of new statuses like failed-running, or failed-waiting, etc. Or introduce a combined status mechanism like (failure, running)?
    • Either solution seems too complex than it should be

Open for discussion.

GiteaBot pushed a commit to GiteaBot/gitea that referenced this pull request Jun 24, 2025
@GiteaBot GiteaBot added the backport/done All backports for this PR have been created label Jun 24, 2025
wxiaoguang added a commit that referenced this pull request Jun 24, 2025
Backport #34823 by nienjiuntai

Co-authored-by: JIUN-TAI NIEN <44364165+nienjiuntai@users.noreply.github.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
@lunny
Copy link
Member

lunny commented Jun 24, 2025

We just need only to introduce running-failure, no other status is necessary.
Looks like two new statuses need to be introduced if that's the right direction.

@wxiaoguang
Copy link
Contributor

We just need only to introduce running-failure, no other status is necessary.

Why? "waiting" is similar to "running" in this case and the "waiting" will be come "running" sooner or later.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 24, 2025

more that github does it weirdly in this case.

Hmm, it's also strange to me.

According to my experience, GitHub still makes failure have higher priority. It is what I have seen for the latest 1.24 branch:

  1. One job fails, others are running
  2. The commit status shows a red cross (failure)
  3. I can click into the commit and see its jobs, others are still running, I can cancel the running ones.

I suspect there are some problems in this PR or the whole design (ps: I don't really use Actions so I can't comment more or spend more time on it)

@lunny
Copy link
Member

lunny commented Jun 24, 2025

more that github does it weirdly in this case.

Hmm, it's also strange to me.

According to my experience, GitHub still makes failure have higher priority. It is what I have seen for the latest 1.24 branch:

  1. One job fails, others are running
  2. The commit status shows a red cross (failure)
  3. I can click into the commit and see its jobs, others are still running, I can cancel the running ones.

I suspect there are some problems in this PR or the whole design (ps: I don't really use Actions so I can't comment more or spend more time on it)

I think other CI/CDs should have the similar problems?

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 24, 2025

My brief understanding of the problem is (just my guess, some of my previous comments might be not right):

GitHub has 2 kinds of "aggregation logic"

  • The commit status for commit list: failure has higher priority
  • The action status for action details page: running has higher priority

If I understand correctly, Gitea also has 2 kinds of "aggregation logic", the same to GitHub.

  • The commit status for commit list: (need to check code's logic to confirm)
  • The action status for action details page: this PR, matches GitHub now

So, overall this change should be fine.

zjjhot added a commit to zjjhot/gitea that referenced this pull request Jun 26, 2025
* giteaofficial/main:
  [skip ci] Updated translations via Crowdin
  Add issue delete notifier (go-gitea#34592)
  Refactor "change file" API (go-gitea#34855)
  Fix some log and UI problems (go-gitea#34863)
  Update go tool dependencies (go-gitea#34845)
  Fix archive API (go-gitea#34853)
  Update `uint8-to-base64`, remove type stub (go-gitea#34844)
  Refactor repo contents API and add "contents-ext" API (go-gitea#34822)
  [skip ci] Updated translations via Crowdin
  fix(issue): Replace stopwatch toggle with explicit start/stop actions (go-gitea#34818)
  Remove unused variable HUGO_VERSION (go-gitea#34840)
  Fix SSH LFS timeout (go-gitea#34838)
  Ignore force pushes for changed files in a PR review (go-gitea#34837)
  Fix log fmt (go-gitea#34810)
  Fix team permissions (go-gitea#34827)
  Fix job status aggregation logic (go-gitea#34823)
  [skip ci] Updated translations via Crowdin
  correct migration tab name (go-gitea#34826)
  Refactor template helper (go-gitea#34819)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/done All backports for this PR have been created backport/v1.24 This PR should be backported to Gitea 1.24 lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. modifies/go Pull requests that update Go code type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Action behavior inconsist with Github Action when first job fail
5 participants