fix(TaskRun): fixed the issue where some step statuses might not be correctly updated in failed TaskRun #8270

Merged
@tekton-robot merged 1 commit into tektoncd:main from fix/taskrun-step-status-update on Sep 26, 2024

Conversation

l-qing
Contributor

@l-qing l-qing commented Sep 17, 2024

Previously, if the Pod cache was not updated in time, a TaskRun was marked as failed immediately after a step failed. When the Pod status later changed to failed, the TaskRun already had a completed status, so it did not re-enter the reconcile logic and the status of each step was never updated.

With this change, a TaskRun fails early only in the event of an OOM. If a step exits abnormally for any other reason, the TaskRun waits for the Pod status to change to failed before synchronizing the status of each step.
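
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code changed in pkg/pod/status.go: the types `Pod`, `ContainerState`, and `PodPhase` and the function `shouldFailTaskRun` are hypothetical, simplified stand-ins for the Kubernetes and Tekton types, and the only reason string assumed is `OOMKilled`, the value Kubernetes reports for a container killed for exceeding its memory limit.

```go
package main

// PodPhase is a simplified, hypothetical stand-in for the Kubernetes Pod phase.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

// ContainerState is a hypothetical, flattened view of one step container.
type ContainerState struct {
	Name       string
	Terminated bool
	ExitCode   int
	Reason     string // for example "Error" or "OOMKilled"
}

// Pod is a hypothetical, minimal view of the Pod backing a TaskRun.
type Pod struct {
	Phase PodPhase
	Steps []ContainerState
}

// shouldFailTaskRun reports whether the TaskRun should be marked failed now.
// It returns true when the Pod itself has reached the Failed phase, or when
// any step was OOM-killed. A step that merely exited with a non-zero code
// does not fail the TaskRun early; the caller keeps reconciling until the
// Pod phase becomes Failed, so every step status can be synchronized.
func shouldFailTaskRun(p Pod) bool {
	if p.Phase == PodFailed {
		return true
	}
	for _, s := range p.Steps {
		if s.Terminated && s.Reason == "OOMKilled" {
			return true
		}
	}
	return false
}
```

In this sketch, a running Pod with a step that exited with code 1 is not yet treated as a failed TaskRun, whereas the same Pod in the Failed phase, or a running Pod with an OOM-killed step, is.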


This change also resolves the previously reported instability in the integration tests.

#8236 (comment)
The instability in this integration test is related to another PR #8171.

Changes

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

fix: fixed the issue where some step statuses might not be correctly updated in failed TaskRun

/kind bug

@tekton-robot tekton-robot added labels on Sep 17, 2024:
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
@tekton-robot
Collaborator

Hi @l-qing. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File               Old Coverage  New Coverage  Delta
pkg/pod/status.go  92.2%         92.3%         0.2

Contributor

@khrm khrm left a comment

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 17, 2024
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File               Old Coverage  New Coverage  Delta
pkg/pod/status.go  92.2%         92.3%         0.2

@l-qing l-qing force-pushed the fix/taskrun-step-status-update branch from bdc94d1 to 5d66369 Compare September 17, 2024 12:30
@l-qing l-qing force-pushed the fix/taskrun-step-status-update branch from 5d66369 to 4ff99af Compare September 19, 2024 13:50
@l-qing
Contributor Author

l-qing commented Sep 19, 2024

/retest

@afrittoli afrittoli added this to the Pipeline v0.64 milestone Sep 23, 2024
Member

@afrittoli afrittoli left a comment

/approve

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2024
…orrectly updated in failed TaskRun

Previously, if the Pod cache was not updated in time, a `TaskRun` was marked
as failed immediately after a step failed. When the Pod status later changed
to failed, the TaskRun already had a completed status, so it did not re-enter
the reconcile logic and the status of each step was never updated.

With this change, a TaskRun fails early only in the event of an OOM.
If a step exits abnormally for any other reason, the TaskRun waits for the
Pod status to change to failed before synchronizing the status of each step.
@l-qing l-qing force-pushed the fix/taskrun-step-status-update branch from 4ff99af to b21b5b9 Compare September 25, 2024 08:32
@chitrangpatel
Contributor

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 26, 2024
@l-qing
Contributor Author

l-qing commented Sep 26, 2024

/retest

@tekton-robot tekton-robot merged commit cf4ccee into tektoncd:main Sep 26, 2024
14 checks passed
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: Indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
5 participants