-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove unnecessary needUpdate to avoid unexpected skipping updating before cache synced #3994
base: master
Are you sure you want to change the base?
Conversation
…efore cache synced Signed-off-by: lxtywypc <lxtywypc@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I can't tell how bad it is right now. Please @XiShanYongYe-Chang take a look. |
/assign @XiShanYongYe-Chang |
Ok, I will take a look ASAP. |
Hi @lxtywypc, thank you for your update. I indeed did not consider this point when reviewing it earlier. How did you discover it during testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell how bad it is right now. Please @XiShanYongYe-Chang take a look.
We don't need to revert the previously merged pull request. The purpose of this modification was originally to reduce the number of work updates, but due to possible delayed cache updates, it could result in an incorrect work status. After removing this logic, the work status can be correctly updated, although there may be some unnecessary update operations.
/lgtm
Ask an again review from @chaunceyjiang
/cc @chaunceyjiang
Due to that our testing environment sometimes need a little long time(for example 1s) for cache syncing. Once we noticed a work failed applied for a long time and tried to figure out what happened, we found that it had been applied successfully after once retry through the log of |
/assign |
Draft for reverting #3808 as comment #3959 (comment) |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Consider if some error occured in
execution-controller
, it would update the condition ofwork
fromtrue
tofalse
, and requeue in a short time(5ms for first time as default now), but during this reconciling, the condition of gotwork
might be stilltrue
because the cache ofwork
might not be synced yet. If no error occurs this time, when callingneedUpdateCondition
, we would find the condition in both "old" and "new"work status
istrue
, that would cause unexpected skipping updatingwork status
with the condition isfalse
in cluster in fact.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
I'm sorry for bringing this in #3808 , I intended to reduce the attempts to update
work
because now there are much more chances to updatework status
than before. But it seems quite a potential risk here. Maybe we could consider about a better way later to reduce the attempts to do unnecessary updating.Does this PR introduce a user-facing change?: