feat: ability to abort an update when exceeding progressDeadlineSeconds #1397

Merged: 7 commits merged into argoproj:master on Aug 23, 2021


@huikang (Member) commented Aug 5, 2021

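  • Add AbortExceedProgressDeadline to the rollout spec (the field was renamed progressDeadlineAbort during review).
  • When a rollout doesn't use analysis, setting AbortExceedProgressDeadline
    to true scales down the new ReplicaSet once progressDeadlineSeconds is exceeded.
  • Add an e2e test.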

Signed-off-by: Hui Kang <hui.kang@salesforce.com>

Closes #1376
Partially fixes #1295

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) ends with the related issue number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@codecov bot commented Aug 5, 2021

Codecov Report

Merging #1397 (0cfdf29) into master (86107de) will increase coverage by 0.08%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1397      +/-   ##
==========================================
+ Coverage   81.36%   81.45%   +0.08%     
==========================================
  Files         108      108              
  Lines       10037    10051      +14     
==========================================
+ Hits         8167     8187      +20     
+ Misses       1313     1303      -10     
- Partials      557      561       +4     
| Impacted Files | Coverage Δ |
|---|---|
| rollout/sync.go | 78.16% <100.00%> (+1.53%) ⬆️ |
| utils/conditions/conditions.go | 78.94% <100.00%> (ø) |
| analysis/analysis.go | 85.87% <0.00%> (-0.68%) ⬇️ |
| pkg/kubectl-argo-rollouts/cmd/create/create.go | 65.58% <0.00%> (+0.16%) ⬆️ |
| controller/metrics/rollouts.go | 77.58% <0.00%> (+5.17%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 86107de...0cfdf29.

Resolved (outdated) review threads on pkg/apis/rollouts/v1alpha1/types.go, rollout/sync.go, and utils/defaults/defaults.go.

@huikang force-pushed the 1376-abort-option-update branch 2 times, most recently from d7dd76f to e274da7 on August 6, 2021 03:30
rollout/sync.go (outdated), comment on lines 664 to 666:

```go
	condition = conditions.NewRolloutCondition(v1alpha1.RolloutPaused, corev1.ConditionTrue, conditions.TimedOutReason, msg)
} else {
	condition = conditions.NewRolloutCondition(v1alpha1.RolloutProgressing, corev1.ConditionFalse, conditions.TimedOutReason, msg)
```
Review comment (Member):

  1. I don't think it's correct to set Progressing=False conditionally. If we reach the case block at line 652, we should always set Progressing to False.

  2. It doesn't look like we emit an event about the abort, as we do at line 607. Can we structure the code so that, if we abort here, we also emit the K8s event about the abort?

  3. Setting the condition with condition = conditions.NewRolloutCondition(v1alpha1.RolloutPaused, corev1.ConditionTrue, conditions.TimedOutReason, msg) seems incorrect. Why would the rollout be considered paused?
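Review points 1 and 3 describe the following control flow, sketched here under stated assumptions. Once the deadline is exceeded, the status always carries Progressing=False with a timed-out reason and never Paused=True. The abort flag only decides whether the update is additionally aborted.

The `Condition` type and `onDeadlineExceeded` are simplified, hypothetical stand-ins for `v1alpha1.RolloutCondition` and the controller code. The reason string "ProgressDeadlineExceeded" is an assumed value for `conditions.TimedOutReason`. Event emission is left to the caller.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for v1alpha1.RolloutCondition.
type Condition struct {
	Type, Status, Reason, Message string
}

// timedOutReason is an assumed value for conditions.TimedOutReason.
const timedOutReason = "ProgressDeadlineExceeded"

// onDeadlineExceeded always reports Progressing=False (never Paused=True);
// progressDeadlineAbort only decides whether the update is also aborted.
func onDeadlineExceeded(progressDeadlineAbort bool, msg string) (Condition, bool) {
	cond := Condition{Type: "Progressing", Status: "False", Reason: timedOutReason, Message: msg}
	return cond, progressDeadlineAbort
}

func main() {
	for _, abort := range []bool{false, true} {
		cond, aborted := onDeadlineExceeded(abort, "progress deadline exceeded")
		fmt.Printf("progressDeadlineAbort=%v -> %s=%s (%s), aborted=%v\n",
			abort, cond.Type, cond.Status, cond.Reason, aborted)
	}
}
```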

Reply (Member Author):

Thanks, I was mistaken about the Paused and Progressing conditions. Please check out the updated PR.

Hui Kang added 2 commits August 12, 2021 12:53
- Add AbortExceedProgressDeadline to rollout spec
- when rollout doesn't use analysis, set AbortExceedProgressDeadline
  to true to scale down the new RS.
- e2e test added

Signed-off-by: Hui Kang <hui.kang@salesforce.com>
rollout/sync.go (outdated):

```go
// When ProgressDeadlineAbort is set, abort the update
if c.rollout.Spec.ProgressDeadlineAbort {
	c.pauseContext.AddAbort(msg)
	c.recorder.Warnf(c.rollout, record.EventOptions{EventReason: conditions.RolloutAbortedReason}, msg)
```
@jessesuen (Member) commented Aug 13, 2021:

I think this will cause the Abort event to be emitted on every reconciliation. We should only emit the Abort event when there is a state change. I think this needs to move lower, to the SetRolloutCondition call, and be predicated on whether the condition changed and on c.rollout.Spec.ProgressDeadlineAbort; only then do we emit the event:

```go
if conditions.SetRolloutCondition(&newStatus, *condition) && c.rollout.Spec.ProgressDeadlineAbort {
	c.recorder.Warnf(c.rollout, record.EventOptions{EventReason: conditions.RolloutAbortedReason}, msg)
}
```
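The suggestion above relies on the condition setter reporting whether it actually changed anything. That return value is what turns a per-reconciliation event into a one-time event.

The sketch below simulates three reconciliations under stated assumptions:

  • `Condition`, `Status`, `setCondition`, and `reconcileTimeout` are hypothetical, simplified stand-ins, not the real `conditions.SetRolloutCondition`.
  • Timestamps are omitted. A real implementation must compare conditions without them, otherwise every call would look like a change.
  • "RolloutAborted" is an assumed value for `conditions.RolloutAbortedReason`.

```go
package main

import "fmt"

// Condition and Status are hypothetical, simplified stand-ins for the
// controller's status types (timestamps deliberately omitted).
type Condition struct {
	Type, Status, Reason, Message string
}

type Status struct {
	Conditions []Condition
}

// abortedReason is an assumed value for conditions.RolloutAbortedReason.
const abortedReason = "RolloutAborted"

// setCondition stores cond and reports whether the status actually changed.
func setCondition(s *Status, cond Condition) bool {
	for i, c := range s.Conditions {
		if c.Type == cond.Type {
			if c == cond {
				return false // identical condition already present: no state change
			}
			s.Conditions[i] = cond
			return true
		}
	}
	s.Conditions = append(s.Conditions, cond)
	return true
}

// reconcileTimeout models one reconciliation after the deadline is exceeded
// and returns the warning events it would emit.
func reconcileTimeout(s *Status, progressDeadlineAbort bool, msg string) []string {
	var events []string
	cond := Condition{Type: "Progressing", Status: "False", Reason: "ProgressDeadlineExceeded", Message: msg}
	// Emit only when the condition changed AND the abort flag is enabled.
	if setCondition(s, cond) && progressDeadlineAbort {
		events = append(events, "Warning "+abortedReason+": "+msg)
	}
	return events
}

func main() {
	var s Status
	for i := 1; i <= 3; i++ {
		events := reconcileTimeout(&s, true, "progress deadline exceeded")
		fmt.Printf("reconcile %d: %d event(s)\n", i, len(events))
	}
}
```

Only the first reconciliation emits the event. Later passes find the identical condition already set, so `setCondition` returns false and nothing is recorded.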

Signed-off-by: Hui Kang <hui.kang@salesforce.com>
@agill17 commented Aug 16, 2021

So I have a question; maybe I am misunderstanding the changes/docs in this PR. Would this change allow us to abort an update even if we are using an analysisTemplate?
Let's say a new canary starts (the Rollout object does have analysisTemplate(s)), but the new RS never becomes healthy because the probe keeps failing. Does progressDeadlineAbort: true apply in cases like this, where the probe is failing and eventually hits progressDeadlineSeconds, but analysisTemplates are also involved?

@jessesuen (Member) commented:

Yes. With this flag, the rollout will abort if either of these conditions becomes true:

  1. analysis template fails
  2. progress deadline exceeded

The second condition is enabled by setting progressDeadlineAbort: true. It allows a Rollout to abort and roll back to stable even when analysis is not used.
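The abort rule described above can be summarized as a single boolean expression. `shouldAbort` is a hypothetical name used only for illustration; the controller does not necessarily compute it this way. The sketch encodes the statement that a failed analysis aborts on its own. An exceeded progress deadline aborts only when progressDeadlineAbort is enabled.

```go
package main

import "fmt"

// shouldAbort is a hypothetical summary of the rule described above:
// a failed analysis aborts on its own, while an exceeded progress deadline
// aborts only when progressDeadlineAbort is enabled.
func shouldAbort(analysisFailed, deadlineExceeded, progressDeadlineAbort bool) bool {
	return analysisFailed || (deadlineExceeded && progressDeadlineAbort)
}

func main() {
	// agill17's scenario: analysis is configured but has not failed, and the
	// new ReplicaSet never becomes healthy, so the deadline is exceeded.
	fmt.Println(shouldAbort(false, true, true))  // true: aborted because of the deadline
	fmt.Println(shouldAbort(false, true, false)) // false: timed out, but not aborted
}
```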

Co-authored-by: Kyle Cronin <cronik@users.noreply.github.com>

Signed-off-by: Hui Kang <hui.kang@salesforce.com>
@jessesuen (Member) commented:

@huikang the logic looks good, but could you cover the new code with unit tests?

@huikang (Member Author) commented Aug 20, 2021

> @huikang the logic looks good, but could you cover the new code with unit tests?

Hi @jessesuen, sure, I will add a new test case.

Signed-off-by: Hui Kang <hui.kang@salesforce.com>
@sonarcloud bot commented Aug 22, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information
0.0% duplication

@huikang (Member Author) commented Aug 23, 2021

> @huikang the logic looks good, but could you cover the new code with unit tests?

Hi @jessesuen, the unit tests have been added. Please take another look. Thanks.

@jessesuen changed the title from "feat: scaledown rs when update exceeds progressDeadlineSeconds" to "feat: ability to abort an update when exceeding progressDeadlineSeconds" on Aug 23, 2021
@jessesuen merged commit 17a2158 into argoproj:master on Aug 23, 2021

@jessesuen (Member) commented:

Great work!

Successfully merging this pull request may close these issues:
  • Option to abort when exceeding progressDeadlineSeconds
  • Rollout incorrectly manages 3 replicasets

4 participants