Fix: Failed analysis to degrade rollout when multiple metrics are analyzed #1535

Merged: 2 commits merged into argoproj:master on Sep 23, 2021

Conversation

harikrongali
Contributor

@harikrongali harikrongali commented Sep 23, 2021

Checklist:
Fixes the behavior described in #1411, although the issue is reproducible in the 1.0.6 release because of the fix in #1407.

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

Signed-off-by: hari rongali <hari_rongali@intuit.com>
@sonarcloud

sonarcloud bot commented Sep 23, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

@codecov

codecov bot commented Sep 23, 2021

Codecov Report

Merging #1535 (647bdb1) into master (d9ba36a) will increase coverage by 0.09%.
The diff coverage is 88.46%.

@@            Coverage Diff             @@
##           master    #1535      +/-   ##
==========================================
+ Coverage   81.67%   81.77%   +0.09%     
==========================================
  Files         110      112       +2     
  Lines       14798    15071     +273     
==========================================
+ Hits        12086    12324     +238     
- Misses       2078     2103      +25     
- Partials      634      644      +10     

Impacted Files                         Coverage Δ
metricproviders/graphite/api.go        78.04% <78.04%> (ø)
analysis/analysis.go                   84.74% <100.00%> (+1.02%) ⬆️
metricproviders/graphite/graphite.go   100.00% <100.00%> (ø)
utils/defaults/defaults.go             88.02% <0.00%> (-3.64%) ⬇️
rollout/replicaset.go                  67.59% <0.00%> (-3.09%) ⬇️
utils/replicaset/replicaset.go         90.04% <0.00%> (-0.83%) ⬇️
rollout/service.go                     75.13% <0.00%> (ø)
utils/conditions/conditions.go         80.76% <0.00%> (ø)
rollout/trafficrouting/alb/alb.go      84.86% <0.00%> (ø)
rollout/trafficrouting/smi/smi.go      94.90% <0.00%> (ø)
... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c54090...647bdb1. Read the comment docs.

jessesuen
jessesuen previously approved these changes Sep 23, 2021
@jessesuen jessesuen dismissed their stale review September 23, 2021 20:53

have a question

Contributor

@alexmt alexmt left a comment

LGTM

@alexmt alexmt merged commit 26433be into argoproj:master Sep 23, 2021
alexmt pushed a commit that referenced this pull request Sep 23, 2021
…lyzed (#1535)

* fix: analysis fail for inline multi metric analysis

Signed-off-by: hari rongali <hari_rongali@intuit.com>

* fix: cleanup

Signed-off-by: hari rongali <hari_rongali@intuit.com>
@jessesuen
Member

jessesuen commented Sep 23, 2021

@harikrongali I dismissed my earlier review because I remember there was a reason the code was written this way, and I want to make sure we don't break it.

I recall that we did not want to mark the AnalysisRun completed until everything that needed to be done was completed. This includes stopping in-flight jobs. Can you verify the following scenario:

  1. Start an AnalysisRun with multiple Job metrics: one that sleeps forever, and one that has an initialDelay of 9999 seconds.
  2. Terminate the AnalysisRun after the job is created.

Will the AnalysisRun stop the in-flight job? I am concerned that by marking it Successful immediately, we will leave the job running forever.

@harikrongali
Contributor Author

@jessesuen I will validate ASAP.

@jessesuen
Member

Never mind. I think we are good. This is the scenario I was concerned about:

kind: AnalysisRun
apiVersion: argoproj.io/v1alpha1
metadata:
  generateName: analysis-run-job-
spec:
  metrics:
  - name: dont-finish
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: nginx:1.19-alpine
                command: [sleep, "999999"]
              restartPolicy: Never
          backoffLimit: 0
  - name: fail
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: nginx:1.19-alpine
                command: [sh, -c, "sleep 10 && exit 1"]
              restartPolicy: Never
          backoffLimit: 0
  - name: dont-start
    initialDelay: 24h
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: nginx:1.19-alpine
                command: [sh, -c, "exit 0"]
              restartPolicy: Never
          backoffLimit: 0

I verified nothing is left running when the second metric fails.
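
For readers following the thread, the concern and its resolution can be modeled with a small Go sketch. This is not the Argo Rollouts controller code; the types and the functions `Assess` and `StopInFlight` are hypothetical names invented for illustration. It assumes the rule discussed above: a failed metric decides the outcome of the run, but the run only reports a terminal phase once no Job is still in flight.

```go
package main

import "fmt"

// Phase is a simplified, hypothetical stand-in for an analysis phase.
type Phase string

const (
	PhasePending    Phase = "Pending"
	PhaseRunning    Phase = "Running"
	PhaseSuccessful Phase = "Successful"
	PhaseFailed     Phase = "Failed"
)

// Metric models one metric of a run. InFlight is true while the
// metric's Job is still executing (hypothetical field).
type Metric struct {
	Name     string
	Phase    Phase
	InFlight bool
}

// Assess returns the run-level phase plus the names of metrics whose
// in-flight work must be stopped before the run may become terminal.
func Assess(metrics []Metric) (Phase, []string) {
	anyFailed := false
	allSuccessful := true
	var inFlight []string
	for _, m := range metrics {
		if m.Phase == PhaseFailed {
			anyFailed = true
		}
		if m.Phase != PhaseSuccessful {
			allSuccessful = false
		}
		if m.InFlight {
			inFlight = append(inFlight, m.Name)
		}
	}
	switch {
	case anyFailed && len(inFlight) > 0:
		// A metric failed, but Jobs are still running: stay non-terminal.
		return PhaseRunning, inFlight
	case anyFailed:
		// Failure with nothing left running: safe to report Failed,
		// even if some metrics never started.
		return PhaseFailed, nil
	case allSuccessful:
		return PhaseSuccessful, nil
	default:
		return PhaseRunning, nil
	}
}

// StopInFlight simulates terminating the Jobs of the named metrics.
func StopInFlight(metrics []Metric, names []string) {
	stop := make(map[string]bool, len(names))
	for _, n := range names {
		stop[n] = true
	}
	for i := range metrics {
		if stop[metrics[i].Name] {
			metrics[i].InFlight = false
		}
	}
}

func main() {
	// Mirrors the three metrics in the scenario above.
	metrics := []Metric{
		{Name: "dont-finish", Phase: PhaseRunning, InFlight: true},
		{Name: "fail", Phase: PhaseFailed},
		{Name: "dont-start", Phase: PhasePending},
	}
	phase, toStop := Assess(metrics)
	fmt.Println(phase, toStop)

	StopInFlight(metrics, toStop)
	phase, toStop = Assess(metrics)
	fmt.Println(phase, toStop)
}
```

In this model the first assessment keeps the run non-terminal and names `dont-finish` as work to stop; only after that Job is stopped does the run report `Failed`, and `dont-start` never launches.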
