AnalysisRun Job is terminated when Rollout resource is scaled #3828

Open
2 tasks done
davcd opened this issue Sep 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@davcd

davcd commented Sep 10, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When a Rollout with an Analysis is scaled while the Analysis is running, the AnalysisRun Job is terminated. Additionally, the Rollout is promoted even though the Analysis has not finished.

To Reproduce

Given the following Rollout resource:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-rollout
  ...
spec:
  replicas: 3
  ...
  strategy:
    blueGreen:
      activeService: my-service
      previewService: my-service-preview
      prePromotionAnalysis:
        templates:
          - templateName: some-analysis-maybe-e2e
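
The issue does not show the referenced AnalysisTemplate. For anyone trying to reproduce this, a minimal sketch of a Job-based template could look like the following. The metric name, image, and command are assumptions, picked only so that the Job runs long enough for the Rollout to be scaled while the analysis is still in progress.

```yaml
# Hypothetical AnalysisTemplate; only the template name comes from the Rollout above.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: some-analysis-maybe-e2e
spec:
  metrics:
    - name: e2e            # assumed metric name
      provider:
        job:
          spec:
            backoffLimit: 0
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: e2e
                    image: busybox                     # placeholder image
                    command: [sh, -c, "sleep 300"]     # stands in for a long-running test suite
```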

And scaling it manually with the following command while the pre-promotion Analysis is running:

kubectl scale rollout my-rollout --replicas=6

The bug reproduces when the following events occur:

  • A new Rollout starts
    • Preview ReplicaSet is created and Pods are booted
    • Pre-promotion AnalysisRun Job starts
  • Rollout is scaled (by HPA or manually, up or down; it makes no difference)
    • Both preview and active ReplicaSets are scaled correctly
    • Analysis Job is terminated (while it was still running)
    • Analysis is marked as successful and Rollout is promoted
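
The scale event in the second step can come from an autoscaler instead of `kubectl scale`. As a rough sketch, an HPA targeting the Rollout might look like this. The HPA name, replica bounds, and CPU target are assumptions and are not taken from the reporter's setup.

```yaml
# Hypothetical HPA; only the Rollout name comes from the issue.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-rollout-hpa     # assumed name
spec:
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: my-rollout
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # assumed threshold
```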

Expected behaviour

Running AnalysisRun Jobs are not affected when Rollout resource is scaled.

Version
v1.7.2+59e5bd3


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@davcd davcd added the bug Something isn't working label Sep 10, 2024
@davcd davcd changed the title from "HPA targeting a Rollout scales down AnalysisRun Job generated pods" to "AnalysisRun Job is terminated when a Scale Down event occurs" Nov 14, 2024
@davcd davcd changed the title from "AnalysisRun Job is terminated when a Scale Down event occurs" to "AnalysisRun Job is terminated when Rollout resource is scaled" Nov 14, 2024
@davcd
Author

davcd commented Nov 14, 2024

After further investigation, I can conclude that the issue is not related to KEDA/HPA.
I have simplified the steps to reproduce and the bug description accordingly.

@davcd
Author

davcd commented Nov 14, 2024

IMO this bug is critical, as some people may be bypassing their application promotion conditions.

For example, if you run e2e or contract tests before promotion and your Rollout is scaled while those Analyses are running, your new version will be promoted even though the tests have not completed successfully.
