CRD: allow specification of the desired step to be in #2906

Open
Snaipe opened this issue Jul 25, 2023 · 8 comments
Labels
blue-green Blue-Green related issue canary Canary related issue enhancement New feature or request no-issue-activity

Comments


Snaipe commented Jul 25, 2023

Summary

One point of resistance to adopting Argo Rollouts for us is that it departs from GitOps when it comes to representing the intermediate state of a rollout. In other words, there is no good, streamlined way to represent in Git which step the Rollout object is supposed to be at; instead, the documentation recommends using kubectl argo rollouts to manage rollouts, which is not version controlled.

Perhaps more importantly, if a rollout pauses for an extended period of time (say, 12 hours), whoever started the deployment will likely not be around or ready when the next step kicks in.

Today, this can be achieved sub-optimally by making use of the paused attribute. For instance, you could define a handful of steps to set the traffic weight, with pauses of 1h in between, start the rollout as paused, and flip-flop .spec.paused between false to resume the rollout whenever someone is back at their desk and true whenever that person walks out. This is of course not great.
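A minimal sketch of that workaround, using only fields Argo Rollouts already has (.spec.paused and pause steps with a duration). The weights and 1h durations are illustrative, and selector/template are omitted:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  # Committed as true to hold the rollout; committed as false to let it
  # progress again, then back to true when nobody is watching.
  paused: true
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 1h}
      - setWeight: 40
      - pause: {duration: 1h}
      - setWeight: 60
      - pause: {duration: 1h}
      - setWeight: 80
  # selector and template omitted for brevity
```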

It would be an improvement if we could specify which step the rollout is supposed to be at, and have Argo Rollouts reconcile the deployment to be at that step.

This could be implemented as a new step attribute in the rollout spec, which could be an index in the steps list:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  step: 1 # refers to .spec.strategy.canary.steps[1]
  strategy:
    canary:
      steps:
      - setWeight: 20
      - setWeight: 40
      - setWeight: 60
      - setWeight: 80

... or, we could make this more palatable by naming steps and using that:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  step: 40% # still refers to .spec.strategy.canary.steps[1]
  strategy:
    canary:
      steps:
      - setWeight: 20
      - name: 40%
        setWeight: 40
      - setWeight: 60
      - setWeight: 80

Use Cases

This would allow for a pure-GitOps flow for managing blue-green/canary deployments.

An Example Flow

Let's start from the demo rollout, with only steps modified for the sake of this example.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - setWeight: 40
      - setWeight: 60
      - setWeight: 80
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m

Applying this deploys argoproj/rollouts-demo:blue, serving 100% of the traffic.

Now, let's say we want to update this to green, starting with a weight of 40 (i.e. step 1). We would make the following commit:

diff --git a/orig.yaml b/new.yaml
index 74a28c3..82eacb3 100644
--- a/orig.yaml
+++ b/new.yaml
@@ -4,6 +4,7 @@ metadata:
   name: rollouts-demo
 spec:
   replicas: 5
+  step: 1
   strategy:
     canary:
       steps:
@@ -22,7 +23,7 @@ spec:
     spec:
       containers:
       - name: rollouts-demo
-        image: argoproj/rollouts-demo:blue
+        image: argoproj/rollouts-demo:green
         ports:
         - name: http
           containerPort: 8080

Argo Rollouts would reconcile this, start the new deployment, execute steps 0 and 1, and stop.

If we wanted to proceed with the deployment, we'd either set step to 3 (i.e. the last step), or remove step altogether (at which point the remaining steps would be executed to completion).
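For illustration, the two ways of proceeding would look roughly like this with the proposed step field (which does not exist in Argo Rollouts today). Only the top of the spec is shown; strategy and template stay as in the commit above:

```yaml
# Option 1: point the proposed step field at the last entry (steps[3], setWeight: 80).
spec:
  replicas: 5
  step: 3
---
# Option 2: remove the step field so the remaining steps run to completion.
spec:
  replicas: 5
```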

But let's say we discovered an issue with the green deployment and wanted to roll back. No problem: we can just git revert the commit that introduced it:

diff --git a/new.yaml b/orig.yaml
index 82eacb3..74a28c3 100644
--- a/new.yaml
+++ b/orig.yaml
@@ -4,7 +4,6 @@ metadata:
   name: rollouts-demo
 spec:
   replicas: 5
-  step: 1
   strategy:
     canary:
       steps:
@@ -23,7 +22,7 @@ spec:
     spec:
       containers:
       - name: rollouts-demo
-        image: argoproj/rollouts-demo:green
+        image: argoproj/rollouts-demo:blue
         ports:
         - name: http
           containerPort: 8080

Because the template spec is reverted to its stable state and the upgrade was never completed, Argo Rollouts does what it does today: it treats the change as a rollback and expedites it.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

Snaipe added the enhancement (New feature or request) label Jul 25, 2023
zachaller (Collaborator) commented Jul 26, 2023

I am still digesting a lot of this and need to think about it more, but instead of step: being thought of as the step to be at, I think something like stopAtStep: 2 or pauseAtStep: 2, i.e. the step to stop/pause at, describes the behaviour more clearly. Would you agree?
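A minimal sketch of the demo's steps under the suggested naming. pauseAtStep is only a proposal in this thread, not an existing Argo Rollouts field, and this assumes "pause at" means the controller executes steps up to and including steps[2] and then holds:

```yaml
spec:
  # Hypothetical field: run steps[0] through steps[2], then pause.
  pauseAtStep: 2
  strategy:
    canary:
      steps:
      - setWeight: 20
      - setWeight: 40
      - setWeight: 60   # the rollout would hold here, at 60% weight
      - setWeight: 80
```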

kostis-codefresh (Member) commented

Two comments here:

First of all, this is a welcome addition. But we need to remember that Argo Rollouts is not dependent on Argo CD. Several companies use Argo Rollouts on its own, and maybe they don't even care about GitOps at all. So we should make sure that the selected solution works in both cases (GitOps and non-GitOps).

Secondly, if we implement such a spec, we need to do something similar for blue/green as well.

kostis-codefresh added the canary (Canary related issue) and blue-green (Blue-Green related issue) labels Jul 26, 2023
Snaipe (Author) commented Jul 26, 2023

> I am still digesting a lot of this and need to think about it more, but instead of step: being thought of as the step to be at, I think something like stopAtStep: 2 or pauseAtStep: 2, i.e. the step to stop/pause at, describes the behaviour more clearly. Would you agree?

Sure; I think pauseAtStep is more consistent with paused, so I'd pick that over stopAtStep.

> First of all, this is a welcome addition. But we need to remember that Argo Rollouts is not dependent on Argo CD. Several companies use Argo Rollouts on its own, and maybe they don't even care about GitOps at all. So we should make sure that the selected solution works in both cases (GitOps and non-GitOps).

Right: we've been happily using Argo CD for the past year, and I've been looking on and off into using Argo Rollouts to manage our rollouts, but I've received concerns about this, so I thought I'd open the discussion. I think my proposal is fully compatible with a non-GitOps workflow, and might in fact be useful for fine-grained control over the progression of a tricky rollout.

> Secondly, if we implement such a spec, we need to do something similar for blue/green as well.

Yes, this was my intention; the example only showed canary but this would apply to blue/green too.

zachaller (Collaborator) commented

@Snaipe could you explain the blue-green side? Blue-green has no steps; I only saw this as useful for canary and was even going to suggest moving pauseAtStep down inside the canary field. Could you explain a bit how this would work with blue-green, or what you would want to control via Git?

kostis-codefresh (Member) commented Jul 26, 2023

Blue/green should have a single switch, like promoted=yes/no or something similar, to decide whether blue/green is in the initial phase (the preview doesn't get any traffic at all) or the promoted one (the new color gets all the traffic).

At least this is what I imagine...

All this, of course, assumes that autoPromotionEnabled=false.
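A minimal sketch of that idea. activeService, previewService and autoPromotionEnabled are existing blueGreen fields; promoted is the hypothetical switch described above, and the Service names are made up for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy:
    blueGreen:
      activeService: rollouts-demo-active    # hypothetical Service names
      previewService: rollouts-demo-preview
      autoPromotionEnabled: false            # existing field: do not promote automatically
      # Hypothetical switch (not an existing field):
      # false keeps the new color in preview with no live traffic,
      # true promotes it so it receives all the traffic.
      promoted: false
  # selector and template omitted for brevity
```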

Snaipe (Author) commented Jul 26, 2023

Yeah, I was about to propose that blue/green could get paused at various points (pauseAtStep: preview, pauseAtStep: promote, and pauseAtStep: scaleDown) but I'm not so sure whether I like using the "step" denomination for this.

The way this would have worked: pauseAtStep: preview would cause the rollout to stop after the prePromotionAnalysis runs but before the promotion (i.e. the same behaviour as autoPromotionEnabled=false); pauseAtStep: promote would pause the rollout after the promotion and postPromotionAnalysis but before the scaleDown; and pauseAtStep: scaleDown would run the rollout to completion.
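A sketch of where those three stop points would sit in a blue-green spec. prePromotionAnalysis and postPromotionAnalysis are existing fields; the pauseAtStep values are the hypothetical ones proposed above, and the Service and AnalysisTemplate names are made up:

```yaml
spec:
  strategy:
    blueGreen:
      activeService: rollouts-demo-active     # hypothetical Service names
      previewService: rollouts-demo-preview
      prePromotionAnalysis:
        templates:
        - templateName: smoke-tests           # hypothetical AnalysisTemplate
      postPromotionAnalysis:
        templates:
        - templateName: success-rate          # hypothetical AnalysisTemplate
      # Hypothetical field with the values proposed above:
      #   preview:   stop after prePromotionAnalysis, before promotion
      #   promote:   stop after promotion and postPromotionAnalysis, before scale-down
      #   scaleDown: run the rollout to completion
      pauseAtStep: promote
```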

I'm not sure whether it's advisable to invent fake "step" names for blue/green, or use the step terminology altogether.

It also feels like blue/green is already somewhat easier to manage via GitOps today, by flip-flopping autoPromotionEnabled between false and true. Given that, a separate promoted field feels redundant with autoPromotionEnabled.

Thoughts?

zachaller (Collaborator) commented

Yeah, I agree; for blue-green I would probably just use pauseAt:. The example below shows both cases, although a Rollout of course only allows one of canary or blueGreen. I also think these would not have to be implemented together: it would be nice to keep them similar, but I see them as two separate PRs/issues/proposals. I'm also not sure I really love the blue-green side of it yet; that could change as I think about it more, and I'm not opposed, I just see less value in it.

spec:
  strategy:
    canary:
      pauseAtStep: 1 # refers to .spec.strategy.canary.steps[1]
    blueGreen:       # only one of canary or blueGreen may be set in a real Rollout
      pauseAt: preview

github-actions (Contributor) commented

This issue is stale because it has been open 60 days with no activity.

Projects
None yet
Development

No branches or pull requests

3 participants