
Switch the deployment strategy based on external condition (PV type) #15168

Open
marekjelen opened this issue Jul 12, 2017 · 20 comments
Labels
area/usability component/apps lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/P3

Comments

@marekjelen

Rolling strategy is not useful for deployments with RWO PVs.

Version
oc v1.5.1+7b451fc
kubernetes v1.5.2+43a9be4
features: Basic-Auth
Steps To Reproduce
  1. Create RWO PV
  2. Assign the PV to a deployment with Rolling strategy
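For reference, a minimal sketch of the setup (all names, images, and sizes here are illustrative, not from the original report):

```yaml
# Hypothetical PVC requesting a ReadWriteOnce (RWO) volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# DeploymentConfig with the Rolling strategy mounting that claim
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: db
spec:
  replicas: 1
  selector:
    app: db
  strategy:
    type: Rolling
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: example/db:latest
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
```

With a setup like this, triggering a second rollout starts a new pod that cannot attach the RWO volume while the old pod still holds it.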
Current Result

When a new deployment is triggered, it gets stuck.

Expected Result

The deployment strategy could be switched to Recreate automatically, to save the user from having to figure out the problem and then change the strategy manually.
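A manual workaround, assuming access to edit the DC (e.g. via `oc edit dc/<name>` or `oc patch`), is to switch the strategy yourself; the relevant spec fragment would look like:

```yaml
# Recreate scales the old deployment down before scaling the new one up,
# so the single RWO volume can be released and re-attached
spec:
  strategy:
    type: Recreate
```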

Additional Information

N/A

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@smarterclayton is it reasonable to emit a warning (event/condition/etc) saying that rolling with RWO will fail to roll? I don't think we should decide the strategy for the user automatically based on "external" inputs (like PVC type).

Also, we could maybe fail the rollout before we actually create the deployer pod, when we know in advance the rollout will fail (rolling + RWO).

@Kargakis @tnozicka FYI

@0xmichalis
Contributor

Agreed with @mfojtik - we already do a lot of magic with triggers in the spec. I thought oc status would already emit a warning for rolling deployments with RWO volumes, @marekjelen isn't that the case?

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@Kargakis how about web console? //cc @jwforres

@0xmichalis
Contributor

@mfojtik you meant to ask @jwforres @spadgett ;)

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@Kargakis i corrected myself ;P

@jwforres
Member

I don't think the console is showing a special warning for this today, but it sounds like something to consider if we know it's always going to fail.

@jwforres
Member

The problem from the perspective of the Overview is that we don't get PVC details at all today. PVCs are relatively stable, so they might be something we could just list or slow-poll. @spadgett there are probably other things we could be showing relative to PVCs used by deployments, like this deployment config referencing PVCs that are not bound?

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@jwforres as far as I remember, when RWO volumes are bound to a DC with rolling strategy we fail, but the error is hidden in events and it is not really clear ;-) (you get some nasty storage error)...

Maybe time for:

:-) "Looks like you have RWO volume with rolling strategy, do you want to change it?"

@smarterclayton
Contributor

smarterclayton commented Jul 13, 2017 via email

@marekjelen
Author

@smarterclayton if I have a RWO PV and at the same time Rolling, the deployment always gets stuck, even with replicas=1. E.g. in Online we use the Recreate strategy by default for persistent DBs, so I went to Online Starter, switched from Recreate to Rolling, and took these screenshots.

[screenshots: rollout stuck after switching the DC from Recreate to Rolling]

Amazon EBS and GCE based PVs only allow RWO mode, so if you set Rolling on a database deployment backed by one of these PVs you will never be able to trigger a new deployment.

@smarterclayton
Contributor

smarterclayton commented Jul 13, 2017 via email

@marekjelen
Author

marekjelen commented Jul 13, 2017

@smarterclayton that is interesting :) During a rolling deployment there have to be two pods (for replicas=1), and these two pods are with high probability running on two different machines. An RWO volume can be attached to only one pod - usually the underlying tech can be attached to only one machine. If I trigger a redeploy and it behaved as you describe, I would lose the PV from the original pod; however, the application in that pod is not aware of that and can still write into the PV that should be there but, per your description, has been detached from the pod.

If Rolling is used with a RWO volume I have to run into at least one of these two scenarios:

  • my original pod does not have the PV anymore, so any writes into that PV are inconsistent, yet the app is not aware of that
  • two pods need to write to a single PV that is not designed for multiple concurrent writers, which could lead to FS/storage corruption

@jorgemoralespou

> Something else is wrong, that's not how the system should behave. Rolling deployment marks the old pod as deleted, which allows the cluster to detach the volume. You're likely hitting a bug you should be reporting to @bchilds

@smarterclayton when is the old pod marked as deleted? AFAIU until the new version is live and ready we cannot mark the old pod as deleted (and detach the persistent storage), as it will still receive traffic, since its endpoint is still listed in the service. Once the new pod is ready, the old pod is marked as terminating and its endpoint is removed from the service, but we still cannot detach the storage, since we need to wait for the graceful shutdown, else we could be introducing a lot of application errors. And I hope we're not.

@marekjelen
Author

@smarterclayton can you please follow up on the issue? thanks

@0xmichalis
Contributor

It's unlikely that we will automate any sort of spec mutation to handle this case. oc status should already warn in case you are running a Rolling deployment with an RWO volume. The only thing missing is a console warning?

@mfojtik
Contributor

mfojtik commented Jul 20, 2017

@Kargakis yes

@marekjelen
Author

@Kargakis @mfojtik could the warning also be shown directly in oc deploy/rollout instead of being hidden in oc status?

Plus, I would like to get some clarification on what @smarterclayton says regarding the behaviour of RWO volumes; that is still confusing to me, and I am not the only one who thinks the behaviour is supposed to be different from what @smarterclayton describes.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 15, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 18, 2018
@jorgemoralespou

/lifecycle frozen

@openshift-ci-robot openshift-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 21, 2018
@tnozicka tnozicka removed their assignment Mar 10, 2021