
Switch the deployment strategy based on external condition (PV type) #15168

Open
marekjelen opened this issue Jul 12, 2017 · 20 comments
Labels
area/usability component/apps lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/P3

Comments

@marekjelen

Rolling strategy is not useful for deployments with RWO PVs.

Version
oc v1.5.1+7b451fc
kubernetes v1.5.2+43a9be4
features: Basic-Auth
Steps To Reproduce
  1. Create RWO PV
  2. Assign the PV to a deployment with Rolling strategy
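For reference, a minimal sketch of the setup (all names, images, and sizes here are illustrative, not from the original report):

```yaml
# Hypothetical PVC requesting a ReadWriteOnce (RWO) volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# DeploymentConfig with the Rolling strategy mounting that claim
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: db
spec:
  replicas: 1
  selector:
    app: db
  strategy:
    type: Rolling
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: example/db:latest
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
```

With a setup like this, triggering a second rollout starts a new pod that cannot attach the RWO volume while the old pod still holds it.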
Current Result

When a new deployment is triggered, it gets stuck.

Expected Result

The deployment strategy could be switched to Recreate automatically, to save the user from having to figure out the problem and then change the strategy manually.
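A manual workaround, assuming access to edit the DC (e.g. via `oc edit dc/<name>` or `oc patch`), is to switch the strategy yourself; the relevant spec fragment would look like:

```yaml
# Recreate scales the old deployment down before scaling the new one up,
# so the single RWO volume can be released and re-attached
spec:
  strategy:
    type: Recreate
```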

Additional Information

N/A

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@smarterclayton is it reasonable to emit a warning (event/condition/etc) saying that rolling with RWO will fail to roll? I don't think we should decide the strategy for the user automatically based on "external" inputs (like PVC type).

Also, we could maybe fail the rollout before we actually create the deployer pod, when we know in advance the rollout will fail (rolling + RWO).

@Kargakis @tnozicka FYI

@0xmichalis
Contributor

Agreed with @mfojtik - we already do a lot of magic with triggers in the spec. I thought oc status would already emit a warning for rolling deployments with RWO volumes, @marekjelen isn't that the case?

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@Kargakis how about web console? //cc @jwforres

@0xmichalis
Contributor

@mfojtik you meant to ask @jwforres @spadgett ;)

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@Kargakis i corrected myself ;P

@jwforres
Member

I don't think the console is showing a special warning for this today, but it sounds like something to consider if we know it's always going to fail.

@jwforres
Member

The problem from the perspective of the Overview is that we don't get PVC details at all today. PVCs are relatively stable, so they might be something we could just list or slow-poll. @spadgett there are probably other things we could be showing relative to PVCs used by deployments, like this deployment config referencing PVCs that are not bound?

@mfojtik
Contributor

mfojtik commented Jul 13, 2017

@jwforres as far as I remember, when RWO volumes are bound to a DC with rolling strategy we fail, but the error is hidden in events and it is not really clear ;-) (you get some nasty storage error)...

Maybe time for:

:-) "Looks like you have RWO volume with rolling strategy, do you want to change it?"

@smarterclayton
Contributor

smarterclayton commented Jul 13, 2017 via email

@marekjelen
Author

@smarterclayton if I have a RWO PV and at the same time Rolling, the deployment always gets stuck, even with replicas=1. E.g. in Online we use the Recreate strategy by default for persistent DBs, so I went to Online Starter, switched from Recreate to Rolling, and took these screenshots.

[screenshots: rollout stuck after switching the DC from Recreate to Rolling]

Amazon EBS and GCE based PVs only allow RWO mode, so if you set Rolling on a database deployment backed by one of these PVs you will never be able to trigger a new deployment.

@smarterclayton
Contributor

smarterclayton commented Jul 13, 2017 via email

@marekjelen
Author

marekjelen commented Jul 13, 2017

@smarterclayton that is interesting :) During a rolling deployment there have to be two pods (for replicas=1), and these two pods are with high probability running on two different machines. An RWO volume can be attached to only one pod - usually the underlying tech can be attached to only one machine. If I trigger a redeploy and it behaved as you describe, I would lose the PV from the original pod; however, the application in that pod is not aware of that and can still write into the PV that should be there but, per your description, has been detached from the pod.

If Rolling is used with a RWO volume I have to run into at least one of these two scenarios:

  • my original pod does not have the PV anymore, so any writes into that PV are inconsistent, yet the app is not aware of that
  • two pods need to write to a single PV that is not designed for multiple concurrent writers, which could lead to FS/storage corruption

@jorgemoralespou

> Something else is wrong, that's not how the system should behave. Rolling deployment marks the old pod as deleted, which allows the cluster to detach the volume. You're likely hitting a bug you should be reporting to @bchilds

@smarterclayton when is the old pod marked as deleted? AFAIU until the new version is live and ready we cannot mark the old pod as deleted (and detach the persistent storage), as it will still receive traffic, since its endpoint is still listed in the service. Once the new pod is ready, the old pod is marked as terminating and its endpoint is removed from the service, but we still cannot detach the storage, since we need to wait for the graceful shutdown, else we could be introducing a lot of application errors. And I hope we're not.

@marekjelen
Author

@smarterclayton can you please follow up on the issue? thanks

@0xmichalis
Contributor

It's unlikely that we will automate any sort of spec mutation to handle this case. oc status should already warn in case you are running a Rolling deployment with an RWO volume. The only thing missing is a console warning?

@mfojtik
Contributor

mfojtik commented Jul 20, 2017

@Kargakis yes

@marekjelen
Author

@Kargakis @mfojtik could the warning also be shown directly in oc deploy/rollout instead of being hidden in oc status?

Plus, I would like to get some clarification on what @smarterclayton says regarding the behaviour of RWO volumes; that is still confusing to me, and I am not the only one who thinks the behaviour is supposed to be different from what @smarterclayton describes.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 15, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 18, 2018
@jorgemoralespou

/lifecycle frozen

@openshift-ci-robot openshift-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Mar 21, 2018
@tnozicka tnozicka removed their assignment Mar 10, 2021