Recovering from deployment pod that failed without reporting state #2370

@sosiouxme

I deployed a router. It was supposed to be broken, but not quite like this.

$ osc get pods
POD                    IP        CONTAINER(S)   IMAGE(S)                           HOST                             LABELS    STATUS       CREATED     MESSAGE
deploy-router-1wlv0i                                                               ip-10-51-163-209/10.51.163.209   <none>    Failed       2 minutes   
                                 deployment     openshift/origin-deployer:v0.5.1                                              Terminated   2 minutes   exit code 255
$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND                CREATED             STATUS                       PORTS               NAMES
db929a127597        openshift/origin-deployer:v0.5.1   "/usr/bin/openshift-   2 minutes ago       Exited (255) 2 minutes ago                       k8s_deployment.3ea4efa8_deploy-router-1wlv0i_default_dff46974-fea2-11e4-95d6-22000b3280a3_e3f01fb3   
23992ab347b7        openshift/origin-pod:v0.5.1        "/pod"                 2 minutes ago       Exited (0) 2 minutes ago                         k8s_POD.d6d6c430_deploy-router-1wlv0i_default_dff46974-fea2-11e4-95d6-22000b3280a3_2f922d59          
a98bcb10a0b5        openshift/origin-release:latest    "/bin/sh -c 'tar mxz   23 hours ago        Exited (0) 23 hours ago                          naughty_albattani                                                                                    
$ docker logs db929a127597
F0520 03:47:26.039993       1 deployer.go:66] Failed to decode DeploymentConfig from controller: couldn't get version/kind: unexpected end of JSON input
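
For context on the message itself: "unexpected end of JSON input" is the error Go's `encoding/json` returns when it is handed empty or truncated input. The sketch below only reproduces that message. The `typeMeta` struct and `decodeKind` helper are hypothetical stand-ins rather than the deployer's actual code, and the idea that the deployer received an empty payload is an assumption, not something this log confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeMeta is a hypothetical stand-in for the version/kind header of an
// API object. It is not the deployer's real type.
type typeMeta struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
}

// decodeKind tries to read the kind/version header from raw JSON text.
func decodeKind(raw string) (typeMeta, error) {
	var tm typeMeta
	err := json.Unmarshal([]byte(raw), &tm)
	return tm, err
}

func main() {
	// Case 1: completely empty input (assumed scenario: nothing was passed in).
	_, err := decodeKind("")
	fmt.Println("empty input:    ", err)

	// Case 2: input cut off before the closing brace.
	_, err = decodeKind(`{"kind":"DeploymentConfig"`)
	fmt.Println("truncated input:", err)

	// Case 3: well-formed input decodes without error.
	tm, err := decodeKind(`{"kind":"DeploymentConfig","apiVersion":"v1beta1"}`)
	fmt.Println("valid input:    ", tm.Kind, err)
}
```

Both the empty and the truncated input should produce the same error text as the log line above, which would be consistent with the deployer getting no usable DeploymentConfig at all rather than a malformed one.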

Alright, not very helpful, but the point is it's broken. What do I do now?

$ osc deploy router
router #1 deployment pending on update. Run osc deploy --latest to deploy now.

$ osc deploy router --latest
Error: #1 is already in progress ()
$ osc get deployment
NAME      STATUS    CAUSE

It's not pending, it's failed. I should be able to just run another one.

Other than deleting everything, I honestly don't know how to proceed. Could there be better directions for recovering from this?
