Skip to content

oxide instance stop -> Instance stopping forever #1126

Open

Description

"I wish I knew how to quit you"

For some context, see: #1124

Basically, when running a looping script of "instance start, stop, destroy" in a loop, after ~500 iterations I saw a hang in the request to instance stop.

From the CLI:

+ oxide instance stop --confirm --organization myorg --project myproject myinstance529                                                                                                                                                        
⡀ Waiting for instance status to be `stopped` 
^C

 $ oxide instance get --organization myorg --project myproject myinstance529
 time_run_state_updated | 14 hours ago                         
 time_modified          | 14 hours ago                         
 time_created           | 14 hours ago                         
 run_state              | stopping                             
 project_id             | 96812850-4387-4b4a-8c15-e7b405d588f5 
 ncpus                  | 2                                    
 memory                 | 1073741824                           
 hostname               | myinstance                           
 description            | description                          
 name                   | myinstance529                        
 id                     | cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb

The Propolis zone still appears to exist.

Propolis logs, within the propolis zone:

$ cat $(svcs -L propolis-server)
May 27 00:08:01.536 INFO Calling initialize_virtio_block, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO cloud-init disk created, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO request completed, response_code: 201, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO request completed, response_code: 204, uri: /instance/state, method: PUT, req_id: d1c4f129-6f8c-41b7-bf56-3640841a0ecf, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO Instance transition, state_target: Some(Run), state_prev: Initialize, state: Boot, task: instance-driver, component: vmm
May 27 00:08:01.538 INFO Instance transition, state_target: Some(Run), state_prev: Boot, state: Run, task: instance-driver, component: vmm
May 27 00:08:01.538 INFO request completed, response_code: 200, uri: /instance/state-monitor, method: GET, req_id: 9e7c3427-ca12-4f2a-bc22-91568716372d, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 1, port: 44805, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: in, sync_task: vcpu-0, component: vmm
May 27 00:08:01.642 INFO accepted connection, remote_addr: [fd00:1122:3344:101::1]:49044, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.642 INFO Instance transition, state_target: Some(Halt), state_prev: Run, state: Quiesce, task: instance-driver, component: vmm
May 27 00:08:01.642 INFO request completed, response_code: 204, uri: /instance/state, method: PUT, req_id: 2d0a2675-d427-41f0-ac56-fd09dc1e7370, remote_addr: [fd00:1122:3344:101::1]:49044, local_addr: [fd00:1122:3344:101::227]:12400

Sled Agent logs:

[2022-05-27T00:08:01.540212385Z]  INFO: SledAgent/InstanceManager/6692 on thelio: Observing new propolis state: Running (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.540561549Z]  INFO: SledAgent/InstanceManager/6692 on thelio: New state: Running, action: None (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.555240667Z]  INFO: SledAgent/InstanceManager/6692 on thelio: GET request to http://[fd00:1122:3344:101::227]:12400/instance/state-monitor (propolis_client address=[fd00:1122:3344:101::227]:12400, instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.642352035Z]  INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: accepted connection (local_addr=[fd00:1122:3344:101::1]:12345, remote_addr=[fd00:1122:3344:101::3]:64101)
[2022-05-27T00:08:01.642892056Z]  INFO: SledAgent/InstanceManager/6692 on thelio: instance_ensure cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb -> InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }
[2022-05-27T00:08:01.643274629Z]  INFO: SledAgent/InstanceManager/6692 on thelio: instance already exists
[2022-05-27T00:08:01.643642965Z]  INFO: SledAgent/InstanceManager/6692 on thelio: transition to InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }; action: Stop (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.644092145Z]  INFO: SledAgent/InstanceManager/6692 on thelio: Taking action: Stop (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.644521605Z]  INFO: SledAgent/InstanceManager/6692 on thelio: PUT request to http://[fd00:1122:3344:101::227]:12400/instance/state (propolis_client address=[fd00:1122:3344:101::227]:12400, instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.645027377Z]  INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: request completed (req_id=776154d6-a921-42f4-8f5d-b4e182f56b59, uri=/instances/cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb, method=PUT, remote_addr=[fd00:1122:3344:101::3]:64101, local_addr=[fd00:1122:3344:101::1]:12345, response_code=200)
[2022-05-27T00:21:03.111197687Z]  INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: accepted connection (local_addr=[fd00:1122:3344:101::1]:12345, remote_addr=[fd00:1122:3344:101::3]:45080)
[2022-05-27T00:21:03.111581481Z]  INFO: SledAgent/InstanceManager/6692 on thelio: instance_ensure cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb -> InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }
[2022-05-27T00:21:03.111849981Z]  INFO: SledAgent/InstanceManager/6692 on thelio: instance already exists
[2022-05-27T00:21:03.112111437Z]  INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: request completed (req_id=b9aebcf4-d90c-463b-b0d0-5ca21db6e9c3, uri=/instances/cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb, method=PUT, remote_addr=[fd00:1122:3344:101::3]:45080, local_addr=[fd00:1122:3344:101::1]:12345, response_code=200)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Sled AgentRelated to the Per-Sled Configuration and ManagementbugSomething that isn't working.virtualizationPropolis Integration & VM Management

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions