Open
Description
openedon May 27, 2022
"I wish I knew how to quit you"
For some context, see: #1124
Basically, when running a looping script of "instance start, stop, destroy" in a loop, after ~500 iterations I saw a hang in the request to instance stop
.
From the CLI:
+ oxide instance stop --confirm --organization myorg --project myproject myinstance529
⡀ Waiting for instance status to be `stopped`
^C
$ oxide instance get --organization myorg --project myproject myinstance529
time_run_state_updated | 14 hours ago
time_modified | 14 hours ago
time_created | 14 hours ago
run_state | stopping
project_id | 96812850-4387-4b4a-8c15-e7b405d588f5
ncpus | 2
memory | 1073741824
hostname | myinstance
description | description
name | myinstance529
id | cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb
The Propolis zone still appears to exist.
Propolis logs, within the propolis zone:
$ cat $(svcs -L propolis-server)
May 27 00:08:01.536 INFO Calling initialize_virtio_block, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO cloud-init disk created, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO request completed, response_code: 201, uri: /instance, method: PUT, req_id: 687b2b06-85a8-4de1-b709-8b1d74067b71, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO request completed, response_code: 204, uri: /instance/state, method: PUT, req_id: d1c4f129-6f8c-41b7-bf56-3640841a0ecf, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.537 INFO Instance transition, state_target: Some(Run), state_prev: Initialize, state: Boot, task: instance-driver, component: vmm
May 27 00:08:01.538 INFO Instance transition, state_target: Some(Run), state_prev: Boot, state: Run, task: instance-driver, component: vmm
May 27 00:08:01.538 INFO request completed, response_code: 200, uri: /instance/state-monitor, method: GET, req_id: 9e7c3427-ca12-4f2a-bc22-91568716372d, remote_addr: [fd00:1122:3344:101::1]:54316, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 1, port: 44805, op: out, sync_task: vcpu-0, component: vmm
May 27 00:08:01.636 INFO unhandled PIO, bytes: 4, port: 44800, op: in, sync_task: vcpu-0, component: vmm
May 27 00:08:01.642 INFO accepted connection, remote_addr: [fd00:1122:3344:101::1]:49044, local_addr: [fd00:1122:3344:101::227]:12400
May 27 00:08:01.642 INFO Instance transition, state_target: Some(Halt), state_prev: Run, state: Quiesce, task: instance-driver, component: vmm
May 27 00:08:01.642 INFO request completed, response_code: 204, uri: /instance/state, method: PUT, req_id: 2d0a2675-d427-41f0-ac56-fd09dc1e7370, remote_addr: [fd00:1122:3344:101::1]:49044, local_addr: [fd00:1122:3344:101::227]:12400
Sled Agent logs:
[2022-05-27T00:08:01.540212385Z] INFO: SledAgent/InstanceManager/6692 on thelio: Observing new propolis state: Running (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.540561549Z] INFO: SledAgent/InstanceManager/6692 on thelio: New state: Running, action: None (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.555240667Z] INFO: SledAgent/InstanceManager/6692 on thelio: GET request to http://[fd00:1122:3344:101::227]:12400/instance/state-monitor (propolis_client address=[fd00:1122:3344:101::227]:12400, instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.642352035Z] INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: accepted connection (local_addr=[fd00:1122:3344:101::1]:12345, remote_addr=[fd00:1122:3344:101::3]:64101)
[2022-05-27T00:08:01.642892056Z] INFO: SledAgent/InstanceManager/6692 on thelio: instance_ensure cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb -> InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }
[2022-05-27T00:08:01.643274629Z] INFO: SledAgent/InstanceManager/6692 on thelio: instance already exists
[2022-05-27T00:08:01.643642965Z] INFO: SledAgent/InstanceManager/6692 on thelio: transition to InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }; action: Stop (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.644092145Z] INFO: SledAgent/InstanceManager/6692 on thelio: Taking action: Stop (instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.644521605Z] INFO: SledAgent/InstanceManager/6692 on thelio: PUT request to http://[fd00:1122:3344:101::227]:12400/instance/state (propolis_client address=[fd00:1122:3344:101::227]:12400, instance_id=cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb)
[2022-05-27T00:08:01.645027377Z] INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: request completed (req_id=776154d6-a921-42f4-8f5d-b4e182f56b59, uri=/instances/cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb, method=PUT, remote_addr=[fd00:1122:3344:101::3]:64101, local_addr=[fd00:1122:3344:101::1]:12345, response_code=200)
[2022-05-27T00:21:03.111197687Z] INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: accepted connection (local_addr=[fd00:1122:3344:101::1]:12345, remote_addr=[fd00:1122:3344:101::3]:45080)
[2022-05-27T00:21:03.111581481Z] INFO: SledAgent/InstanceManager/6692 on thelio: instance_ensure cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb -> InstanceRuntimeStateRequested { run_state: Stopped, migration_params: None }
[2022-05-27T00:21:03.111849981Z] INFO: SledAgent/InstanceManager/6692 on thelio: instance already exists
[2022-05-27T00:21:03.112111437Z] INFO: SledAgent/dropshot (SledAgent)/6692 on thelio: request completed (req_id=b9aebcf4-d90c-463b-b0d0-5ca21db6e9c3, uri=/instances/cd7cbd04-5507-4c53-a1e0-1f1e6016e7bb, method=PUT, remote_addr=[fd00:1122:3344:101::3]:45080, local_addr=[fd00:1122:3344:101::1]:12345, response_code=200)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment