When following the remote-access preview instructions, I've noticed flakiness creating instances with the terraform workflow.
I've been following the demo instructions on folgers, and when I get to the point of creating instances via terraform, I run: terraform init && terraform apply. Most instances are created successfully, but occasionally one or two fail with a "500 internal error".
Digging into the sled agent logs, I see the following:
[2022-09-07T15:34:00.571064026Z] INFO: SledAgent/dropshot (SledAgent)/12362 on folgers: accepted connection (local_addr=[fd00:1122:3344:101::1]:12345, remote_addr=[fd00:1122:3344:101::3]:57942)
[2022-09-07T15:34:00.571380893Z] INFO: SledAgent/InstanceManager/12362 on folgers: instance_ensure e7670ef9-73ec-4846-8058-f84a263e2ef9 -> InstanceRuntimeStateRequested { run_state: Running, migration_params: None }
[2022-09-07T15:34:00.571637528Z] INFO: SledAgent/InstanceManager/12362 on folgers: new instance
[2022-09-07T15:34:00.572034446Z] INFO: SledAgent/InstanceManager/12362 on folgers: Instance::new w/initial HW: InstanceHardware { runtime: InstanceRuntimeState { run_state: Creating, sled_id: fb0f7546-4d46-40ca-9d56-cbb810684ca7, propolis_id: e685ef90-d155-4bcd-abf3-12531e6c1ef4, dst_propolis_id: None, propolis_addr: Some([fd00:1122:3344:101::e]:12400), migration_id: None, ncpus: InstanceCpuCount(4), memory: ByteCount(2147483648), hostname: "db-instance-1", ...
[2022-09-07T15:34:00.572289002Z] INFO: SledAgent/dropshot (SledAgent)/12362 on folgers: request completed (req_id=4b7b08f2-c1b6-4574-810f-f540ee5715c9, uri=/instances/e7670ef9-73ec-4846-8058-f84a263e2ef9, method=PUT, remote_addr=[fd00:1122:3344:101::3]:57942, local_addr=[fd00:1122:3344:101::1]:12345, error_message_external="Internal Server Error", response_code=500)
error_message_internal: Error managing instances: Instance error: Failure interacting with the OPTE ioctl(2) interface: netadm failed dlmgmtd: link id creation failed: 17
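One guess about the trailing "17": if it is an errno value passed back from dlmgmtd (an assumption on my part; I have not confirmed this against the dlmgmtd or OPTE source), it would decode to EEXIST, which might point at a link ID or link name collision when several instances are created at once. A minimal sketch for decoding it, where the regex and variable names are only illustrative:

```python
import errno
import os
import re

# Tail of the error_message_internal line from the sled agent log.
message = "netadm failed dlmgmtd: link id creation failed: 17"

# Assumption: the trailing integer is an errno value, not a
# dlmgmtd-specific status code.
match = re.search(r"failed: (\d+)$", message)
code = int(match.group(1))

# Map the number to its symbolic name and the OS description.
name = errno.errorcode.get(code, "unknown")
print(code, name, os.strerror(code))
```

If that reading is right, the intermittent failures could be consistent with a race between concurrent instance_ensure requests, but that is speculation until someone checks how the link IDs are allocated.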
I'm using Omicron @ 55cc15c