Open
Description
openedon May 26, 2022
Running an instance create-stop-delete loop to test #1115, I ran into what appears to be another unexpected failure mode. Nexus assigns an address for the Propolis server to listen on, and the sled-agent dutifully starts up the Propolis zone and contained SMF service listening on that. The sled agent ensures that the Propolis SMF instance comes online, and then verifies that the HTTP server is online by just pinging it. That ping failed, with ECONNREFUSED
, despite the Propolis server apparently listening on that address.
Here's the Propolis SMF log:
bnaecker@feldspar : ~/omicron $ pfexec zlogin $(zoneadm list | tail -1) 'tail +0 -f $(svcs -L propolis-server)'
[ May 26 10:55:27 Enabled. ]
[ May 26 10:55:27 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /var/svc/manifest/site/propolis-server/config.toml [fd00:1122:3344:101::24]:12400 &"). ]
[ May 26 10:55:27 Method "start" exited with status 0. ]
May 26 17:55:28.937 INFO Starting server...
May 26 17:55:28.937 INFO listening, local_addr: [fd00:1122:3344:101::24]:12400
bnaecker@feldspar : ~/omicron $
So the HTTP server seems to be listening on [fd00:1122:3344:101::24]:12400
. However, the sled agent log shows:
{"msg":"Installing Omicron zone: oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:21.878590652Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"Zone booting","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:24.214507089Z","hostname":"feldspar","pid":930,"zone":"oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:101::24, prefix: 64 }))","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:26.238671328Z","hostname":"feldspar","pid":930,"zone":"oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"Created address fd00:1122:3344:101::24/64 for zone: oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:26.784988479Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"Started propolis in zone: oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:27.207603652Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"GET request to http://[fd00:1122:3344:101::24]:12400/instance","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:27.954720099Z","hostname":"feldspar","pid":930,"propolis_client address":"[fd00:1122:3344:101::24]:12400","component":"propolis-client","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"failed to await http server (Request failed: error sending request for url (http://[fd00:1122:3344:101::24]:12400/instance): error trying to connect: tcp connect error: Connection refused (os error 146)), will retry in 325.090315ms","v":0,"name":"SledAgent","level":40,"time":"2022-05-26T17:55:27.955544069Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager","error":"Reqwest(reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Ipv6(fd00:1122:3344:101::24)), port: Some(12400), path: \"/instance\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 146, kind: ConnectionRefused, message: \"Connection refused\" })) })"}
{"msg":"GET request to http://[fd00:1122:3344:101::24]:12400/instance","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:28.281388066Z","hostname":"feldspar","pid":930,"propolis_client address":"[fd00:1122:3344:101::24]:12400","component":"propolis-client","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"failed to await http server (Request failed: error sending request for url (http://[fd00:1122:3344:101::24]:12400/instance): error trying to connect: tcp connect error: Connection refused (os error 146)), will retry in 358.344067ms","v":0,"name":"SledAgent","level":40,"time":"2022-05-26T17:55:28.281923763Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager","error":"Reqwest(reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Ipv6(fd00:1122:3344:101::24)), port: Some(12400), path: \"/instance\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 146, kind: ConnectionRefused, message: \"Connection refused\" })) })"}
{"msg":"GET request to http://[fd00:1122:3344:101::24]:12400/instance","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:55:28.641029504Z","hostname":"feldspar","pid":930,"propolis_client address":"[fd00:1122:3344:101::24]:12400","component":"propolis-client","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"failed to await http server (Request failed: error sending request for url (http://[fd00:1122:3344:101::24]:12400/instance): error trying to connect: tcp connect error: Connection refused (os error 146)), will retry in 1.074760533s","v":0,"name":"SledAgent","level":40,"time":"2022-05-26T17:55:28.641599392Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager","error":"Reqwest(reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Ipv6(fd00:1122:3344:101::24)), port: Some(12400), path: \"/instance\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 146, kind: ConnectionRefused, message: \"Connection refused\" })) })"}
...
{"msg":"failed to await http server (Request failed: error sending request for url (http://[fd00:1122:3344:101::24]:12400/instance): error trying to connect: tcp connect error: Connection refused (os error 146)), will retry in 45.772109455s","v":0,"name":"SledAgent","level":40,"time":"2022-05-26T17:55:57.622658162Z","hostname":"feldspar","pid":930,"instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager","error":"Reqwest(reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Ipv6(fd00:1122:3344:101::24)), port: Some(12400), path: \"/instance\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 146, kind: ConnectionRefused, message: \"Connection refused\" })) })"}
{"msg":"halt_and_remove_logged: Previous zone state: Running","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:56:23.786718536Z","hostname":"feldspar","pid":930,"zone":"oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
{"msg":"Stopped and uninstalled zone","v":0,"name":"SledAgent","level":30,"time":"2022-05-26T17:56:23.789002326Z","hostname":"feldspar","pid":930,"zone":"oxz_propolis-server_61f5d4b2-bfff-4e70-a691-8644e52abcdd","instance_id":"d3201f81-d03b-42c3-af78-9632f50d60be","component":"InstanceManager"}
That is the sled agent:
- Starting the zone
- Creating the IP address
fd00:1122:3344:101::24
for the Propolis HTTP server - Verifying that the Propolis SMF instance is now online
- Making a request to
http://[fd00:1122:3344:101::24]:12400/instance
in a retry-loop, until it fails.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment