
sled-agent not serving vmm requests #6911

@askfongjojo

The issue was seen on sled 13 of rack3. There was a problem with mupdate around the same time, and we're not sure whether it's related. AFAICT, no other sled has this issue. We noted the problem in https://github.com/oxidecomputer/colo/issues/88#issuecomment-2423940805, which is copied below:

The sled-agent logs show that calls to the /vmms/{id}/state endpoint on sled 13 are timing out:

15:00:47.203Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:887
    latency_us = 60005110
    local_addr = [fd00:1122:3344:104::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:10e::3]:36498
    req_id = ccf79204-80fb-4814-a778-59a48ab273a4
    uri = /vmms/90676c5f-a315-4930-a90f-43790839069b/state

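I don't know for certain whether this is related, but it seems suspicious. The handler was cancelled after roughly 60 seconds (latency_us = 60005110) when the client disconnected, which looks like the client timing out while waiting for a response.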

The recent sled-agent logs and core files have been uploaded to /staff/rack3/colo-88.
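To check whether other /vmms/{id}/state requests in the uploaded logs were cancelled the same way, one could scan them for this warning. The following is a minimal sketch in Python. It assumes the logs are in the pretty-printed form shown in the excerpt above: a header line followed by indented `key = value` lines. Raw bunyan JSON logs would need a different parser. The function and type names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Header line of a cancelled dropshot request (assumed format, as in the excerpt).
HEADER_RE = re.compile(r"^(\S+)\s+WARN\s+SledAgent\b.*request handling cancelled")
# Indented "key = value" continuation lines that follow the header.
FIELD_RE = re.compile(r"^\s+(\w+)\s*=\s*(.*?)\s*$")


@dataclass
class CancelledRequest:
    timestamp: str
    method: str
    uri: str
    latency_s: float


def _finish(timestamp: Optional[str], fields: dict, out: List[CancelledRequest]) -> None:
    """Append the record being accumulated, if it carried a latency field."""
    if timestamp is not None and "latency_us" in fields:
        out.append(CancelledRequest(
            timestamp=timestamp,
            method=fields.get("method", ""),
            uri=fields.get("uri", ""),
            latency_s=int(fields["latency_us"]) / 1_000_000,
        ))


def find_cancelled_requests(lines: Iterable[str]) -> List[CancelledRequest]:
    """Collect every 'request handling cancelled' record in the given log lines."""
    out: List[CancelledRequest] = []
    timestamp: Optional[str] = None
    fields: dict = {}
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            # A new record starts; close out the previous one first.
            _finish(timestamp, fields, out)
            timestamp, fields = header.group(1), {}
            continue
        field = FIELD_RE.match(line)
        if timestamp is not None and field:
            fields[field.group(1)] = field.group(2)
        else:
            # Any other line (blank or unrelated) ends the current record.
            _finish(timestamp, fields, out)
            timestamp, fields = None, {}
    _finish(timestamp, fields, out)
    return out


if __name__ == "__main__":
    sample = """15:00:47.203Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    latency_us = 60005110
    method = GET
    uri = /vmms/90676c5f-a315-4930-a90f-43790839069b/state
"""
    for req in find_cancelled_requests(sample.splitlines()):
        # prints "15:00:47.203Z GET /vmms/90676c5f-a315-4930-a90f-43790839069b/state 60.0s"
        print(req.timestamp, req.method, req.uri, f"{req.latency_s:.1f}s")
```

Grouping the results by uri would show whether the timeouts are limited to the state endpoint or affect every request the sled agent handles.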
