Description
Here is a CI failure: https://github.com/oxidecomputer/omicron/runs/7887995648

This failure occurred shortly after we added sock to Buildomat to work alongside buskin for lab jobs. This job ran on sock, and ddmd picked up a prefix during startup:
```
SledAgent (RSS): Received prefixes from ddmd
    prefixes = {"fe80::8:20ff:fe9e:7b26": [Ipv6Prefix { addr: fd00:1122:3344:1::, mask: 64 }, Ipv6Prefix { addr: fdb0:18c0:4d0d:9fb2::, mask: 64 }, Ipv6Prefix { addr: fd00:1122:3344:101::, mask: 64 }]}
```
This prefix was advertised by ddmd on buskin, during https://github.com/oxidecomputer/omicron/runs/7887685952:
```
SledAgent: Sending prefix to ddmd for advertisement
    prefix = Ipv6Prefix { addr: fdb0:18c0:4d0d:9fb2::, mask: 64 }
```
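To make the conflict concrete: the prefix set sock learned mixes subnets of the rack's own ULA space (`fd00:1122:3344::/48` in these logs) with a `/64` advertised by a different installation. A minimal sketch of how one might flag such foreign prefixes, assuming the expected containing prefix is known up front (the `Ipv6Prefix` struct here is a stand-in modeled on the log output, not the actual omicron type):

```rust
use std::net::Ipv6Addr;

/// Stand-in for the Ipv6Prefix type seen in the logs (an assumption,
/// not the real omicron definition).
#[derive(Debug, PartialEq)]
struct Ipv6Prefix {
    addr: Ipv6Addr,
    mask: u8,
}

/// Returns true if `p` is a subnet of (or equal to) `outer`.
fn contained_in(p: &Ipv6Prefix, outer: &Ipv6Prefix) -> bool {
    if p.mask < outer.mask {
        return false;
    }
    if outer.mask == 0 {
        // /0 contains everything; also avoids a 128-bit shift overflow below.
        return true;
    }
    let pa = u128::from_be_bytes(p.addr.octets());
    let oa = u128::from_be_bytes(outer.addr.octets());
    let shift = 128 - u32::from(outer.mask);
    // Compare only the top `outer.mask` bits.
    (pa >> shift) == (oa >> shift)
}

fn main() {
    // Expected rack ULA space, inferred from the fd00:1122:3344:* addresses
    // in the logs above.
    let expected = Ipv6Prefix {
        addr: "fd00:1122:3344::".parse().unwrap(),
        mask: 48,
    };
    // The three prefixes sock received from ddmd.
    let learned = [
        Ipv6Prefix { addr: "fd00:1122:3344:1::".parse().unwrap(), mask: 64 },
        Ipv6Prefix { addr: "fdb0:18c0:4d0d:9fb2::".parse().unwrap(), mask: 64 },
        Ipv6Prefix { addr: "fd00:1122:3344:101::".parse().unwrap(), mask: 64 },
    ];
    for p in &learned {
        if !contained_in(p, &expected) {
            println!("foreign prefix learned: {:?}", p);
        }
    }
}
```

With these inputs, only `fdb0:18c0:4d0d:9fb2::/64` (buskin's advertisement) falls outside the expected space. Whether sled agent should actually reject such prefixes is the open question below.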
The result, I think, is that the job on sock created a plan for two sled agents, and then failed to start its sled agent because one was already running on buskin.
There is also a timestamp discrepancy in the logs: either sock's clock is one hour in the future, or... well, I'm not sure what else it could be. Some sort of lab network state being cached somewhere?
@jclulow is changing the lab network so that sock and buskin are on two separate VLANs, which seems like the correct configuration for CI, so not labeling this as a test flake. Instead I'd like to know if we want to prevent this kind of thing from happening elsewhere, e.g. someone testing two separate control plane installations at once on their home network without intending for them to talk to each other, or if that's just a bad idea.