Replace HashMap<SwitchLocation, Client> with on-demand lookups, and eventually DNS #5092

@internet-diglett

The best thing about getting smarter every day is that you realize how bad your decisions were yesterday. 😂

Context

With the multi-switch work, we needed to give Nexus a way to know which switch zone manages which switch slot (the top switch or the bottom switch). This information was not available in DNS at the time. However, mgs can provide it, and mgs is co-resident with the other switch zone services. The approach we settled on was to look up the dendrite instances via DNS, then ask mgs which physical switch each one manages. This seems to have worked out well.

The Problem

In my infinite wisdom, I stashed the generated clients with their location data in a HashMap that Nexus holds.

/// Mapping of SwitchLocations to their respective Dendrite Clients
dpd_clients: HashMap<SwitchLocation, Arc<dpd_client::Client>>,
/// Map switch location to maghemite admin clients.
mg_clients: HashMap<SwitchLocation, Arc<mg_admin_client::Client>>,

The problem is that if (when) a customer swaps a scrimlet, this data is not updated: the cached client still points to the address of the old scrimlet. Since sleds keep their addresses with them, swapping a scrimlet and a non-scrimlet with each other means the network configuration requests will now go to a non-scrimlet. Or even wilder, if the two scrimlets are ever swapped with each other, the configuration for each switch will go to the wrong one. We could probably update the HashMap if we jumped through enough hoops, but I think we all want it to go away at this point.

Proposed Solutions

In the near term this can be mitigated in Nexus by:

  • No longer using the clients stored in HashMap<SwitchLocation, Client>, and instead querying mgs each time before sending any configuration. Since the majority of switch configuration is being moved to RPWs, this shouldn't add much overhead.
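
To make the near-term idea concrete, here is a rough sketch of what an on-demand lookup could look like. It is not the actual Nexus code: the `SwitchZoneResolver` and `SlotOracle` traits, the `resolve_switch_zones` function, and the `FakeRack` type are all hypothetical stand-ins for the DNS lookup and the mgs query, and `SwitchLocation` is redefined locally so the example is self-contained. The point is only that the location-to-address mapping is rebuilt on every call instead of being cached.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Local stand-in for the real SwitchLocation type (assumed two slots).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SwitchLocation {
    Switch0,
    Switch1,
}

/// Hypothetical: "look up the switch zone addresses via DNS".
pub trait SwitchZoneResolver {
    fn switch_zone_addrs(&self) -> Vec<Ipv6Addr>;
}

/// Hypothetical: "ask mgs at this address which switch slot it manages".
pub trait SlotOracle {
    fn switch_slot(&self, zone_addr: Ipv6Addr) -> Option<SwitchLocation>;
}

/// Rebuild the location -> address mapping from scratch on every call.
/// Nothing is cached, so a swapped scrimlet is picked up on the next lookup.
pub fn resolve_switch_zones(
    dns: &dyn SwitchZoneResolver,
    mgs: &dyn SlotOracle,
) -> HashMap<SwitchLocation, Ipv6Addr> {
    dns.switch_zone_addrs()
        .into_iter()
        // Addresses whose mgs cannot report a slot are skipped.
        .filter_map(|addr| mgs.switch_slot(addr).map(|loc| (loc, addr)))
        .collect()
}

/// In-memory fake used only to exercise the sketch: each switch zone
/// address maps to the slot its mgs would currently report.
pub struct FakeRack {
    pub slots: HashMap<Ipv6Addr, SwitchLocation>,
}

impl SwitchZoneResolver for FakeRack {
    fn switch_zone_addrs(&self) -> Vec<Ipv6Addr> {
        self.slots.keys().copied().collect()
    }
}

impl SlotOracle for FakeRack {
    fn switch_slot(&self, zone_addr: Ipv6Addr) -> Option<SwitchLocation> {
        self.slots.get(&zone_addr).copied()
    }
}
```

A caller would invoke `resolve_switch_zones` right before sending a configuration and build the client from the returned address. If the two scrimlets are swapped between calls, the second call returns the swapped mapping, which is exactly the case the stored HashMap gets wrong.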

In the long term this might be more elegantly solved by:

  • Registering switch zone services in DNS with information that allows us to determine what rack and switch slot they are managing.
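
As an illustration of the long-term idea, the sketch below encodes a rack identifier and switch slot into a DNS-style name and parses it back. The naming scheme (`switch0.rack-<id>`), the `SwitchZoneName` type, and the use of a plain string for the rack ID are all assumptions made up for this example; the real DNS layout would need to be designed separately.

```rust
/// Local stand-in for the real SwitchLocation type (assumed two slots).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SwitchLocation {
    Switch0,
    Switch1,
}

/// Hypothetical: the information a DNS name would need to carry so that
/// a lookup can tell which rack and switch slot a service manages.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SwitchZoneName {
    pub rack_id: String,
    pub slot: SwitchLocation,
}

impl SwitchZoneName {
    /// Encode as "<slot>.rack-<rack_id>" (made-up scheme for illustration).
    pub fn to_dns_label(&self) -> String {
        let slot = match self.slot {
            SwitchLocation::Switch0 => "switch0",
            SwitchLocation::Switch1 => "switch1",
        };
        format!("{}.rack-{}", slot, self.rack_id)
    }

    /// Parse a name produced by `to_dns_label`; returns None if the name
    /// does not follow the scheme or names an unknown slot.
    pub fn parse(name: &str) -> Option<Self> {
        let (slot, rack_id) = name.split_once(".rack-")?;
        let slot = match slot {
            "switch0" => SwitchLocation::Switch0,
            "switch1" => SwitchLocation::Switch1,
            _ => return None,
        };
        if rack_id.is_empty() {
            return None;
        }
        Some(Self { rack_id: rack_id.to_string(), slot })
    }
}
```

With something along these lines, Nexus could resolve the service for a given rack and slot directly, without a separate round trip to mgs.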
