Skip to content

Nexus::select_runtime_change_action's final match needs tuning #6390

Open

Description

Specifically this bit:

};
// The instance has an active sled. Allow the sled agent to decide how
// to handle the request unless the instance is being recovered or the
// underlying VMM has been destroyed.
//
// TODO(#2825): Failed instances should be allowed to stop. See above.
let allowed = match requested {
InstanceStateChangeRequest::Run
| InstanceStateChangeRequest::Reboot
| InstanceStateChangeRequest::Stop => match effective_state {
InstanceState::Creating
| InstanceState::Starting
| InstanceState::Running
| InstanceState::Stopping
| InstanceState::Stopped
| InstanceState::Rebooting
| InstanceState::Migrating => true,
InstanceState::Repairing | InstanceState::Failed => false,
InstanceState::Destroyed => false,
},
InstanceStateChangeRequest::Migrate(_) => match effective_state {
InstanceState::Running
| InstanceState::Rebooting
| InstanceState::Migrating => true,
InstanceState::Creating
| InstanceState::Starting
| InstanceState::Stopping
| InstanceState::Stopped
| InstanceState::Repairing
| InstanceState::Failed
| InstanceState::Destroyed => false,
},

This match is a little hard to parse. At this point, Nexus already believes the instance of interest has a running Propolis and is just deciding whether to send a state change request there for disposition. I think the general idea should be to say something like

  • If this is a request to start, reboot, or stop, and the Propolis is in a non-terminal state (i.e. it's not Failed or Destroyed), and it doesn't look like we're migrating, forward the request to Propolis. (We don't currently transmit the state change request queue from source to target during a migration, so allowing a reboot/stop request on a migrating instance might result in a request being queued to the source that won't be picked up by the target.)
  • If this is a request to start Propolis via migration in, and the target Propolis hasn't reported that it's started migrating yet, permit the request; otherwise the VMM has already started and there's nothing left to do

For start/stop/reboot, this is pretty close to what we have today, but the handling for requests to migrate could stand to be tightened up a bit.

It's also worth noting that this match is in a path where we already know that we've got an active Propolis. We should consider whether it'd be clearer to look at the VMM's state directly instead of looking at it as interpreted through InstanceAndActiveVmm::determine_effective_state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    nexusRelated to nexusvirtualizationPropolis Integration & VM Management

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions