New option for the specification of hostfile for sos run and sos execute #1279

@BoPeng

#1278

It is clear that a hostfile is needed for multi-node execution. Although PBS systems can generate a hostfile automatically, and commands such as sos execute and sos run can pick it up automatically, we still need an option that lets users specify a hostfile manually for multi-node execution of workflows and tasks.

This option should work like this:

  1. Without it, everything runs locally.
  2. With it, the value should be the name of a hostfile, similar to the --hostfile option of SCOOP, with a similar or identical format. Workers will be created on the listed hosts.
  3. On a cluster system with the appropriate environment variables, the hostfile will be picked up automatically, similar to what SCOOP does.
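The three cases above can be sketched in Python as follows. This is only an illustration, not SoS code: the function names `read_hostfile` and `resolve_hosts` are hypothetical, the hostfile format (one host per line, optionally followed by a worker count) is an assumption modeled loosely on SCOOP, and `PBS_NODEFILE` is assumed to be the environment variable that points to the PBS-generated node list.

```python
import os
from collections import Counter


def read_hostfile(path):
    """Return {hostname: worker_count} from a hostfile.

    Assumed format: one host per line, optionally followed by an
    integer worker count. A host repeated on several lines (as in a
    PBS node file) accumulates one worker per occurrence.
    """
    counts = Counter()
    with open(path) as fh:
        for raw in fh:
            fields = raw.split()
            if not fields:
                continue  # skip blank lines
            host = fields[0]
            has_count = len(fields) > 1 and fields[1].isdigit()
            counts[host] += int(fields[1]) if has_count else 1
    return dict(counts)


def resolve_hosts(hostfile=None, environ=None):
    """Pick the hosts to run on (hypothetical helper).

    1. An explicit hostfile wins.
    2. Otherwise fall back to the file named by PBS_NODEFILE, if set.
    3. Otherwise run everything locally.
    """
    environ = os.environ if environ is None else environ
    path = hostfile or environ.get("PBS_NODEFILE")
    if not path:
        return {"localhost": 1}
    return read_hostfile(path)
```

Passing `environ` explicitly keeps the fallback logic testable without touching the real environment.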

The problem is that sos run does not support -- options, so we will have to reuse an existing option or find another one.

Once this option is specified, users can use

sos run -j hostfile

to run a workflow on multiple hosts.
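If -j is the option that gets reused, its value would have to be interpreted in two ways. A minimal sketch, assuming -j otherwise takes an integer number of local workers; the function name `parse_jobs_value` and the returned dictionary layout are hypothetical and only illustrate the idea:

```python
import os


def parse_jobs_value(value, default_workers=1):
    """Interpret a -j value as a worker count or a hostfile path.

    Hypothetical helper: an all-digit value is treated as a local
    worker count, and anything else must name an existing file.
    """
    if value is None:
        return {"mode": "local", "workers": default_workers}
    text = str(value).strip()
    if text.isdigit():
        workers = int(text)
        if workers < 1:
            raise ValueError("-j requires at least one worker")
        return {"mode": "local", "workers": workers}
    if os.path.isfile(text):
        return {"mode": "hostfile", "path": text}
    raise ValueError(f"-j value is neither an integer nor a file: {text!r}")
```

One consequence of this design is that a hostfile whose name consists only of digits could not be passed this way, which may be a reason to prefer a separate option.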

Use

%PBS ...
sos run workflow

to run the entire workflow on a cluster system.

The same mechanism will be used for the execution of tasks, something like

%PBS
sos execute task
