Description:
RP currently exposes a "services" abstraction at the pilot level. This services field takes as input a "list of commands" to execute on a dedicated node from the pilot job allocation. Currently, this new feature only supports the execution of a sequence of "non-MPI" commands/programs.
Example use-case:
Consider the situation where we would like to run a performance monitoring service as part of the pilot job. This performance monitoring service would (at the very least) need to support a distributed database to hold the collected performance data. The database needs to be distributed so that it does not become the bottleneck in the overall execution of the RP pilot job. We envision several "clients" connecting to this service to store their pieces of performance data. These "clients" could be user-level RP tasks or other daemons that are spawned on the compute nodes to collect node-level performance data. The intention is to use the collected performance data to perform dynamic, adaptive scheduling of future tasks (based on historical observations). Thus, I would like to request that "services" be exposed as a "special RP task" with the following semantics:
At its core, the distributed (monitoring) services are themselves treated as RP tasks, with the exception that these services are considered first-class citizens of the pilot.
There can be more than one "service" node. It is left to the user's discretion how many nodes from the pilot job to allocate to RP pilot services. These nodes need to be removed from the set of nodes available for running "user-level" RP tasks for the duration of the pilot job.
The user can set up custom "pre-exec", "input/output" staging, and "post-exec" commands for each service task that is spawned.
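To make the requested semantics concrete, here is a rough sketch in plain Python. It does not use the RP API: `ServiceDescription`, its fields, and `reserve_service_nodes` are hypothetical names invented for illustration only. It shows a service described like a task (with pre-exec, staging, and post-exec hooks) and the requested node counts being carved out of the pilot allocation before user-level tasks are scheduled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ServiceDescription:
    """Hypothetical description of one service task (not an RP class)."""
    name: str
    executable: str
    arguments: List[str] = field(default_factory=list)
    nodes: int = 1                                           # nodes reserved for this service
    pre_exec: List[str] = field(default_factory=list)        # commands run before launch
    post_exec: List[str] = field(default_factory=list)       # commands run after exit
    input_staging: List[str] = field(default_factory=list)   # files staged in
    output_staging: List[str] = field(default_factory=list)  # files staged out


def reserve_service_nodes(
    pilot_nodes: List[str],
    services: List[ServiceDescription],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign nodes to each service; return (placement, nodes left for user tasks)."""
    needed = sum(s.nodes for s in services)
    if needed > len(pilot_nodes):
        raise ValueError(
            f"services request {needed} nodes, pilot has {len(pilot_nodes)}"
        )

    placement: Dict[str, List[str]] = {}
    cursor = 0
    for service in services:
        # Each service gets a contiguous slice of the pilot's node list.
        placement[service.name] = pilot_nodes[cursor:cursor + service.nodes]
        cursor += service.nodes

    # Whatever was not reserved stays available to user-level tasks
    # for the lifetime of the pilot.
    return placement, pilot_nodes[cursor:]
```

With a four-node pilot, a two-node database service, and a one-node collector, this would leave a single node for user-level tasks. How RP would actually launch the services, and whether reserved nodes should be contiguous, is left open here.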