[FEATURE-REQUEST] Expose RP "services" as a special (multi-node) task #2543

srini009 · 2022-03-09T00:34:13Z

Description:
RP currently exposes a "services" abstraction at the pilot level. This services field takes as input a "list of commands" to execute on a dedicated node from the pilot job allocation. Currently, this new feature supports only the execution of a sequence of "non-MPI" commands/programs.

Example use-case:
Consider the situation where we would like to run a performance monitoring service as part of the pilot job. This performance monitoring service would (at the very least) need to support a distributed database to hold the collected performance data. The database needs to be distributed so that it does not become the bottleneck in the overall execution of the RP pilot job. We envision several "clients" connecting to this service to store their pieces of performance data. These "clients" could be user-level RP tasks or other daemons that are spawned on the compute nodes to collect node-level performance data. The intention is to use the collected performance data to perform dynamic, adaptive scheduling of future tasks (based on historical observations). Thus, I would like to request that "services" be exposed as a "special RP task" with the following semantics:

At its core, the distributed (monitoring) services are themselves treated as RP tasks, with the exception that these services are considered first-class citizens of the pilot.
There can be more than one "service" node. It is left to the user's discretion how many nodes from the pilot job they want to allocate to RP pilot services. These nodes need to be removed from the set of nodes available for running "user-level" RP tasks for the duration of the pilot job.
The user can set up custom "pre-exec", "input/output" staging, and "post-exec" commands for each of the service tasks that are spawned.
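
A minimal sketch of the semantics requested above, written in plain Python rather than against the RP API. Every name here (`ServiceTask`, `reserve_service_nodes`, the node names, the `module load db` command) is hypothetical; the sketch only illustrates how service nodes might be carved out of the pilot allocation before user-level tasks are scheduled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ServiceTask:
    """Hypothetical description of a service task (not an RP class)."""
    name: str
    executable: str
    nodes: int = 1  # number of dedicated nodes requested by this service
    pre_exec: List[str] = field(default_factory=list)
    post_exec: List[str] = field(default_factory=list)
    input_staging: List[str] = field(default_factory=list)
    output_staging: List[str] = field(default_factory=list)


def reserve_service_nodes(
    pilot_nodes: List[str], services: List[ServiceTask]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split the pilot's nodes into per-service reservations and the remainder."""
    available = list(pilot_nodes)
    reserved: Dict[str, List[str]] = {}
    for svc in services:
        if svc.nodes > len(available):
            raise ValueError(f"not enough nodes for service {svc.name!r}")
        # Reserved nodes are removed from the pool for the pilot's lifetime.
        reserved[svc.name] = available[:svc.nodes]
        available = available[svc.nodes:]
    return reserved, available


services = [
    ServiceTask(name="monitor_db", executable="db_server", nodes=2,
                pre_exec=["module load db"]),  # placeholder command
]
reserved, user_nodes = reserve_service_nodes(
    ["node0", "node1", "node2", "node3"], services)
print(reserved)    # {'monitor_db': ['node0', 'node1']}
print(user_nodes)  # ['node2', 'node3']
```

Only `user_nodes` would then be handed to the scheduler for user-level tasks.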

andre-merzky · 2022-03-14T22:24:21Z

two types of services:
(1): 1 process per node (tau, system monitor)
(2): using a separate node (tau, redis)

Note that the first requires changes to the RP task description (see also #2293)

pilot_description.services should become a list of task descriptions

kartikmodi · 2022-12-19T22:40:31Z

Scope of 1st Phase by 26 Dec -

Creation of services from task description
Callback handling from service task when its state changes
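
As a rough illustration of the second item, callback handling on a state change could look like the sketch below. The `ServiceHandle` class and the state names are assumptions made for this example and are not taken from the RP code base.

```python
from typing import Callable, List, Tuple


class ServiceHandle:
    """Hypothetical handle for a service task (not an RP class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "NEW"  # placeholder state name
        self._callbacks: List[Callable[["ServiceHandle", str], None]] = []

    def register_callback(self, cb: Callable[["ServiceHandle", str], None]) -> None:
        self._callbacks.append(cb)

    def advance(self, new_state: str) -> None:
        """Update the state and notify callbacks only on an actual change."""
        if new_state == self.state:
            return
        self.state = new_state
        for cb in self._callbacks:
            cb(self, new_state)


seen: List[Tuple[str, str]] = []
svc = ServiceHandle("monitor_db")
svc.register_callback(lambda s, state: seen.append((s.name, state)))
svc.advance("RUNNING")
svc.advance("RUNNING")  # unchanged state, so no callback fires
svc.advance("DONE")
print(seen)  # [('monitor_db', 'RUNNING'), ('monitor_db', 'DONE')]
```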

andre-merzky · 2023-01-13T21:44:22Z

2nd Phase:

add scheduling capabilities to allow services to run on all nodes. This implies supporting a ranks_per_node attribute for the task description.
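
One possible reading of a `ranks_per_node` attribute, sketched in plain Python: the service gets a fixed number of ranks on every node of the pilot. The function name `place_ranks` and the node names are hypothetical, and how RP actually interprets the attribute is not specified in this thread.

```python
from typing import List, Tuple


def place_ranks(nodes: List[str], ranks_per_node: int) -> List[Tuple[str, int]]:
    """Return (node, rank) pairs with `ranks_per_node` ranks on every node."""
    if ranks_per_node < 1:
        raise ValueError("ranks_per_node must be at least 1")
    placement: List[Tuple[str, int]] = []
    rank = 0
    for node in nodes:
        for _ in range(ranks_per_node):
            placement.append((node, rank))
            rank += 1
    return placement


# One rank per node, e.g. a node-level monitor running on all nodes.
placement = place_ranks(["node0", "node1", "node2"], ranks_per_node=1)
print(placement)  # [('node0', 0), ('node1', 1), ('node2', 2)]
```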

mtitov · 2023-04-10T21:31:34Z

This ticket will be closed in favor of #2899

srini009 added layer:rp type:feature labels Mar 9, 2022

andre-merzky assigned andre-merzky and mtitov Mar 14, 2022

andre-merzky assigned kartikmodi and unassigned mtitov and andre-merzky Sep 7, 2022

andre-merzky mentioned this issue Sep 20, 2022

Implement ranks_per_node #2710

Closed

andre-merzky added this to the Service Tasks milestone Sep 20, 2022

radical-cybertools deleted a comment from kartikmodi Jan 13, 2023

mturilli assigned andre-merzky and unassigned kartikmodi Mar 13, 2023

mtitov mentioned this issue Apr 10, 2023

Expand service tasks with capability to run on every node #2899

Closed

mtitov closed this as completed Apr 10, 2023
