Deferred allocation #987

JaeseungYeom · 2022-11-04T01:59:11Z

Problem: support a scheduling request for an allocation to occur at a specific time in the future.

Currently, a reservation of resources occurs as early as possible. However, workflows that benefit from running tasks across heterogeneous platforms need to synchronize multiple allocations across different child instances, for example so that tasks 1-10 run on corona while tasks 11-20 "simultaneously" run on another cluster managed by Flux.
To support such use cases, two things are needed: a deferred allocation capability, and a means to query the allocation delay.
A parent instance can query its remote child instances to find the earliest time by which all the children can allocate the requested resources. Then, it should be possible to allocate synchronously across instances.

Pushing the reservation time back should also take backfilling into account.
To be clear, this is not the same as trying to allocate at the earliest opportunity after a specific point in time.
I am not entirely sure whether the existing issue #963 is the latter case or the same as this.
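
To illustrate the distinction between the two semantics, here is a minimal sketch on a single resource with a list of busy intervals. It is not Fluxion code: `is_free`, `allocate_at`, and `allocate_earliest_after` are hypothetical names, and the planner is reduced to a plain list of `[start, end)` tuples.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # [start, end) during which the resource is busy


def is_free(busy: List[Interval], start: int, duration: int) -> bool:
    """True if [start, start + duration) overlaps no busy interval."""
    end = start + duration
    return all(end <= b_start or start >= b_end for b_start, b_end in busy)


def allocate_at(busy: List[Interval], t: int, duration: int) -> Optional[int]:
    """Deferred allocation as requested here: start exactly at t, or fail."""
    return t if is_free(busy, t, duration) else None


def allocate_earliest_after(busy: List[Interval], t: int, duration: int) -> int:
    """The other semantics: the earliest feasible start that is >= t."""
    # The earliest feasible start is either t itself or the end of a busy interval.
    candidates = [t] + sorted(b_end for _, b_end in busy if b_end > t)
    for start in candidates:
        if is_free(busy, start, duration):
            return start
    raise AssertionError("unreachable: the last busy end is always free")


busy = [(100, 200)]
exact = allocate_at(busy, 150, 60)                 # conflicts with the busy interval
sliding = allocate_earliest_after(busy, 150, 60)   # slides to the end of it
```

With the busy interval above, the exact request at 150 fails (`None`) while the earliest-after request slides to 200. Synchronizing across instances needs the first behavior, since a start that silently slides on one child would break the common start time.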

milroy · 2023-02-23T02:42:18Z

A parent instance can query its remote child instances to find out when is the earliest by which all the children can allocate requested resources. Then, it should be possible to allocate synchronously across instances.

To make sure I understand the basics (without getting into too much complexity yet) of the deferred allocation capability, this is a three-part process:

1. Submit the jobspecs to all child instances with a new match_reserve (jobspec) request that reserves the requested resources on each child instance at the earliest possible time and returns those times.
2. Find the latest time returned (T) and, for every child instance that returned an earlier time, issue a new match_reserve_at (jobspec, T) that moves the reservation back to time T.
3. Handle the case where one or more children can't satisfy match_reserve_at (jobspec, T).

Is that basically correct?
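
The three steps can be sketched as follows. This is only an illustration under stated assumptions: `match_reserve` and `match_reserve_at` are the proposed requests and do not exist yet, `FakeChild` is a hypothetical stand-in for a child instance, and failure handling is reduced to returning `None`.

```python
from typing import Dict, Iterable, Optional


class FakeChild:
    """Toy child instance (hypothetical): free from `earliest` on, except at `blocked` times."""

    def __init__(self, earliest: int, blocked: Iterable[int] = ()):
        self.earliest = earliest
        self.blocked = set(blocked)

    def match_reserve(self, jobspec: str) -> int:
        # Proposed request: reserve as early as possible and report that time.
        return self.earliest

    def match_reserve_at(self, jobspec: str, t: int) -> bool:
        # Proposed request: move the reservation to exactly t, or report failure.
        return t >= self.earliest and t not in self.blocked


def synchronize(children: Dict[str, FakeChild], jobspec: str) -> Optional[int]:
    """Return a common start time T, or None if some child cannot honor it."""
    # Step 1: reserve on every child at its earliest possible time.
    earliest = {name: child.match_reserve(jobspec) for name, child in children.items()}
    # Step 2: the latest of those times is the only candidate all children can share.
    t = max(earliest.values())
    for name, start in earliest.items():
        if start < t and not children[name].match_reserve_at(jobspec, t):
            # Step 3: a child cannot satisfy T; a real implementation would
            # have to cancel the other reservations or retry with a later time.
            return None
    return t


ok = synchronize({"corona": FakeChild(100), "other": FakeChild(250)}, "jobspec.yaml")
failed = synchronize({"corona": FakeChild(100, blocked=[250]), "other": FakeChild(250)}, "jobspec.yaml")
```

In the first call both children can start at 250, so that is the returned time; in the second, the toy `corona` child is blocked at 250 and the function returns `None`.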

milroy · 2023-02-23T02:53:42Z

I've confirmed that by manipulating the at time in dfu_traverser_t::run:

flux-sched/resource/traversers/dfu.cpp

Line 277 in 35d3c96

int dfu_traverser_t::run (Jobspec::Jobspec &jobspec,

we can achieve the desired behavior. Here I've simulated this by hardcoding at = 3600 in dfu_traverser_t::run and performing a match allocate:

resource-query> match allocate t/data/resource/jobspecs/basics/test001.yaml
      ---------------core35[1:x]
      ------------socket1[1:x]
      ---------node1[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]
INFO: =============================
INFO: JOBID=1
INFO: RESOURCES=RESERVED
INFO: SCHEDULED AT=3600
INFO: =============================

Of course, there will be a decent amount of development required to add new match_op_t cases and determine the best way to include the desired time in the jobspec.

milroy · 2023-03-12T23:25:32Z

Of course, there will be a decent amount of development required to add new match_op_t cases

Actually that is not complicated.

and determine the best way to include the desired time in the jobspec.

As discussed with @grondo during last week's team meeting, we still need to decide how to proceed with this part. The current state of PR #1013 uses the optional system key space to let users set the deferred time. To ensure the allocation doesn't get moved up (which is undesired) or moved back for each match allocate_orelse_reserve, I added code to use a base time (deferred_from in epoch seconds) which makes deferred_start a relative time:

flux-sched/resource/traversers/dfu_impl.hpp

Line 104 in 90f8229

jobspec.attributes.system.optional.find ("deferred_from");

An example test jobspec looks like this:

version: 9999
resources:
    - type: cluster
      count: 1
      with:
        - type: rack
          count: 1
          with:
            - type: node
              count: 1
              with:
                  - type: slot
                    count: 1
                    label: default
                    with:
                      - type: socket
                        count: 1
                        with:
                          - type: core
                            count: 1
# a comment
attributes:
  system:
    duration: 3600
    # optional deferred keys
    deferred_start: 1800
    deferred_from: 0
tasks:
  - command: [ "app" ]
    slot: default
    count:
      per_slot: 1

My sense is that while this may work well for automated submission it will be hard for manual submission. @jameshcorbett and @ryanday36 might have good input here.
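
A small sketch of why the base time matters, as I understand the reasoning above. The function names are hypothetical and the numbers are made up; the point is only that an offset applied to the time of each match attempt drifts, while an offset applied to a fixed `deferred_from` does not.

```python
def start_relative_to_now(now: int, deferred_start: int) -> int:
    """Naive reading: the offset is applied at the time of each match attempt."""
    return now + deferred_start


def start_relative_to_base(deferred_from: int, deferred_start: int) -> int:
    """Fixed base: the result does not depend on when matching runs."""
    return deferred_from + deferred_start


# Three successive match attempts for the same job (times in seconds).
match_times = [1000, 1300, 1900]
naive = [start_relative_to_now(t, 1800) for t in match_times]
fixed = [start_relative_to_base(1000, 1800) for _ in match_times]
```

Here `naive` moves the start back on every attempt (2800, 3100, 3700), whereas `fixed` stays at 2800 no matter how often the job is re-matched.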

vsoch · 2023-03-23T19:51:04Z

The problem is that you need to be able to define those attributes without writing a yaml file every time?

We are working on a shape spec for resources - flux-framework/rfc#371 maybe we need the same for system attributes? Ping @trws

garlick · 2023-03-23T20:01:44Z

Would the submit time (called t_submit in qmanager) work as the deferred_from value?

grondo · 2023-03-23T20:03:22Z

The problem is that you need to be able to define those attributes without writing a yaml file every time?

There is already a facility for specifying system attributes on the command line of the submission commands (See documentation of --setattr in e.g. flux-run(1))

grondo · 2023-03-23T20:05:26Z

Would the submit time (called t_submit in qmanager) work as the deferred_from value?

That is a great idea. I was going to suggest something similar in that t_submit could be the default if deferred_from is not set (in case allowing a different deferred_from is useful in testing?)
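
A minimal sketch of that default, assuming the keys sit directly under attributes.system as in the example jobspec above (the PR may place them elsewhere). `resolve_start` is a hypothetical helper, not qmanager code, and the jobspec is a plain dict.

```python
from typing import Any, Dict, Optional


def resolve_start(jobspec: Dict[str, Any], t_submit: float) -> Optional[float]:
    """Return the absolute deferred start time, or None if the job is not deferred."""
    system = jobspec.get("attributes", {}).get("system", {})
    if "deferred_start" not in system:
        return None
    # Fall back to the submit time when no explicit base is given.
    base = system.get("deferred_from", t_submit)
    return base + system["deferred_start"]


with_default = resolve_start(
    {"attributes": {"system": {"duration": 3600, "deferred_start": 1800}}},
    t_submit=1_679_000_000.0,
)
with_override = resolve_start(
    {"attributes": {"system": {"deferred_start": 1800, "deferred_from": 0}}},
    t_submit=1_679_000_000.0,
)
```

The first call resolves relative to the submit time; the second keeps an explicit `deferred_from`, which is what a test would use to get a reproducible result.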

ryanday36 · 2023-03-23T20:39:33Z

I think that t_submit probably makes sense for a default deferred_from value. I'm not quite clear, does the current implementation allow the user to set an absolute time, or just a relative time? It seems like the best interface for users would allow them to say something like --setattr=deferred_start=3pm or --setattr=deferred_start=+2.1h (i.e. take the same datetime formats as the current --begin-time flag).

I was also thinking more about what keyword would make sense for this. I'm leaning toward something more like 'reserve_time' or 'reserve_start', or maybe 'require_start' since it will raise an exception on the job if it can't start at that time.

grondo · 2023-03-23T20:53:59Z

The --begin-time option uses a timestamp (absolute time) which is obtained by parsing the user's argument with our Python parse_datetime() function:

       --begin-time=DATETIME
              Convenience  option  for  setting  a begin-time dependency for a
              job.  The job is guaranteed to start after  the  specified  date
              and  time.   If  DATETIME  begins  with  a + character, then the
              remainder is considered to be an offset in Flux  standard  dura‐
              tion  (RFC  23),  otherwise, any datetime expression accepted by
              the Python parsedatetime module  is  accepted,  e.g.  2021-06-21
              8am, in an hour, tomorrow morning, etc.

It would be nice to support something similar here.
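
As a rough sketch of the rule in that man page text (a leading + means an offset, anything else is a datetime expression), something like the following could turn a user argument into a timestamp. It is not the flux-core implementation: `parse_fsd` handles only a simplified subset of RFC 23 durations, and `datetime.fromisoformat` is swapped in for the parsedatetime module purely to keep the example self-contained, so expressions like "3pm" are not handled here.

```python
from datetime import datetime

# Simplified subset of Flux standard duration suffixes (assumption, see RFC 23).
_FSD_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_fsd(text: str) -> float:
    """Parse a duration such as '90', '500ms', '30m' or '2.1h' into seconds."""
    # Check 'ms' before 's' so that '500ms' is not read as '500m' + 's'.
    for suffix in ("ms", "s", "m", "h", "d"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _FSD_UNITS[suffix]
    return float(text)  # no suffix means seconds


def parse_start(arg: str, now: float) -> float:
    """Return an absolute timestamp for an offset ('+2.1h') or an ISO 8601 datetime."""
    if arg.startswith("+"):
        return now + parse_fsd(arg[1:])
    # Stand-in for parsedatetime: only ISO 8601 strings are accepted here.
    return datetime.fromisoformat(arg).timestamp()


relative = parse_start("+2.1h", now=1000.0)
absolute = parse_start("2023-03-23T15:00:00+00:00", now=1000.0)
```

Either form ends up as an absolute timestamp, so the scheduler side would not need to know which one the user typed.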

If we can add whatever option we call this to the jobspec RFC, then perhaps it would make sense to expose this as a similar option in the submission commands?

Or, would it be too kludgy to add some kind of sentinel to --begin-time to make it set this option in jobspec instead of a dependency? (e.g. --begin-time=force:3pm) Meh, just throwing that out there. Simple enough and probably clearer to add a --require-start=3pm option. Still, if we are exposing an option in the core submission commands, we should have the resulting jobspec properties documented in the RFC.

vsoch · 2023-03-23T20:56:56Z

@grondo why should we require users to figure out timestamps / timezones? Isn't it easier (or minimally should be an option) to provide relative times? E.g., what if you are doing some kind of flux proxy to an instance in a different timezone and then you get it wrong (or minimally have to convert which is a hairball I don't think we want to dive into).

A suggestion - if begin time is already a thing (and indeed it's actually a time to begin) why not have a --start that provides the same but is relative? E.g., --start=60 (start in an hour) and then I don't have to think about actual times (thank goodness!)

Reference for time pain: https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca ⏲️ 😱

grondo · 2023-03-23T21:00:28Z

I'm confused. As shown above, the interface does not require users to actually specify the timestamp. The begin time can be specified as an offset or absolute time or any other format supported by parsedatetime.

vsoch · 2023-03-23T21:02:31Z

Oh I see, if you add + it is an offset? Sorry I'm just really stupid.

vsoch · 2023-03-23T21:03:03Z

I'll just see myself out, I'm not really helping anyone.

garlick · 2023-03-23T21:12:47Z

I think I'm having one of those days myself FWIW.

milroy · 2023-03-24T07:56:39Z

I think that t_submit probably makes sense for a default deferred_from value. I'm not quite clear, does the current implementation allow the user to set an absolute time, or just a relative time?

I didn't know about t_submit and that does sound like the right default choice.

I just realized I obfuscated a crucial detail with deferred_from: 0 in my example jobspec above. That value is the epoch time in seconds. Here's how it's used in the PR currently: 90f8229.

I could certainly implement what @grondo suggested from the --begin-time option.

JaeseungYeom added the enhancement label Nov 4, 2022

JaeseungYeom assigned milroy, JaeseungYeom and jameshcorbett Nov 4, 2022

JaeseungYeom mentioned this issue Nov 7, 2022

Creating a parent Fluxion from multiple existing fluxion instances #988

Open

milroy mentioned this issue Mar 6, 2023

Support deferred job start time #1013

Open

2 tasks
