Job submission slows down on Hetchy #1001
By contrast, on Corona the timings are very consistent.
Oh, I didn't quite catch that it was job submission that was slowing down here! That is quite unexpected and should not happen. In general, responses to a … We'll have to get to the bottom of this!
I did notice on rank 0 the broker was at 100% CPU. A perf report shows Fluxion using 98% of the cycles:
So my guess is that the slowness in job submission is due to the … Note that the feasibility checks are reasonably fast until you start running jobs; then this script reproduces the slowness with just the validator:

# no running jobs
$ for i in `seq 1 10`; do /usr/bin/time --format="%e" sh -c 'flux mini run --dry-run -N1 hostname | flux job-validator --plugins=feasibility,jobspec --jobspec-only' ; done
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
{"errnum": 0}
0.48
# with running jobs
[flux@hetchy7:~]$ for i in `seq 1 10`; do /usr/bin/time --format="%e" sh -c 'flux mini run --dry-run -N1 hostname | flux job-validator --plugins=feasibility,jobspec --jobspec-only' ; done
{"errnum": 0}
2.04
{"errnum": 0}
2.65
{"errnum": 0}
3.19
{"errnum": 0}
3.72
{"errnum": 0}
4.24
{"errnum": 0}
9.65
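An observation on the numbers above (an editor's note, not part of the original comment): the "with running jobs" timings grow by a roughly constant ~0.5 s per run, which is consistent with a per-running-job cost in the feasibility check. A quick sketch of the increments:

```python
# Validator wall-clock times (seconds) from the "with running jobs" loop
# above, excluding the final 9.65 s outlier.
times = [2.04, 2.65, 3.19, 3.72, 4.24]

# Consecutive differences: each submission adds roughly half a second.
deltas = [round(b - a, 2) for a, b in zip(times, times[1:])]
print(deltas)  # [0.61, 0.54, 0.53, 0.52]
```

The last sample jumping to 9.65 s suggests the cost is not purely linear once enough jobs are running.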
To the bottom!
Rabbit systems in general will, but at the moment Hetchy doesn't. Fluxion doesn't know anything about the rabbits. So that isn't the culprit. I'll test more when I get back from travel.
Ah, thanks for that information. If there's nothing special about the hetchy resource graph at the moment, then this issue has the potential to affect any system. I'll transfer this issue to flux-sched, because I'm fairly certain the flux-coral2 bits have nothing to do with the problem (I even removed the two jobtap plugins just to test).
A good test may be to try reloading …
I started a test instance with the same R as configured on hetchy and could not reproduce the issue, so the cause here isn't the specific configuration of resources. Not sure how to debug the live system. |
In case it proves useful in recreating this, the current resource state and properties (queue) config is: …
This problem is reproducible by collecting some of the config from hetchy:

[job-manager]
plugins = [
{ load = "perilog.so" },
{ load = "/opt/lib64/flux/job-manager/plugins/cray_pals_port_distributor.so", conf = { port-min = 11000, port-max = 12000 } },
{ load = "/opt/lib64/flux/job-manager/plugins/dws-jobtap.so" }
]
[policy.limits]
duration = "24h"
[queues.windom]
requires = ["windom"]
[queues.bardpeak]
requires = ["bardpeak"]
[resource]
path = "/etc/flux/system/R"
#exclude = "hetchy[7,12]"
norestrict = true
noverify = true
[sched-fluxion-qmanager]
# easy backfill
queue-policy = "easy"
[sched-fluxion-resource]
# node exclusive starting from low node ids
match-policy = "lonodex"
match-format = "rv1_nosched"

system.R:

{
"version": 1,
"execution": {
"R_lite": [
{
"rank": "0",
"children": {
"core": "0-63"
}
},
{
"rank": "1",
"children": {
"core": "0-127"
}
},
{
"rank": "2-16",
"children": {
"core": "0-127"
}
},
{
"rank": "17-18",
"children": {
"core": "0-63",
"gpu": "0-7"
}
}
],
"starttime": 0.0,
"expiration": 0.0,
"nodelist": [
"hetchy[7,12-29]"
],
"properties": {
"windom": "2-16",
"bardpeak": "17-18"
}
}
}

Instructions: …
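As a sanity check on the R document above (an editor's addition, not from the thread), expanding the R_lite idset ranges yields the same totals that `flux resource list` reports later in the thread: 19 nodes, 2240 cores, 16 GPUs.

```python
# execution.R_lite from the system.R shown above.
r_lite = [
    {"rank": "0",     "children": {"core": "0-63"}},
    {"rank": "1",     "children": {"core": "0-127"}},
    {"rank": "2-16",  "children": {"core": "0-127"}},
    {"rank": "17-18", "children": {"core": "0-63", "gpu": "0-7"}},
]

def span(idset):
    """Count the ids in a simple idset like "0-63", "5", or "1,3-4"."""
    n = 0
    for part in idset.split(","):
        lo, _, hi = part.partition("-")
        n += int(hi or lo) - int(lo) + 1
    return n

nodes = sum(span(e["rank"]) for e in r_lite)
cores = sum(span(e["rank"]) * span(e["children"]["core"]) for e in r_lite)
gpus = sum(span(e["rank"]) * span(e["children"]["gpu"])
           for e in r_lite if "gpu" in e["children"])
print(nodes, cores, gpus)  # 19 2240 16
```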
Getting a similar result from …
When I was looking at …

I'm not sure it's relevant to this, but I can imagine the account-priority-update giving fluxion some garbage data when there's no accounting db, so I thought I'd mention it. I deleted the …
I don't think that could be it. The accounting scripts communicate with the …
FYI, I didn't get different results running perf with …
Here's a script that acts as a reproducer, run from a top-level flux-sched builddir:

#!/bin/sh
flux module remove sched-fluxion-qmanager
flux module remove sched-fluxion-resource
flux module remove resource
flux kvs put -r resource.R=- </etc/flux/system/R
flux config load < ./conf.toml
flux module load resource noverify
flux module load sched-fluxion-resource
flux module load sched-fluxion-qmanager
flux queue status
flux resource list
flux module list | grep sched
flux mini submit --queue=windom --cc=1-100 --setattr=exec.test.run_duration=1ms --quiet --watch --progress --jps hostname

conf.toml:

[job-manager]
plugins = [
{ load = "perilog.so" },
{ load = "/opt/lib64/flux/job-manager/plugins/cray_pals_port_distributor.so", conf = { port-min = 11000, port-max = 12000 } },
{ load = "/opt/lib64/flux/job-manager/plugins/dws-jobtap.so" }
]
[policy.limits]
duration = "24h"
[queues.windom]
requires = ["windom"]
[queues.bardpeak]
requires = ["bardpeak"]
[resource]
path = "/etc/flux/system/R"
#exclude = "hetchy[7,12]"
norestrict = true
noverify = true
[sched-fluxion-qmanager]
# easy backfill
queue-policy = "easy"
[sched-fluxion-resource]
# node exclusive starting from low node ids
match-policy = "lonodex"
match-format = "rv1_nosched"
[tbon]
tcp_user_timeout = "2m"

The reproducer can now be run like:

$ FLUX_MODULE_PATH_PREPEND=$(pwd)/resource/modules/.libs flux start -s 1 ./issue#1001.sh
grondo@hetchy12:~/git/flux-sched$ FLUX_MODULE_PATH_PREPEND=$(pwd)/resource/modules/.libs flux start -s 1 ./issue#1001.sh
Failed to open drm root directory /sys/class/drm.: No such file or directory
Failed to open drm root directory /sys/class/drm.: No such file or directory
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 6225784 c4a4497 0 R sched
sched-fluxion-resource 36422384 e68a19e 0 R
PD:0 R:0 CD:100 F:0 │███████████████████████████████████│100.0% 0.8 job/s

As suggested by @trws, I set the queue-depth to 2 and re-ran the test, which shows a marked improvement:

$ grep queue-depth conf.toml
queue-params.queue-depth = 2
$ FLUX_MODULE_PATH_PREPEND=$(pwd)/resource/modules/.libs flux start -s 1 ./issue#1001.sh
Failed to open drm root directory /sys/class/drm.: No such file or directory
Failed to open drm root directory /sys/class/drm.: No such file or directory
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 6225784 c4a4497 0 R sched
sched-fluxion-resource 36422384 e68a19e 0 R
PD:0 R:0 CD:100 F:0 │███████████████████████████████████│100.0% 28.5 job/s

Note in the …
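For scale (editor's arithmetic, not in the original comment), the queue-depth change took throughput from 0.8 job/s to 28.5 job/s:

```python
baseline = 0.8  # job/s with the default queue-depth
tuned = 28.5    # job/s with queue-params.queue-depth = 2
speedup = tuned / baseline
print(f"{speedup:.1f}x faster")  # roughly 35.6x faster
```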
@trws made some good suggestions about further performance improvements for graph traversal during the last Fluxion hackathon. I tested them with the …
milroy1@docker-desktop:/usr/src$ ./test.sh
flux-module: broker.rmmod sched-fluxion-qmanager: No such file or directory
flux-module: broker.rmmod sched-fluxion-resource: No such file or directory
2023-02-22T22:38:11.782934Z sched-simple.err[0]: exiting due to resource update failure: the resource module was unloaded
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE QUEUE NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-resource 37767648 16eaad7 0 R
sched-fluxion-qmanager 8646664 6d1d3e2 0 R sched
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████|100.0% 2.5 job/s
milroy1@docker-desktop:/usr/src$ ./test.sh
flux-module: broker.rmmod sched-fluxion-qmanager: No such file or directory
flux-module: broker.rmmod sched-fluxion-resource: No such file or directory
2023-02-22T22:34:21.454179Z sched-simple.err[0]: exiting due to resource update failure: the resource module was unloaded
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE QUEUE NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 8646664 6d1d3e2 0 R sched
sched-fluxion-resource 37739848 88cc2bf 0 R
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 2.4 job/s
milroy1@docker-desktop:/usr/src$ ./test.sh
flux-module: broker.rmmod sched-fluxion-qmanager: No such file or directory
flux-module: broker.rmmod sched-fluxion-resource: No such file or directory
2023-02-22T23:47:22.846822Z sched-simple.err[0]: exiting due to resource update failure: the resource module was unloaded
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE QUEUE NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 8646664 6d1d3e2 0 R sched
sched-fluxion-resource 38006288 a02a228 0 R
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 1.6 job/s
milroy1@docker-desktop:/usr/src$ ./test.sh
flux-module: broker.rmmod sched-fluxion-qmanager: No such file or directory
flux-module: broker.rmmod sched-fluxion-resource: No such file or directory
2023-02-22T23:55:15.292809Z sched-simple.err[0]: exiting due to resource update failure: the resource module was unloaded
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE QUEUE NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-resource 37887904 e1cc9e0 0 R
sched-fluxion-qmanager 8646664 6d1d3e2 0 R sched
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 2.0 job/s
milroy1@docker-desktop:/usr/src$ ./test.sh
flux-module: broker.rmmod sched-fluxion-qmanager: No such file or directory
flux-module: broker.rmmod sched-fluxion-resource: No such file or directory
2023-02-23T00:05:26.933375Z sched-simple.err[0]: exiting due to resource update failure: the resource module was unloaded
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE QUEUE NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 8646664 6d1d3e2 0 R sched
sched-fluxion-resource 37832352 9ac6420 0 R
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 1.9 job/s
Wow, almost no impact, or even a negative impact. That's unfortunate, but really good to know. Thanks to help from @grondo, I found some things in the Constraints that might be impacting us. Once that bug is squashed we can take another pass at the performance. I'm guessing we're not getting a win from the unordered maps because of the hashing cost, which we could fix with pre-hashed or interned strings. It might be worth doing the string rework and then circling back to the data types, even if that's more painful. =/
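The interned-strings idea above can be illustrated in miniature (an editor's sketch: Fluxion itself is C++, and the names below are hypothetical, not Fluxion APIs). Interning maps every distinct string to one canonical object, so equality in a hash table degenerates to an identity check and the hash need only be computed once.

```python
import sys

# Intern the resource-type names once, e.g. at graph-construction time.
CORE = sys.intern("core")
GPU = sys.intern("gpu")

# A key built dynamically (so it isn't the same literal object) still
# interns to the identical canonical object:
key = sys.intern("".join(["co", "re"]))
assert key is CORE  # identity check, no character-by-character compare
print(key is CORE)  # True
```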
@milroy Are you running this in a docker container?
Yes. I've also tested the reproducer directly on hetchy. In my tests the reproducer runs faster in the container on my laptop, but hetchy and the container exhibit similar performance characteristics.
OK, so it's not faster in the container than on hetchy for current Fluxion master (in fact, it's faster on hetchy), but really close:

[milroy1@hetchy12:flux-sched]$ ./tests.sh
Failed to open drm root directory /sys/class/drm.: No such file or directory
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-resource 36405416 db97e26 0 R
sched-fluxion-qmanager 8496408 3022bef 0 R sched
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 2.6 job/s

Case 3 above on hetchy:

[milroy1@hetchy12:flux-sched]$ ./tests.sh
Failed to open drm root directory /sys/class/drm.: No such file or directory
bardpeak: Scheduling is started
windom: Scheduling is started
bardpeak: Job submission is enabled
bardpeak: Scheduling is started
windom: Job submission is enabled
windom: Scheduling is started
STATE PROPERTIES NNODES NCORES NGPUS NODELIST
free 2 192 0 hetchy[7,12]
free windom 15 1920 0 hetchy[13-27]
free bardpeak 2 128 16 hetchy[28-29]
allocated 0 0 0
down 0 0 0
sched-fluxion-qmanager 8496408 3022bef 0 R sched
sched-fluxion-resource 37031616 df9e2f5 0 R
PD:0 R:0 CD:100 F:0 │████████████████████████████████████████████████████████│100.0% 1.8 job/s
@jameshcorbett can we close this issue? PR #1007 fixes the job slowdown, but it doesn't comprehensively solve Fluxion performance problems. I can create a new issue to continue the investigation into general Fluxion performance.
I think this issue still applies, since Hetchy (and all our systems) are using the node-exclusive policy? We're still at a 10-20x slowdown. It might be nice to keep all the history in one issue; however, if you'd really like to create a new issue, that's fine with me.
Out of curiosity, has anyone run a perf test on the versions before we noticed this, to see whether we had a regression on lonodex or whether it's something that just showed up? I know the hetchy config and queues are part of it, but if we had low-but-predictable performance with lonodex before, we might not have seen it.
Is the submission of sequential jobs still slowing down on Hetchy? That is, is the example @jameshcorbett gave in the first comment, where submission times increase from <1s to 24s, still occurring with …
Note that, as shown by the results in #1009, this performance issue also occurs with or without node-exclusive scheduling when moderate amounts of resources are involved in scheduling (in the examples, 2000 nodes).
This should be addressed at this point. @jameshcorbett, does this still repro for you in any cases?
Nope, closing. |