Use ctest RESOURCE_GROUPS and not RESOURCE_LOCK to control GPU access #3879

prckent · 2022-02-25T21:13:04Z

Is your feature request related to a problem? Please describe.

To run efficiently on multi-GPU nodes, we need to control access at the per-GPU level. RESOURCE_GROUPS would allow multiple tests, each using one GPU, to run simultaneously. Currently we have a single lock, so only one GPU is used at a time. RESOURCE_GROUPS would also accommodate future multi-GPU tests. https://cmake.org/cmake/help/latest/prop_test/RESOURCE_GROUPS.html

Scripts would have to read an environment or input variable whenever a test needs something other than the default of one GPU.
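For reference, a minimal sketch of how a test script could discover which GPUs ctest allocated to it. When RESOURCE_GROUPS is set on a test and a resource spec file is in use, ctest exports `CTEST_RESOURCE_GROUP_COUNT`, `CTEST_RESOURCE_GROUP_<n>`, and `CTEST_RESOURCE_GROUP_<n>_<TYPE>` to the test. The resource type name `gpus` and the helper name `allocated_gpu_ids` are assumptions for illustration, not something already in our test suite.

```python
import os


def allocated_gpu_ids(env=None):
    """Return the GPU ids ctest allocated to this test, or [] if none.

    Assumes the resource type is named "gpus" in the resource spec file
    (hypothetical choice; any name works if used consistently).
    """
    env = os.environ if env is None else env
    count = int(env.get("CTEST_RESOURCE_GROUP_COUNT", "0"))
    ids = []
    for group in range(count):
        # CTEST_RESOURCE_GROUP_<n> lists the resource types in group n.
        types = env.get(f"CTEST_RESOURCE_GROUP_{group}", "").split(",")
        if "gpus" not in types:
            continue
        # The value looks like "id:0,slots:1;id:1,slots:1".
        spec = env.get(f"CTEST_RESOURCE_GROUP_{group}_GPUS", "")
        for entry in filter(None, spec.split(";")):
            fields = dict(item.split(":", 1) for item in entry.split(","))
            ids.append(fields["id"])
    return ids
```

A wrapper could then pass the returned ids to the application, for example through a device-visibility variable, before launching the actual test. How the ids map onto physical devices depends on how the spec file was written.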

Describe the solution you'd like
Switch from LOCKing to resource groups.

Describe alternatives you've considered
None.

Additional context

Essential to make good use of multi-GPU nodes and allow e.g. efficient running of the performance tests on them.

Threads, MPI, and CPU cores could also be handled similarly, but GPUs are the most constrained.

ye-luo · 2022-02-25T21:21:26Z

I added RESOURCE_GROUPS in QE
https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/CMakeLists.txt#L176
and multiple GPUs can be configured via
https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/gpu-resource-example.json
However, I had one unresolved issue: when no resource file is provided, it just runs all the tests without any resource constraints. Instead, we would prefer it to fall back to running one test at a time.

prckent · 2022-02-25T21:32:29Z

Hmm. If there is no better solution, we could simply abort for GPU builds when the environment variable is not set and give the user instructions for setting it.

E.g., if no --resource-spec-file is given, abort but give a link to a basic one we include in our repo.
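As a rough illustration of what such a basic file could contain, here is a sketch that generates a resource spec in the JSON layout ctest documents (a `version` object plus a `local` list). The resource type name `gpus`, one slot per GPU, and the function names are assumptions for illustration only.

```python
import json


def basic_resource_spec(num_gpus):
    """Build a ctest resource spec describing num_gpus GPUs, one slot each.

    The resource type name "gpus" is an assumption; it must match the name
    used in each test's RESOURCE_GROUPS property.
    """
    return {
        "version": {"major": 1, "minor": 0},
        "local": [
            {"gpus": [{"id": str(i), "slots": 1} for i in range(num_gpus)]}
        ],
    }


def write_resource_spec(path, num_gpus):
    """Write the spec to path as indented JSON."""
    with open(path, "w") as handle:
        json.dump(basic_resource_spec(num_gpus), handle, indent=2)
        handle.write("\n")
```

The resulting file would be passed as `ctest --resource-spec-file <path>`. With one slot per GPU, a test declaring a RESOURCE_GROUPS value such as `gpus:1` would get exclusive use of one device.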

prckent added enhancement testing labels Feb 25, 2022
