Use ctest RESOURCE_GROUPS and not RESOURCE_LOCK to control GPU access #3879

Open
prckent opened this issue Feb 25, 2022 · 2 comments
Comments

@prckent
Contributor

prckent commented Feb 25, 2022

Is your feature request related to a problem? Please describe.

To run efficiently on multi-GPU nodes, we need to control access at a per-GPU level. RESOURCE_GROUPS would allow multiple tests, each using one GPU, to run simultaneously. Currently we have a single lock, so only one GPU is used. RESOURCE_GROUPS would also accommodate future multi-GPU tests. https://cmake.org/cmake/help/latest/prop_test/RESOURCE_GROUPS.html

Scripts would have to read an environment or input variable to handle a non-default number of GPUs (the default being 1).
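
A minimal sketch of how a test script might read the allocation that ctest passes through the environment. It assumes the resource type is named `gpus` (the name is chosen by the project, not fixed by ctest) and relies on the documented `CTEST_RESOURCE_GROUP_COUNT`, `CTEST_RESOURCE_GROUP_<n>` and `CTEST_RESOURCE_GROUP_<n>_<TYPE>` variables. The function name `allocated_gpu_ids` is hypothetical.

```python
import os


def allocated_gpu_ids(env=None):
    """Return the GPU ids ctest allocated to this test.

    Returns None when ctest was run without a resource spec file,
    in which case the CTEST_RESOURCE_GROUP_* variables are not set.
    """
    env = os.environ if env is None else env
    count = env.get("CTEST_RESOURCE_GROUP_COUNT")
    if count is None:
        return None

    ids = []
    for group in range(int(count)):
        # Comma-separated list of resource types in this group, e.g. "gpus".
        types = env.get(f"CTEST_RESOURCE_GROUP_{group}", "").split(",")
        if "gpus" not in types:  # "gpus" is an assumed resource type name
            continue
        # Value looks like "id:0,slots:1;id:1,slots:1".
        allocation = env[f"CTEST_RESOURCE_GROUP_{group}_GPUS"]
        for entry in allocation.split(";"):
            fields = dict(item.split(":", 1) for item in entry.split(","))
            ids.append(fields["id"])
    return ids
```

A wrapper could then, for example, export `CUDA_VISIBLE_DEVICES` as `",".join(ids)` before launching the actual test, so that each concurrently running test sees only its own GPU.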

Describe the solution you'd like
Switch from LOCKing to resource groups.

Describe alternatives you've considered
None.

Additional context

Essential to make good use of multi-GPU nodes and allow e.g. efficient running of the performance tests on them.

Threads, MPI, and CPU cores could also be handled similarly, but GPUs are the most constrained.

@ye-luo
Contributor

ye-luo commented Feb 25, 2022

I added RESOURCE_GROUPS in QE:
https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/CMakeLists.txt#L176
Multiple GPUs can be configured via:
https://gitlab.com/QEF/q-e/-/blob/develop/test-suite/gpu-resource-example.json
However, I left one issue unresolved: when no resource file is provided, ctest just runs all the tests without any resource constraints. Instead, we would prefer that it fall back to running one test at a time.
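
For illustration, a resource spec file in the format ctest documents could be generated with a short script like the one below. This is only a sketch: the helper name `make_resource_spec` is hypothetical, the resource type name `gpus` is an assumption, and the output is not claimed to match the QE example file linked above.

```python
import json


def make_resource_spec(num_gpus, slots_per_gpu=1):
    """Build a ctest resource spec describing num_gpus local GPUs."""
    return {
        "version": {"major": 1, "minor": 0},
        # "local" holds a single object mapping resource types to resources.
        "local": [
            {
                "gpus": [  # assumed resource type name
                    {"id": str(i), "slots": slots_per_gpu}
                    for i in range(num_gpus)
                ]
            }
        ],
    }


def write_resource_spec(path, num_gpus, slots_per_gpu=1):
    """Write the spec as JSON so it can be passed to ctest."""
    with open(path, "w") as handle:
        json.dump(make_resource_spec(num_gpus, slots_per_gpu), handle, indent=2)
```

The resulting file would be passed with `ctest --resource-spec-file <file>`. A spec describing a single GPU with one slot should make tests that each request one GPU slot run one at a time, which is close to the fallback behaviour wanted here, though it still requires the user to pass a file.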

@prckent
Contributor Author

prckent commented Feb 25, 2022

Hmm. If there is no better solution, we could simply abort for GPU builds when the environment variable is not set and give the user instructions for setting it.

e.g. if no --resource-spec-file is given, abort but give a link to a basic one we include in our repo.
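
One way this check might look inside a test wrapper, as a rough sketch. It relies on ctest defining `CTEST_RESOURCE_GROUP_COUNT` only when a resource spec file was supplied. The function name `missing_resource_spec_message` and the placeholder path in the message are hypothetical.

```python
import os


def missing_resource_spec_message(env=None):
    """Return an error message if ctest ran without a resource spec file.

    Returns None when the allocation variables are present.
    """
    env = os.environ if env is None else env
    if "CTEST_RESOURCE_GROUP_COUNT" in env:
        return None
    return (
        "No --resource-spec-file given. GPU tests need a resource spec; "
        "rerun as: ctest --resource-spec-file <path to example spec in the repo>"
    )


# Typical use at the top of a GPU test wrapper (not executed here):
#     message = missing_resource_spec_message()
#     if message is not None:
#         sys.exit(message)  # requires `import sys`; exits with non-zero status
```

This fails each GPU test individually rather than aborting the whole ctest run up front; an abort at configure or ctest start time would need a different mechanism.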
