
Conversation

@casparvl (Collaborator) commented Dec 1, 2025

Adds a test for LPC3D, one of the MultiXscale-developed codes. Note that LPC3D is not (yet) in EESSI, but there is an EasyBuild PR (easybuilders/easybuild-easyconfigs#24703) that can create an installation against which this test can be run.

@casparvl (Collaborator, Author) commented

Note: I've decided to set NUMBA_NUM_THREADS explicitly, since on @laraPPr's system, numba.config.DEFAULT_NUMBA_NUM_THREADS was detected incorrectly. Likely, mpirun -np 1 was binding the process to a single core, which then caused numba to set numba.config.DEFAULT_NUMBA_NUM_THREADS to 1.
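
For reference, this is roughly what that looks like; a minimal, illustrative sketch (class name, executable, and the hardcoded values are placeholders, not the actual test code):

```python
import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import run_after, sanity_function


@rfm.simple_test
class NumbaThreadsSketch(rfm.RunOnlyRegressionTest):
    # Illustrative skeleton; the real LPC3D test gets these values from
    # the EESSI test-suite mixin hooks
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'lpc3d'    # placeholder executable name
    num_cpus_per_task = 4   # placeholder; normally set per scale/partition

    @run_after('setup')
    def set_numba_threads(self):
        # Don't rely on numba's autodetection of the core count: it can be
        # fooled by the binding that mpirun applies to the launched process
        self.env_vars['NUMBA_NUM_THREADS'] = str(self.num_cpus_per_task)

    @sanity_function
    def assert_ran(self):
        # Placeholder sanity check: any non-empty output counts as success
        return sn.assert_found(r'\S', self.stdout)
```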

By explicitly setting NUMBA_NUM_THREADS, we no longer rely on the default. @laraPPr, note that the incorrect binding behavior could still cause sub-optimal performance, because it could e.g. cause 4 threads to be bound to the same physical core. Thinking about your issues some more, I'd suggest printing all OMPI environment variables (env | grep OMPI) and checking whether one of them could have caused the binding behavior we were seeing when running under ReFrame. You could compare that to the OMPI env vars that are set when you submit manually from the staging dir: since you did not see the issue in that case, I'd assume one of the OMPI env vars is set differently.
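
One way to capture this in the ReFrame-generated job script, so the output can be diffed against a manual submission:

```python
    # Hook to add to the test class sketched above
    @run_after('setup')
    def dump_ompi_env(self):
        # Prepend a shell command to the generated job script that logs
        # all OpenMPI-related environment variables to the job's stdout
        self.prerun_cmds.append('env | grep OMPI')
```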

So, yeah, while setting NUMBA_NUM_THREADS will make it run, it might actually run slower on 4 threads than on 1 if they are all bound to the same core...

Makes me wonder if we should add --report-bindings to the mpirun arguments by default, and maybe even add a sanity check to the mixin class that verifies the binding is as expected. That would also have helped in detecting the incorrect binding with OpenMPI 5 due to us not (yet) setting the PRTE env vars (#297).
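
As a sketch, that could look as follows. The 'MCW rank ... bound to' pattern is what OpenMPI 4.x prints with --report-bindings and may differ for other versions; in the mixin, this assertion would be folded into the existing sanity check rather than replace it:

```python
    # Hooks to add to the test class sketched above (the sanity function
    # replaces the placeholder one from that sketch)
    @run_after('setup')
    def add_report_bindings(self):
        # Assumes the launcher is mpirun; other launchers won't know this flag
        self.job.launcher.options.append('--report-bindings')

    @sanity_function
    def assert_bindings_reported(self):
        # With --report-bindings, OpenMPI logs one line per rank on stderr,
        # e.g. 'MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/./././.]'.
        # A real check would also verify *which* cores each rank is bound to.
        return sn.assert_found(r'MCW rank 0 bound to', self.stderr)
```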

@casparvl
Copy link
Collaborator Author

@laraPPr another thing we could do is simply make this test use the local spawner. I was reluctant at first, since I thought "but then we won't benefit from OpenMPI's binding". But OpenMPI only binds tasks to a core set, and we don't need that: our single task runs on all the cores in our allocation (well, unless someone uses --exclusive in their partition config, I guess, in which case it'll have access to more cores than it will launch threads).

Anyway, it's probably best to use the local spawner and, if we want binding, come up with a hook that does the binding for the local spawner (e.g. wrap the command in a numactl <something>). The advantage is that this also works for codes that are not at the foss level (and where we thus cannot use mpirun, but have to use the local spawner). At least this way, all non-MPI codes would use the same binding: none today, but if we introduce the numactl wrapper, we could achieve some binding later down the line if needed.
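
A rough sketch of what that could look like (the core range is hardcoded for illustration; a real hook would derive the cpuset from the allocation):

```python
    # Hooks to add to the test class sketched above
    # (also needs: from reframe.core.backends import getlauncher)
    @run_after('setup')
    def use_local_launcher(self):
        # Bypass mpirun entirely; the single process simply uses all cores
        # the scheduler allocated to this job
        self.job.launcher = getlauncher('local')()

    @run_after('setup')
    def bind_with_numactl(self):
        # Hypothetical binding hook: pin the process (and hence its numba
        # threads) to an explicit core range instead of relying on mpirun
        cores = f'0-{self.num_cpus_per_task - 1}'
        self.executable = f'numactl --physcpubind={cores} {self.executable}'
```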

@laraPPr (Collaborator) left a comment:

lgtm

@laraPPr merged commit de02195 into EESSI:main on Dec 15, 2025.
16 checks passed