Scaling issue with sample parallel/rectangular_cross test case #27
@samcom12 thanks for looking at this. I have been playing with that script so that I can easily change the mesh size, among other parameters, at the command line. The default at the moment runs with just 40000 triangles. I think (at least at the moment) we can't expect any speed-up once the sub-domains become smaller than, say, 1000 triangles. You can run with a larger mesh by using the --sqrtN argument.
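For instance, something along these lines (the script name and process count are illustrative assumptions, not taken from this thread):

```bash
# Illustrative sketch: the script name and process count are assumptions.
# --sqrtN 500 gives a 500 x 500 x 4 = 1_000_000 triangle mesh.
mpiexec -np 16 python run_parallel_rectangular_cross.py --sqrtN 500
```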
That should run with 1_000_000 triangles (500 x 500 x 4). I tested up to 9_000_000 triangles with Python 2 and pypar and obtained the same behaviour as the current Python 3 with mpi4py: roughly 50% scalability. I am investigating overlapping communication with computation to see if we can improve the scalability. I am surprised that the current results are not as good as the results we saw in the old study. I'm wondering if that was on a machine that had much faster communication infrastructure relative to computation.
@samcom12 here is a plot of my scalability experiments on our NCI Gadi system, together with the data on which it is based.
Thanks @stoiver, it shows good scaling on your machine. System configuration and installation steps:
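A minimal from-source sketch, assuming the anuga-community/anuga_core repository and a pip-based install; this is illustrative only, not necessarily the configuration or steps actually used here:

```bash
# Hypothetical installation sketch; repository URL and package choices are assumptions.
git clone https://github.com/anuga-community/anuga_core.git
cd anuga_core
pip install mpi4py      # MPI bindings needed for the parallel runs
pip install -e .        # build and install ANUGA from the checked-out source
```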
And the SLURM script:
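A generic sbatch-style sketch with placeholder resource values, module names and script name (assumptions, not the script actually submitted):

```bash
#!/bin/bash
# Hypothetical SLURM script sketch; every value below is a placeholder.
#SBATCH --job-name=anuga_scaling
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=48
#SBATCH --time=02:00:00

module load openmpi     # placeholder module name
srun python run_parallel_rectangular_cross.py --sqrtN 500
```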
A description of the NCI Gadi system can be found on the NCI website. In particular, the above results were run on Gadi's standard Cascade Lake compute nodes.
Hi @stoiver, we see reverse scaling with the OpenMPI-4.1.4 version.
@samcom12 Here are the NCI pre-compiled modules I use:
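As a purely illustrative sketch (module names and versions are assumptions, not the actual NCI list), loading a pre-compiled environment on an HPC system typically looks like:

```bash
# Illustrative only: module names and versions are assumptions.
module load python3/3.10.4
module load openmpi/4.1.1
```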
The recommendation from our NCI technical staff is to use their pre-compiled MPI modules. I note that you are using OpenMPI-4.1.4.
@samcom12, I thought the problem for you might have been the binary MPI build you were using. I have also rerun the experiments with my standard installation, but this time ensuring that each run is executed from a separate folder. This seems to have improved the scalability results. Perhaps there are subtle conflicts when multiple simulations are run simultaneously from the same folder?
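One way to arrange separate working folders (a sketch; the script name, mesh size and process counts are assumptions) is:

```bash
# Sketch: give each simulation its own directory to avoid output-file conflicts.
# Script name, --sqrtN value and process counts are assumptions.
for np in 16 32 64; do
    mkdir -p run_np${np}
    (cd run_np${np} && mpiexec -np ${np} python ../run_parallel_rectangular_cross.py --sqrtN 500 > evolve.log)
done
```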
Hello @stoiver, greetings to you! The latest commit breaks the sample script. Just FYR. Cheers,
@samcom12, fixed that. The broken file was created by our automated conversion from Python 2, and one of the generated statements needed to be changed.
@samcom12, just an update that I have changed the naming convention.
Thanks @stoiver!! As another experiment, we tried running a different combination on 10 Cascade Lake nodes. Do you see how internal code breakage could happen with that combination? Cheers,
@samcom12, my guess is that with that combination the sub-domains become too small for any further speed-up.
Hi,
I tried benchmarking the latest run_parallel rectangular_cross.py script. It seems the EVOLVE loop is not scaling at all as the number of processes increases.
rectangular_test_default_param_46_MPI.txt
rectangular_test_default_param_460_MPI.txt
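A sweep of the kind reflected in the attached logs could be driven by something like this sketch; the 46 and 460 process counts match the attachment names, while the script name and output redirection are assumptions:

```bash
# Sketch: repeat the benchmark at two process counts and keep a log per run.
# 46 and 460 match the attached log names; the script name is an assumption.
for np in 46 460; do
    mpiexec -np ${np} python run_parallel_rectangular_cross.py \
        > rectangular_test_default_param_${np}_MPI.txt
done
```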