!!!ALL of the implementation relies on src/QuIDS, found at jolatechno/QuIDS!!!
Scaling results are in the jolatechno/Quantum-graph-simulation-plots sub-repository.
In any directory with compilable code, you can compile the code using the make command (the make targets are the main file names without any extension).
To compile for MPI (which is required for almost all files), CXX=mpic++ should be added to the make command so that the MPI library gets linked.
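For example, for a hypothetical source file some_test.cpp (the name is purely illustrative), the two build variants would be:
make some_test                  # OpenMP-only build
make CXX=mpic++ some_test       # MPI build, compiled and linked through the mpic++ wrapper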
The "ping pong" test is the test later used for validation, it consists in applying N iterations, then N reversed iteration to hopefully obtain back the starting state.
The src/test/ping_pong_test.ccp file (later used for injectivity test) simply print the intial state, and after running the simulation (including the inverse transformation) will print the final state. If you don't wat to apply the reverse transformation you can simply pass reversed_n_iters=0.
Compilation is done using make ping_pong_test; you can then run it as shown in the examples below.
The src/test/mpi_ping_pong_test.cpp file has the same function as src/test/ping_pong_test.cpp, but supports MPI. Since it gathers the final state on the main rank before printing it, the program will crash if there are too many objects by that point to fit in the memory of a single node.
Compilation is done using make CXX=mpic++ mpi_ping_pong_test.
The compilable file used to obtain scaling results is src/omp_mpi_scaling_test/scaling_test.cpp. After compiling it using make CXX=mpic++, actual scaling results are obtained using the src/omp_mpi_scaling_test/scaling_test.sh script, which gets passed the following arguments:
- -h: simple help info.
- -f: compiled file used, default is scaling_test.out.
- -a: argument passed to scaling_test.out, in the form described later in this document (n_iter|graph_size|rules). Note that reversed_n_iter is used to set the iteration at which we start measuring performance (to let the program generate enough graphs first). Default is 0.
- -t: list of the numbers of threads to test (ex: 1,2,6; the default is the number of available threads).
- -n: list of the numbers of MPI ranks to spawn per node (ex: 6,3,1; default is 1).
- -m: additional arguments for mpirun.
- -N: total number of nodes used for MPI, default is 1.
Note that the output (after separating stderr from stdout) will be formatted as JSON.
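As a sketch (assuming the JSON summary goes to stdout and the progress messages to stderr, as implied above; the flag values are only an illustration), the two streams can be separated like this:
./scaling_test.sh -t 1,2 -a "4,reversed_n_iter=0|12|step;erase_create" > results.json 2> run.log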
To obtain scaling results for different numbers of nodes using slurm, the src/omp_mpi_scaling_test/mpi_scaling.sh script is used (which simply calls the src/omp_mpi_scaling_test/scaling_test.sh script and stores the results in src/omp_mpi_scaling_test/tmp). It gets passed the following arguments:
- -h: simple help info.
- -a: argument passed to scaling_test.out, similar to -a for scaling_test.sh. reversed_n_iter is also used to set the iteration at which we start measuring performance. Default is 0.
- -M: comma-separated list of modules to be loaded.
- -u: if true, uses mpirun through scaling_test.sh, otherwise uses slurm.
- -f, -t, -n and -m: same as for scaling_test.sh.
- -N: list of numbers of nodes to ask from sbatch (example: 1,2,4; default is 1).
- -s: additional arguments to pass to sbatch (to ask for specific nodes, for example).
- -o: base name of the output files (default is out_, so the results for n nodes will be out_n.out and out_n.err).
The src/omp_mpi_scaling_test/csv-from-tmp.py script (requiring Python 3) simply takes a base name (the -o argument of src/omp_mpi_scaling_test/mpi_scaling.sh) and returns a csv-formatted compilation of the results obtained using src/omp_mpi_scaling_test/mpi_scaling.sh.
Similarly, src/omp_mpi_scaling_test/mem-csv-from-file.py is used to process the memory-usage evolution of a single output file into csv format. The argument provided should simply be the name of the file ("-o" base name, number of nodes, and the .out extension).
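For example (assuming both scripts take their argument directly on the command line and print the csv to stdout; adapt the invocation if the actual interface differs):
./csv-from-tmp.py out_ > scaling_results.csv
./mem-csv-from-file.py out_4.out > memory_usage.csv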
Injectivity testing corresponds to running the ping pong test on a variety of starting graph states/sizes and rules, to ensure that the state after the ping-pong is the same as the starting state.
Injectivity testing is done using the src/test/injectivity_test.sh script (relying on src/test/ping_pong_test.cpp, which should be compiled using make ping_pong_test). It takes the following arguments (not detailing other less useful debugging flags):
- -h: show help info.
- -v: verbose output.
- -n: number of iterations, default is 4.
- -s: minimum graph size to test, default is 1.
- -S: maximum graph size to test, default is 5.
- -r: minimum random seed to test, default is 0.
- -R: maximum random seed to test, default is 100.
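For instance, to run the test verbosely over 4 iterations for graph sizes 1 to 5 and seeds 0 to 100 (the defaults, spelled out explicitly here):
./injectivity_test.sh -v -n 4 -s 1 -S 5 -r 0 -R 100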
Injectivity testing for multiple graphs is done using the src/test/mpi_injectivity_test.sh script (relying on src/test/mpi_ping_pong_test.cpp, which should be compiled using make CXX=mpic++ mpi_ping_pong_test). It takes the following arguments (not detailing other less useful debugging flags):
- -h: show help info.
- -v: verbose output.
- -n, -s, -S, -r and -R: same as for injectivity_test.sh, but with different default values (detailed in the -h menu).
- -t: number of threads per rank, default is 1.
- -p: number of ranks per node, default is 1.
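For instance, to run it verbosely with 2 threads per rank and 4 ranks per node (values chosen only for illustration):
./mpi_injectivity_test.sh -v -t 2 -p 4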
All compiled files get passed a special argument describing exactly the simulation. It is formatted as:
n_iter,option1=x,option2=y...|graph_size1,graph_option1=x,...;graph_size2,...|rule1,rule1_option1=x...;rule2,...
where n_iter and graph_size are integers describing respectively the number of iterations and the initial graph size of the simulation.
The options are:
- seed: the random seed used to generate random objects. If not given, it is selected at random.
- reversed_n_iters: number of iterations to do with the inverse transformation (only used in certain files, for injectivity testing; default is n_iter when used).
- max_num_object: the maximum number of objects to keep per shared-memory node. 0 represents auto-truncation (keeping the maximum number of graphs within memory limits), -1 represents no truncation (can cause crashes when running out of memory). The default is 0.
- safety_margin: representing quids::safety_margin (see the Readme from jolatechno/QuIDS).
- tolerance: representing quids::tolerance.
- simple_truncation: representing quids::simple_truncation.
- load_balancing_bucket_per_thread: representing quids::load_balancing_bucket_per_thread.
- align: representing quids::align_byte_length (the amount to which objects should be aligned).
- min_equalize_size: representing quids::mpi::min_equalize_size (only interpreted when MPI is used).
- equalize_inbalance: representing quids::mpi::equalize_inbalance (only interpreted when MPI is used).
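As an illustration of the option syntax (the values are arbitrary), a 5-iteration simulation with a fixed seed, auto-truncation and a 0.2 safety margin would be described by:
...command... "5,seed=0,max_num_object=0,safety_margin=0.2|12|step;erase_create"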
The graph_options are used to parametrize the initial state. They are as follows:
- n_graphs: represents the number of graphs with a given set of options. Default is 1.
- real: represents the real part of the magnitude shared by all graphs with a given set of options.
- imag: represents the imaginary part of the magnitude shared by all graphs with a given set of options.
IMPORTANT: Note that the initial state is normalized after generation, so the magnitudes don't have to add up to 1.
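As an illustration (assuming the usual quantum normalization, i.e. the squared moduli of the magnitudes sum to 1), an initial state declared as 12,real=1;14,real=1 (two graphs, each with magnitude 1) is rescaled so that each graph ends up with magnitude 1/sqrt(2) ≈ 0.707.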
Implemented rules (all described in further detail in TO LINK) are:
- step: a simple quids::modifier moving all particles in the same direction as their orientation.
- reversed_step: the inverse transformation of step.
- coin: flips particles going left and right locally.
- erase_create: exchanges an empty node and a full node locally.
- split_merge: exchanges a full node and two nodes locally. Can create and destroy nodes.
For coin, erase_create and split_merge, the rule_options are:
- theta: theta in the unitary matrix, increasing the probability of interaction. Represented as a ratio of Pi, default is 0.25.
- phi: phi in the unitary matrix, a phase between the diagonal elements and the non-diagonal terms. Represented as a ratio of Pi, default is 0.
- xi: xi in the unitary matrix, a phase between the two diagonal elements. Represented as a ratio of Pi, default is 1.
- n_iter: number of applications of the rule before switching to the next rule/iteration. Default is 1.
So to simulate 2 iterations of step followed by erase_create, with a single starting graph with 12 nodes and a safety margin of 0.5, the following argument is passed:
...command... "2,safety_margin=0.5|12|step;erase_create"
To simulate 2 iterations of step followed by split_merge, with one starting graph with 12 nodes and 2 starting graphs with 14 nodes and a pure imaginary magnitude, the following argument is passed:
...command... "2|12;14,n_graphs=2,imag=1,real=0|step;split_merge"
To simulate 2 iterations, starting with a single graph of size 12, and applying step two times followed by coin with theta=0.125, the following argument is passed:
...command... "2|12|step,n_iter=2;coin,theta=0.125"
To run a ping pong test for 4 iterations of step followed by split_merge, starting with a single graph of size 12, use the following command:
./ping_pong_test.out "4,reversed_n_iters=0|12|step;split_merge"
To run the ping pong test with 8 processes (see the mpirun(1) man page for more info on mpirun), for 4 iterations of step followed by split_merge, starting with a single graph of size 12, use the following command:
mpirun -n 8 mpi_ping_pong_test.out "4,reversed_n_iters=0|12|step;split_merge"
To test the scaling on 5 nodes for 1 rank times 6 threads, 3 ranks times 2 threads, and 6 ranks times 1 thread, for 4 iterations of step followed by split_merge starting with a single graph of 12 nodes, the command will be:
./scaling_test.sh -N 5 -t 6,2,1 -n 1,3,6 -a "4,reversed_n_iters=0|12|step;split_merge"
To test the scaling on 1, 2, 4 and 6 nodes with the same rank/thread combinations and the same simulation, the command will be:
./mpi_scaling.sh -N 1,2,4,6 -t 6,2,1 -n 1,3,6 -a "4,reversed_n_iters=0|12|step;split_merge"
Actual QCGDs simulations are supported (only using MPI) by the src/simulation/quantum_iterations.cpp file, which can be compiled using make CXX=mpic++. It takes the same arguments as src/test/ping_pong_test.cpp or src/test/mpi_ping_pong_test.cpp, and prints out a json-formatted list of the average values after each application of a rule. For example (for 8 processes):
mpirun -n 8 quantum_iterations.out "4|12|step;split_merge"
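A minimal way to inspect that output (assuming the JSON is written to stdout) is to redirect it to a file and pretty-print it:
mpirun -n 8 quantum_iterations.out "4|12|step;split_merge" > iterations.json
# pretty-print the json output for a quick sanity check
python3 -m json.tool iterations.json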
We mainly used two clusters. The following commands were used to run the stability tests, injectivity tests, scaling tests and memory-usage test on them:
# ---------------------------
# ---------------------------
# simple stability and validation tests from here
# not included in the paper
# ---------------------------
# ---------------------------
# ---------------------------
# other demanding stability tests
# ---------------------------
# single node multi-rule stability test (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -u -n 64 -t 1 \
-f zonda_scaling_test.out \
-M compiler/gcc/11.2.0,mpi/openmpi/4.0.1 \
-s " -J test_birule -C zonda --exclusive --time=0-2:00" \
-a "5,seed=0|15|step;erase_create;step;split_merge" -o test_birule_
# multi-node stability test (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -u -N 2,4,8,16 \
-n 36 -t 1 \
-f bora_scaling_test.out \
-M compiler/gcc/11.2.0,mpi/openmpi/4.0.1 \
-m "--mca mtl psm2" \
-s "-C bora --exclusive -J ec_long --time=0-2:00" \
-a "10,seed=0|30|step;erase_create" -o test_very_long_ec_
./mpi_scaling.sh -u -N 2,4,8,16 \
-n 36 -t 1 \
-f bora_scaling_test.out \
-M compiler/gcc/11.2.0,mpi/openmpi/4.0.1 \
-m "--mca mtl psm2" \
-s "-C bora --exclusive -J sm_long --time=0-2:00" \
-a "20,seed=0|30|step;split_merge" -o test_long_sm_
# multi-node multi-rule stability test (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -u \
-N 2,4,8,16 \
-n 36 -t 1 \
-f bora_scaling_test.out \
-M compiler/gcc/11.2.0,mpi/openmpi/4.0.1 \
-m "--mca mtl psm2" \
-s "-C bora --exclusive -J test_bi_rule --time=0-2:00" \
-a "10,seed=0|20|step;erase_create;step;split_merge" -o test_birule_
# ---------------------------
# injectivity tests
# ---------------------------
cd src/test
# compilation (in src/test)
module purge
module load compiler/gcc/11.2.0
module load mpi/openmpi/4.0.1
make CXX=mpic++
# simple single node openmp injectivity test (still in src/test)
./injectivity_test.sh -v
# simple single node mpi injectivity test (still in src/test) with NUM_THREAD threads per task and NUM_TASK tasks (with NUM_THREAD*NUM_TASK <= #CPU Cores)
./mpi_injectivity_test.sh -v -t NUM_THREAD -p NUM_TASK
# multi-node mpi injectivity test
MODULES=compiler/gcc/11.2.0,mpi/openmpi/4.0.1 n_per_node=36 n_threads=1 args="-v -s 6 -S 14 -n 8" sbatch -N 10 -C bora slurm.sh
# ---------------------------
# ---------------------------
# commands often used when running experiments
# ---------------------------
# ---------------------------
# to clear slurm queue
squeue -u $USER | awk '{print $1}' | tail -n+2 | xargs scancel
squeue -u $USER | grep "14 (" | awk '{print $1}' | xargs scancel
squeue -u $USER | grep "QOSMaxCpuPerUserLimit" | awk '{print $1}' | xargs scancel# ---------------------------
# ---------------------------
# command to replicate results on ruche
# '-> cluster info page: https://mesocentre.pages.centralesupelec.fr/user_doc/ruche/01_cluster_overview/
# ---------------------------
# ---------------------------
# ---------------------------
# compile
# ---------------------------
module purge
module load gcc/11.2.0/gcc-4.8.5
#module load mpich/3.3.2/gcc-9.2.0
module load intel-mpi/2019.9.304/intel-20.0.4.304
#module load openmpi/4.1.1/gcc-11.2.0
make CFLAGS="-march=cascadelake -DSAFETY_MARGIN=0.15 -DEQUALIZE_FACTOR=0.25" CXX=mpic++ #CXX=mpicxx
# ---------------------------
# scaling tests
# ---------------------------
# multi-node strong scaling (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -N 1,2,4,6,8,10,14,16,20,24 \
-n 40 -t 1 -G 900000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J s_ec_s --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_short_
./mpi_scaling.sh -N 26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 900000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J s_ec_s --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_short_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 900000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J s_ec_s --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_short_
./mpi_scaling.sh -N 20 \
-n 1,2,5,10,20,40 -t 1,1,1,1,1,1 -G 18000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J s_ec_l --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_long_
./mpi_scaling.sh -N 24,26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 18000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J s_ec_l --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_long_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 18000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J s_ec_l --time=0-01:00" \
-a "15,reversed_n_iter=10,seed=0|16|step;erase_create" -o strong_ec_long_
# multi-node strong scaling (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -N 1,2,4,6,8,10,14,16,20,24 \
-n 40 -t 1 -G 20000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J s_sm_s --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_short_
./mpi_scaling.sh -N 26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 20000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J s_sm_s --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_short_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 20000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J s_sm_s --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_short_
./mpi_scaling.sh -N 20 \
-n 1,2,5,10,20,40 -t 1,1,1,1,1,1 -G 400000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J s_sm_l --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_long_
./mpi_scaling.sh -N 24,26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 400000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J s_sm_l --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_long_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 400000000 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J s_sm_l --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o strong_sm_long_
# multi-node weak scaling (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -N 1,2,4,6,8,10,14,16,20,24 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J w_ec --time=0-01:00" \
-a "12,reversed_n_iter=7,seed=0|18|step;erase_create" -o weak_ec_
./mpi_scaling.sh -N 26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J w_ec --time=0-01:00" \
-a "12,reversed_n_iter=7,seed=0|17|step;erase_create" -o weak_ec_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J w_ec --time=0-01:00" \
-a "12,reversed_n_iter=7,seed=0|17|step;erase_create" -o weak_ec_
# multi-node weak scaling (still inside src/omp_mpi_scaling_test)
./mpi_scaling.sh -N 1,2,4,6,8,10,14,16,20,24 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J w_sm --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o weak_sm_
./mpi_scaling.sh -N 26,30,34,36,40,44,46,50 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_prod,cpu_scale --exclusive -J w_sm --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o weak_sm_
./mpi_scaling.sh -N 55,60,65,60,65,70,75,80,85,90,95,100 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-s "-p cpu_scale --exclusive -J w_sm --time=0-01:00" \
-a "16,reversed_n_iter=10,seed=0|15|step;split_merge" -o weak_sm_
# get results from multi-node (still inside src/omp_mpi_scaling_test)
./combine_output_scaling_test.py strong_ec_short_
./combine_output_scaling_test.py strong_ec_long_
./combine_output_scaling_test.py strong_sm_short_
./combine_output_scaling_test.py strong_sm_long_
./combine_output_scaling_test.py weak_ec_
./combine_output_scaling_test.py weak_sm_
# commands to plot (inside of Quantum-graph-simulation-plots/python_post_processing):
./plot_accuracy_weak_scaling.py ../data/weak_sm_combined.json.out scaling/weak_scaling/accuracy_sm.png
./plot_accuracy_weak_scaling.py ../data/weak_ec_combined.json.out scaling/weak_scaling/accuracy_ec.png
./plot_proportions.py ../data/weak_sm_combined.json.out output=scaling/weak_scaling/proportions_sm.png
./plot_proportions.py ../data/weak_ec_combined.json.out output=scaling/weak_scaling/proportions_ec.png
./plot_weak_scaling.py ../data/weak_sm_combined.json.out scaling/weak_scaling/weak_scaling_sm.png
./plot_weak_scaling.py ../data/weak_ec_combined.json.out scaling/weak_scaling/weak_scaling_ec.png
#./plot_proportions.py ../data/strong_sm_short_combined.json.out ../data/strong_sm_long_combined.json.out output=scaling/strong_scaling/proportions_combined_sm.png
#./plot_proportions.py ../data/strong_ec_short_combined.json.out ../data/strong_ec_long_combined.json.out output=scaling/strong_scaling/proportions_combined_ec.png
#./plot_strong_scaling.py ../data/strong_sm_short_combined.json.out ../data/strong_sm_long_combined.json.out output=scaling/strong_scaling/strong_scaling_combined_sm.png
#./plot_strong_scaling.py ../data/strong_ec_short_combined.json.out ../data/strong_ec_long_combined.json.out output=scaling/strong_scaling/strong_scaling_combined_ec.png
./plot_proportions.py ../data/strong_sm_short_combined.json.out output=scaling/strong_scaling/proportions_short_sm.png
./plot_proportions.py ../data/strong_sm_long_combined.json.out output=scaling/strong_scaling/proportions_long_sm.png
./plot_proportions.py ../data/strong_ec_short_combined.json.out output=scaling/strong_scaling/proportions_short_ec.png
./plot_proportions.py ../data/strong_ec_long_combined.json.out output=scaling/strong_scaling/proportions_long_ec.png
./plot_strong_scaling.py ../data/strong_sm_short_combined.json.out output=scaling/strong_scaling/strong_scaling_short_sm.png
./plot_strong_scaling.py ../data/strong_sm_long_combined.json.out output=scaling/strong_scaling/strong_scaling_long_sm.png
./plot_strong_scaling.py ../data/strong_ec_short_combined.json.out output=scaling/strong_scaling/strong_scaling_short_ec.png
./plot_strong_scaling.py ../data/strong_ec_long_combined.json.out output=scaling/strong_scaling/strong_scaling_long_ec.png
# ---------------------------
# memory usage test
# used for Fig 7 in the paper
# ---------------------------
./mpi_scaling.sh -N 10 \
-n 40 -t 1 -G 0 -f scaling_test.out \
-M gcc/11.2.0/gcc-4.8.5,intel-mpi/2019.9.304/intel-20.0.4.304 \
-m "--mpi=pmi2" -s "-p cpu_short,cpu_med,cpu_prod,cpu_scale --exclusive -J mem --time=0-01:00" \
-a "17,reversed_n_iter=0,seed=0|17|step;erase_create" -o memory_test_
./combine_output_scaling_test.py memory_test_
./plot_memory_usage.py ../data/memory_test_combined.json.out memory_usage/memory_usage.png