
Defect: Performance issue on multiple nodes #560

Closed
@gutmann

Description


Defect/Bug Report

Performance issue on multiple nodes.
I thought I'd add a comment here on a performance issue I'm seeing at the moment using coarrays in derived types. I haven't chased down anything more specific, but using the coarray-icar test-ideal, and a small (200 x 200 x 20) problem size, I'm seeing huge slow downs across multiple nodes. This was not present in opencoarrays 1.9.1 (with gcc 6.3) and it is not present with intel.

I initially thought this could be related to #556 but now think this is completely separate since it is internode communication and thus will require MPI calls.

Coarray_icar uses non-blocking sends on allocatable coarrays embedded within derived types. It first processes halo grid cells, then sends them, then processes internal grid cells, then syncs with its neighbors before reading the halos that were sent to it.
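For readers unfamiliar with the pattern, a minimal sketch of that compute/communicate overlap looks roughly like this (names are illustrative, not coarray_icar's actual API; the real code exchanges halos in several directions):

```fortran
module exchange_m
  implicit none
  type :: field_t
     ! allocatable coarray component inside a derived type,
     ! as described in the report above
     real, allocatable :: halo_south(:,:)[:]
     real, allocatable :: data(:,:,:)
  end type
contains
  subroutine step(f)
    type(field_t), intent(inout) :: f
    integer :: me
    me = this_image()
    ! 1. compute halo cells first (computation omitted)
    ! 2. push them to the neighbor; the runtime may issue a
    !    non-blocking transfer (MPI calls for internode neighbors)
    if (me > 1) f%halo_south(:,:)[me-1] = f%data(:,:,1)
    ! 3. compute interior cells while the transfer is (ideally) in flight
    ! 4. sync with the neighbor before reading halos sent to this image
    if (me > 1) sync images (me-1)
  end subroutine
end module
```

If the runtime serializes these puts instead of overlapping them, the internode slowdown reported here would be the expected symptom.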

  • OpenCoarrays Version: 2.1.0-31-gc0e3ffb
  • Fortran Compiler: gfortran 8.1
  • C compiler used for building lib: gcc 8.1
  • Installation method: install.sh
  • Output of uname -a: Linux cheyenne1 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
  • MPI library being used: MPICH 3.2.1
  • Machine architecture and number of physical cores: SGI cluster (Cheyenne), Xeon nodes with InfiniBand, 36 cores/node
  • Version of CMake: 3.9.1

Observed Behavior

Performance gets worse when multiple nodes are used

Expected Behavior

Performance gets better when multiple nodes are used

Steps to Reproduce

git clone https://github.com/gutmann/coarray_icar
cd coarray_icar/src/tests
make MODE=fast

# edit input-parameters.txt
cat > input-parameters.txt <<EOF
&grid nx=200,ny=200,nz=20 /
EOF

# get access to multiple nodes
for (( i=36; i<=144; i+=36 )); do
    cafrun -n $i ./test-ideal | grep "Model run time"
done

Example results for OpenCoarrays 2.1, 1.9.1, and Intel (more details on each below):

Images | OpenCoarrays 2.1 | OpenCoarrays 1.9.1 | Intel 18.0.1
------ | ---------------- | ------------------ | ------------
36     | 14.7             | 16.3               | 11.3
72     | 105              | 8.9                | 5.8
144    | 140              | 4.6                | 3.2
720    | 170              | 1.4                | 0.94

All times in seconds. This is the core runtime of the numerics only, excluding initialization time.
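To make the regression concrete, the table above can be converted into speedups relative to the 36-image run (a value below 1 means adding nodes made the run slower); Python is used here only for the arithmetic:

```python
# Runtimes in seconds, copied from the table above.
times = {
    "OpenCoarrays 2.1":   {36: 14.7, 72: 105.0, 144: 140.0, 720: 170.0},
    "OpenCoarrays 1.9.1": {36: 16.3, 72: 8.9,   144: 4.6,   720: 1.4},
    "Intel 18.0.1":       {36: 11.3, 72: 5.8,   144: 3.2,   720: 0.94},
}

def speedup(runtimes, base=36):
    """Speedup relative to the base image count (>1 means scaling helps)."""
    return {n: runtimes[base] / t for n, t in runtimes.items()}

for name, rt in times.items():
    print(name, {n: round(s, 2) for n, s in speedup(rt).items()})
```

OpenCoarrays 1.9.1 and Intel both speed up roughly 11-12x at 720 images, while OpenCoarrays 2.1 runs about 12x slower than its own 36-image baseline.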

OpenCoarrays 2.1

gfortran/gcc 8.1 (built via opencoarrays install.sh)
mpich 3.2.1 (built via opencoarrays install.sh)
opencoarrays 2.1.0-31-gc0e3ffb (with fprintf statements commented out in mpi/mpi.c)

OpenCoarrays 1.9.1

gfortran/gcc 6.3
MPT 2.15f
opencoarrays 1.9.1

Intel 18.0.1

ifort 18.0.1
iMPI 2018.1.163
