Rearrange allocation of submodels across nodes #212

Open
aekiss opened this issue Jul 9, 2020 · 3 comments

aekiss commented Jul 9, 2020

Back in the May TWG meeting we noted that YATM is allocated to the first PE on the first node, followed by the root PE of MOM on the same node. We discussed whether another placement of accessom2 components on nodes would be preferable for IO etc.

Should the order be CICE, MOM, YATM so the 3 root PEs are on separate nodes?

Is this simply a matter of changing the order of the submodels in config.yaml?
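To make the placement question concrete, here is a minimal sketch of where each root PE would land for a given submodel order. It assumes global ranks are handed out contiguously in the listed order and that nodes are filled one after another, which may not match what the launcher actually does. The CPU counts, the 48-core node size, and the helper name `root_placement` are placeholders for illustration, not values taken from any real config.

```python
from typing import Dict, List, Tuple


def root_placement(
    submodels: List[Tuple[str, int]], cores_per_node: int
) -> Dict[str, Tuple[int, int]]:
    """Map each submodel to (global rank of its root PE, node index).

    Assumption: ranks are assigned contiguously in the listed order and
    nodes are filled in sequence. A real launcher may place ranks differently.
    """
    placement = {}
    next_rank = 0
    for name, ncpus in submodels:
        # The first rank given to a submodel is treated as its root PE.
        placement[name] = (next_rank, next_rank // cores_per_node)
        next_rank += ncpus
    return placement


# Hypothetical CPU counts and node size, for illustration only.
current = [("yatm", 1), ("mom", 216), ("cice", 24)]
proposed = [("cice", 24), ("mom", 216), ("yatm", 1)]

for label, order in (("current", current), ("proposed", proposed)):
    print(label, root_placement(order, cores_per_node=48))
```

With these placeholder numbers the YATM and MOM roots share node 0 in the current order, as described above. Whether a reordering actually puts the three roots on separate nodes depends on the real CPU counts and node size: here the 24 CICE ranks would not fill a 48-core node, so the MOM root would still start on node 0.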

@russfiedler

I think that was originally the order (a long time ago), but it got changed for some reason. I would guess the change was OASIS-related, but maybe YATM relies on being the root PE of MPI_COMM_WORLD. I'd hope not, but it could be tested fairly quickly with the 1 degree config to see whether there are any problems.


aekiss commented Jul 9, 2020

Unfortunately your hunch was right @russfiedler.

With the submodels rearranged as MOM, CICE, YATM, the run exited with

assertion failed: matmxx does not have global PE == 0
1
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
assertion failed: accessom2_sync_config: Unsupported calendar type
1

The last 2 lines were repeated 240 times, which is the total number of MOM + CICE CPUs. See
/home/156/aek156/payu/testing/all-configs/v2.0.0rc6/1deg_jra55_iaf_v2.0.0rc6_iss212

The first assertion that fails is in accessom2.F90. @nichannah - I guess this is needed for a good reason?
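As a rough illustration of why the first assertion would trip, the sketch below mimics the condition named in the error message: the atmosphere model's first PE must be global PE 0. It is a stand-in written for this thread, not the code in accessom2.F90, and it assumes ranks are assigned contiguously in the order the submodels are listed. The 240 total for MOM + CICE comes from the output above; the 216/24 split, the single YATM PE, and both function names are assumptions.

```python
def first_global_rank(order, ncpus, name):
    """Global rank of the first PE of `name`, assuming contiguous ranks in listed order."""
    rank = 0
    for model in order:
        if model == name:
            return rank
        rank += ncpus[model]
    raise KeyError(name)


def atm_is_global_root(order, ncpus, atm="yatm"):
    """Stand-in for the failed check: is the atmosphere's first PE global PE 0?"""
    return first_global_rank(order, ncpus, atm) == 0


# MOM + CICE total 240 as reported in the thread; the 216/24 split and
# the single YATM PE are assumptions made for this example.
ncpus = {"mom": 216, "cice": 24, "yatm": 1}

print(atm_is_global_root(["yatm", "mom", "cice"], ncpus))  # original order
print(atm_is_global_root(["mom", "cice", "yatm"], ncpus))  # order tested here
```

Under these assumptions YATM's first PE moves from global rank 0 to global rank 240 when it is listed last, so a check of this kind passes for the original order and fails for the rearranged one.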


aekiss commented Jul 15, 2020
