Measure performance of LILAC's redistribution at various processor counts #895

Open
billsacks opened this issue Jan 23, 2020 · 1 comment
Labels: blocked: dependency (wait to work on this until the dependency is resolved), investigation (needs to be verified; more investigation into what's going on)

Comments

billsacks (Member):
We should measure the performance of LILAC's data redistribution (between the atmosphere and land decompositions) at various processor counts. We should determine both the time the redistribution itself takes and, if possible, the slowdown caused by introducing this global synchronization point.

(@gutmann raised this point yesterday; it is something we've talked about doing ever since the initial LILAC proposal, but I wanted to open an issue to make sure it gets done.)

It will probably be easier to do this once #894 is resolved, so this is blocked by #894.
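
As a rough illustration of the kind of measurement proposed above (a minimal sketch only, not LILAC or CTSM code), one could bracket the redistribution call with timers and a barrier, so the pure synchronization wait can be separated from the redistribution itself. Here `redistribute_atm_to_lnd` and `atm_fields` are hypothetical placeholders:

```python
# Minimal sketch (not LILAC/CTSM code) of timing a redistribution step with
# mpi4py.  `redistribute_atm_to_lnd` is a hypothetical stand-in for whatever
# routine moves fields from the atmosphere decomposition to the land one.
from mpi4py import MPI

comm = MPI.COMM_WORLD


def timed_redistribution(redistribute_atm_to_lnd, atm_fields):
    t0 = MPI.Wtime()
    comm.Barrier()          # the wait here approximates the cost of the
    t1 = MPI.Wtime()        # global synchronization point redistribution adds
    lnd_fields = redistribute_atm_to_lnd(atm_fields)
    t2 = MPI.Wtime()

    # Report the slowest rank, since that is what delays the whole model
    sync_wait = comm.reduce(t1 - t0, op=MPI.MAX, root=0)
    redist_time = comm.reduce(t2 - t1, op=MPI.MAX, root=0)
    if comm.rank == 0:
        print(f"sync wait (max over ranks): {sync_wait:.4f} s, "
              f"redistribution (max over ranks): {redist_time:.4f} s")
    return lnd_fields
```

Repeating such a measurement at several processor counts, and taking the maximum over ranks, would show how both the synchronization wait and the redistribution itself scale.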

billsacks (Member, Author):

I took a very unscientific look at this while investigating a different performance issue. For a 2-day CONUS run on 72 processors (2 nodes) on cheyenne, I found:

  • If I bypassed LILAC and CTSM entirely (so didn't run any land code in WRF), runtime was about 3 min 7 sec per simulated day
  • If I did all of the LILAC stuff (data redistribution between decompositions, etc.) but returned from CTSM immediately, runtime was about 3 min 10 sec per simulated day

This difference (1.6%) is probably within machine variability (I only did a single run of each case), but at least indicates that, at this relatively low processor count, the time taken by LILAC's redistribution is relatively small, and is roughly in line with my gut-level expectations based on coupler timings in CESM.
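
For reference, the 1.6% figure above is simply the relative difference between the two per-day timings:

```python
# Arithmetic behind the ~1.6% figure quoted above
baseline = 3 * 60 + 7      # no LILAC/CTSM at all: 3 min 7 s per simulated day
with_lilac = 3 * 60 + 10   # LILAC redistribution only: 3 min 10 s per simulated day
print(f"{(with_lilac - baseline) / baseline:.1%}")  # -> 1.6%
```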
