Use ClimaCore 0.11 and MPITrampoline #2377

Merged
5 commits merged into main from gb/up_deps on Nov 23, 2023

Sbozzolo
Member

No description provided.

@Sbozzolo
Member Author

Closes #2138

Sbozzolo force-pushed the gb/up_deps branch 3 times, most recently from f2f8e3a to c331a45, on November 22, 2023 18:46
@Sbozzolo
Member Author

I am trying to get MPI to work.

I managed to get it working here: https://buildkite.com/clima/climaatmos-ci/builds/15017

Then I pushed the same commit again, and it is failing:
https://buildkite.com/clima/climaatmos-ci/builds/15020

The main difference is that in the first case the depot was cold, so everything had to be compiled.

Sbozzolo force-pushed the gb/up_deps branch 12 times, most recently from c153f2a to c8994c0, on November 22, 2023 22:06
@Sbozzolo
Member Author

Sbozzolo commented Nov 22, 2023

In this build:

https://buildkite.com/clima/climaatmos-ci/builds/15043#018bf91f-8c1f-4f82-8a8e-a5b4cd410de8

CA.solve_atmos!(CA.get_integrator(CA.AtmosConfig())) runs on 2 processes, but the MPI example does not.

      - label: "ClimaAtmos"
        command: "julia --color=yes --project=examples -e 'import ClimaAtmos; ClimaAtmos.solve_atmos!(ClimaAtmos.get_integrator(ClimaAtmos.AtmosConfig()))'"
        agents:
          slurm_ntasks: 2
          slurm_mem: 16G
        env:
          CLIMACOMMS_CONTEXT: "MPI"

@szy21
Member

szy21 commented Nov 22, 2023

Maybe change the CLIMACORE_DISTRIBUTED: "MPI" in the MPI example to CLIMACOMMS_CONTEXT: "MPI"?
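
One way to check whether any pipeline step still sets the variable that this comment suggests replacing is a plain text search. This is only a sketch: the helper name `find_old_context_var` is hypothetical, and the pipeline file path is supplied by the caller rather than assumed.

```shell
# Hypothetical helper: print, with line numbers, every line of the given
# pipeline file that still sets CLIMACORE_DISTRIBUTED, so each one can be
# reviewed and changed to CLIMACOMMS_CONTEXT.
# Returns non-zero when no such line is found.
find_old_context_var() {
    grep -n 'CLIMACORE_DISTRIBUTED' "$1"
}
```

Called as `find_old_context_var path/to/pipeline.yml`, it lists the steps that would need the rename; an empty result means nothing in that file sets the variable.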

@Sbozzolo
Member Author

> Maybe change the CLIMACORE_DISTRIBUTED: "MPI" in the MPI example to CLIMACOMMS_CONTEXT: "MPI"?

Failing https://buildkite.com/clima/climaatmos-ci/builds/15045

@simonbyrne
Member

For some reason it is using the wrong preferences:
https://buildkite.com/clima/climaatmos-ci/builds/15080#018bfd3b-6aed-4bf1-8208-d1fecdc8dc1b/160-167

@Sbozzolo
Member Author

> For some reason it is using the wrong preferences: https://buildkite.com/clima/climaatmos-ci/builds/15080#018bfd3b-6aed-4bf1-8208-d1fecdc8dc1b/160-167

Thanks for the lead, let me investigate this.

@simonbyrne
Member

Ah, I think it's because it's picking up the preferences from the shared depot (which was set previously).

├ cat /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/environments/v1.9/LocalPreferences.toml
[MPIPreferences]
__clear__ = ["preloads_env_switch"]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
cclibs = []
libmpi = "libmpi"
mpiexec = "mpiexec"
preloads = []

Either delete that file, or just disable the shared depot for now.
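
A minimal sketch of how one might locate leftover preferences like these, assuming the depot layout shown in the path above (`<depot>/environments/<version>/LocalPreferences.toml`). The function name `find_mpi_prefs` is hypothetical. It only reads files and prints those that contain an `[MPIPreferences]` table; it does not delete or modify anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list LocalPreferences.toml files under the given depot
# directories that contain an [MPIPreferences] table.
# Assumes the layout <depot>/environments/<version>/LocalPreferences.toml.
find_mpi_prefs() {
    local depot file
    for depot in "$@"; do
        for file in "$depot"/environments/*/LocalPreferences.toml; do
            # Skip unmatched glob patterns and anything that is not a regular file.
            [ -f "$file" ] || continue
            if grep -q '^\[MPIPreferences\]' "$file"; then
                echo "$file"
            fi
        done
    done
}
```

Called with one or more depot directories, for example `find_mpi_prefs /path/to/depot`, it prints each matching file so it can be inspected before deciding whether to remove it.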

@Sbozzolo
Member Author

> Ah, I think it's because it's picking up the preferences from the shared depot (which was set previously).
>
> ├ cat /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/environments/v1.9/LocalPreferences.toml
> [MPIPreferences]
> __clear__ = ["preloads_env_switch"]
> _format = "1.0"
> abi = "OpenMPI"
> binary = "system"
> cclibs = []
> libmpi = "libmpi"
> mpiexec = "mpiexec"
> preloads = []
>
> either delete that file, or just disable the shared depot for now.

Yes, I just arrived at the same conclusion, which also explains why it was working when the depot was empty.

I'll clean up this PR, clean the depot, and try again.

Sbozzolo force-pushed the gb/up_deps branch 3 times, most recently from 5b0148a to d086a04, on November 23, 2023 18:42
Sbozzolo added this pull request to the merge queue Nov 23, 2023
Merged via the queue into main with commit 4a55565 Nov 23, 2023
10 checks passed
Sbozzolo deleted the gb/up_deps branch November 23, 2023 21:50
Sbozzolo mentioned this pull request Nov 24, 2023
3 participants