
issue with openmpi stommel example #1281

Closed
selipot opened this issue Jan 16, 2023 · 5 comments

@selipot

selipot commented Jan 16, 2023

I am getting the following error:
PermissionError: [Errno 13] Permission denied: b'/scratch/msldrift/stommelU.nc'
where /scratch/msldrift/ is the scratch space from which I launched the job on an LSFCluster.

This issue occurs only when I use more than one processor with the mpirun command, as in mpirun -np 2 .... The error does not occur with mpirun -np 1 ..., but that is of course not very useful. Could it be that the additional processes do not inherit read/write permissions, or something like that?

To debug further, I am also trying to figure out where in the example_stommel.py code the stommel[U,P,...].nc files are written.

@erikvansebille
Member

Hmm, it may be that the two processes want to write to the same file at the same time. Note that the Stommel example first creates flow-fields in netcdf (see here).
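If every MPI rank runs the field-writing step, several processes can try to create the same netCDF file at once. A common alternative to skipping the write entirely (this is not the fix adopted in this thread) is to let only rank 0 write the file and make the other ranks wait at a barrier. The sketch below assumes mpi4py is available; `write_once` and `rank_and_size` are hypothetical helper names, and `write_func` stands in for whatever call actually writes the fields. Without mpi4py it falls back to a single serial process.

```python
import os
import tempfile

try:
    from mpi4py import MPI  # assumption: mpi4py is installed alongside the MPI library
    COMM = MPI.COMM_WORLD
except (ImportError, RuntimeError):
    COMM = None  # no usable MPI: behave as a single serial process


def rank_and_size():
    """Return (rank, size) of this process; (0, 1) when running without MPI."""
    if COMM is None:
        return 0, 1
    return COMM.Get_rank(), COMM.Get_size()


def write_once(path, write_func):
    """Let only rank 0 create `path`, then make every rank wait until it is done."""
    rank, _ = rank_and_size()
    if rank == 0 and not os.path.exists(path):
        write_func(path)  # hypothetical writer, e.g. a wrapper around the field output
    if COMM is not None:
        COMM.Barrier()  # no rank proceeds to read the file before rank 0 has written it
    return path


if __name__ == "__main__":
    def demo_writer(path):
        # Placeholder content instead of a real netCDF flow field
        with open(path, "w") as f:
            f.write("U field placeholder\n")

    out = os.path.join(tempfile.mkdtemp(), "stommelU_demo.txt")
    write_once(out, demo_writer)
    print(rank_and_size(), os.path.exists(out))
```

Launched with mpirun -np 2, only rank 0 calls `demo_writer`; the barrier keeps rank 1 from racing ahead and opening a half-written file.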

Perhaps you could set write_fields to False on line 84 in the example_stommel.py script? Does that help?

I don't think write_fields is an argument you can pass when calling the function; maybe we should implement that?
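A minimal sketch of what such an argument could look like, assuming a hypothetical `stommel_example(write_fields=True, outdir=".")` entry point and a toy `build_stommel_fields` helper that returns small nested lists instead of real flow fields. Plain text files stand in for the netCDF output; this is an illustration of the idea, not the code from the commit referenced below.

```python
import os
import tempfile


def build_stommel_fields(nx=4, ny=3):
    """Hypothetical stand-in for the flow-field setup: returns small 2D lists."""
    u = [[0.1 * i for i in range(nx)] for _ in range(ny)]
    v = [[-0.1 * j for _ in range(nx)] for j in range(ny)]
    return {"U": u, "V": v}


def stommel_example(write_fields=True, outdir="."):
    """Build the fields; only touch the disk when write_fields is True."""
    fields = build_stommel_fields()
    written = []
    if write_fields:
        for name, data in fields.items():
            # One file per field, mimicking the stommelU/stommelV naming
            path = os.path.join(outdir, f"stommel{name}.txt")
            with open(path, "w") as f:
                for row in data:
                    f.write(" ".join(f"{x:.2f}" for x in row) + "\n")
            written.append(path)
    return fields, written


if __name__ == "__main__":
    fields, written = stommel_example(write_fields=False)
    print(sorted(fields), written)
```

Calling `stommel_example(write_fields=False)` keeps the fields in memory, so no MPI rank writes to the shared scratch directory at all.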

erikvansebille added a commit that referenced this issue Jan 25, 2023
Adding an argument to example_stommel so that fields do not need to be written in test_MPI. This solves #1281
@erikvansebille
Member

Hi @selipot, can you confirm that the change in #1291 fixes your issue?

@selipot
Author

selipot commented Jan 26, 2023

That change seems to have fixed that one issue with the Stommel example. With write_fields set to False, I can run the Stommel example in parallel on a cluster, for example with 5 nodes. I could not do such a thing with my previous install, even with write_fields set to False. So thank you for that!

Unfortunately for me, this change does not help with my own code. I get errors that seem to indicate that the n MPI processes cannot find files that have been written, such as FileNotFoundError: [Errno 2] No such file or directory: '8.45'. The same code runs fine with n = 1 process. I am not sure where to go from here. Just in case, I am sharing my code here as a gist.

@erikvansebille
Member

Hmm, this is very strange. I don't immediately see anything in your code that would trigger such an error. Could you share the full log of the run? There may be a hint there.

@erikvansebille
Member

Closing this issue for now; please reopen if there are new developments to report.
