plm/rsh: Add chdir option to change directory before orted exec #7092

Merged: 1 commit merged into open-mpi:master from smiller_rsh_chdir on Oct 29, 2019

Conversation

@sam6258 (Contributor) commented Oct 15, 2019

Currently, there is no mechanism to tell the rsh/ssh launcher to change directories prior to exec'ing the orted. This is normally okay, as the orted will manage the working directory for the ranks of the user's application. This can be seen here:

[c712f6n01] (smiller_rsh_chdir *) [/smpi_dev/smiller/ompi]> $MPI_ROOT/bin/mpirun -host c712f6n02:1 pwd
/smpi_dev/smiller/ompi

However, launching orted inside certain container runtimes exposes a new problem: the working directory of the user application can disappear during the container exec phase. Some container technologies, such as Singularity, bind-mount the current working directory into the container, but when the rsh launcher exec's the orted, the current directory has already changed to the user's home directory, so the original working directory never gets mounted.

Using an orte_launch_agent, we can launch orted inside a container and illustrate this problem:

[c712f6n01] (smiller_rsh_chdir *) [/smpi_dev/smiller/ompi]> MY_IMG=/smpi_dev/smpi-container-test/singularity/spectrum_mpi-test_10.03.01.00rc03_mofed_4.7-1_cuda_10.1_centos_7.sif 
[c712f6n01] (smiller_rsh_chdir *) [/smpi_dev/smiller/ompi]> $MPI_ROOT/bin/mpirun -mca orte_launch_agent "singularity exec --bind $MPI_ROOT --bind /smpi_prebuilt $MY_IMG $MPI_ROOT/bin/orted" -host c712f6n02:1 pwd
/u/smiller

Our proposed solution is to introduce an MCA parameter that allows users to change the working directory after the rsh/ssh, but before the exec of orted. This lets the launch agent call singularity exec from the correct working directory, so that directory is mounted for the user application.

[c712f6n01] (smiller_rsh_chdir *) [/smpi_dev/smiller/ompi]> $MPI_ROOT/bin/mpirun -mca plm_rsh_chdir $PWD -mca orte_launch_agent "singularity exec --bind $MPI_ROOT --bind /smpi_prebuilt $MY_IMG $MPI_ROOT/bin/orted" -host c712f6n02:1 pwd
/smpi_dev/smiller/ompi
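
Conceptually, with the new parameter set, the command the rsh launcher runs on the remote node amounts to something like the following. This is an illustrative sketch only; the exact command line the launcher constructs may differ, and <orted args> stands in for the daemon arguments mpirun generates:

# Illustrative only, not the literal command the launcher builds:
ssh c712f6n02 "cd /smpi_dev/smiller/ompi && singularity exec --bind $MPI_ROOT --bind /smpi_prebuilt $MY_IMG $MPI_ROOT/bin/orted <orted args>"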

Signed-off-by: Scott Miller <scott.miller1@ibm.com>
@jjhursey self-requested a review on October 15, 2019 at 21:40
@rhc54 (Contributor) commented Oct 15, 2019

Sadly, we used to take care of this in the schizo/singularity component, which has since been removed ☹️. Personally, I dislike continuing to add burdens on the user. We know how to detect that the user is invoking a container; can we not simply infer that we need this behavior instead of making users remember to add it (since they never will remember anyway)?

@jjhursey (Member) commented:

Unfortunately, not all container runtimes provide a 'tell' (I really wish they did) that the mpirun program is being run from inside a containerized environment. For Singularity there is a tell (right now, anyway), but for HPC container runtime X there might not be.

@rhc54 (Contributor) commented Oct 15, 2019

Okay, so let's use the "tell" when it is available; at least Singularity users have a chance of running correctly! Others will have to endure the pain of constantly forgetting to set the param and having to retry their app.

@sam6258 (Contributor, Author) commented Oct 16, 2019

@jjhursey I think the problem with using the typical container "tells" (typically if [[ -d /.singularity.d/ ]] for Singularity) is that the "tell" comes too late in the flow of events. By the time we know we are in a container, it is too late to mount a new volume for the working directory. Are you talking about the same "tell" I am, or are you referencing something else?
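
For reference, the check I mean is roughly the following; it can only succeed once we are already inside the container, which is after the bind mounts have been set up:

# Singularity-specific check; by the time this test passes, it is too late to
# influence which host directories get bind-mounted:
if [[ -d /.singularity.d/ ]]; then
    echo "running inside a Singularity container"
fi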

We may be able to leverage some hook mechanism for different container runtimes, but I'm not sure how much work that will be to implement.

@rhc54 (Contributor) commented Oct 16, 2019

I don't know what may have changed, but there used to be a simple "flag" in the container we could detect before anything else happens. I believe that is still the current method used, for example, when layering in external libraries.

@gvallee Can you help us out here? We need to know when mpirun is in a Singularity container.

@jjhursey (Member) commented:

I think that is a separate conversation from this PR, but may be worth doing as a separate PR.

This PR adds the ability to chdir between the ssh and the orted. More commonly, a user would want this when using a launch agent command that wraps the orted, so the chdir then happens between the ssh and the launch agent command which wraps the orted. This allows the launch agent command to start from a specific directory. The Singularity example that Scott presented is one case where it is useful to do so.

We could introduce a schizo component for Singularity that is active in mpirun. It could detect whether mpirun is in a Singularity container and then supply the pwd as the argument for this MCA parameter. That would make it easier for those running in a fully contained Singularity environment (one where the mpirun/orted are also in containers). It would require this PR, though, to give it the necessary functionality.
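
In effect, such a component would inject what Scott's example passes by hand; roughly the equivalent of the following (a sketch only, not actual component code; <remote-host> and <app> are placeholders):

# Hypothetical: the component would behave as though the user had added
# the chdir parameter themselves:
mpirun -mca plm_rsh_chdir "$PWD" \
       -mca orte_launch_agent "singularity exec $MY_IMG $MPI_ROOT/bin/orted" \
       -host <remote-host> <app>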

(For my own reference) Looking at the old schizo/singularity component, it doesn't have any discovery that mpirun or orted is running in a container, since it was (IIRC) designed for the application-contained mode (mpirun/orted exist outside of the container). So there is some work to do there to identify the two scenarios.

Should we file an issue for someone to pick up to add Singularity discovery in a schizo component?

@rhc54 (Contributor) commented Oct 16, 2019

I guess it just depends on what problem this PR is attempting to solve. If it is for your multi-node test setup, then it is fine as-is since you know you need to set the param.

However, if it is attempting to resolve a general user problem, then I fear that it will create as much trouble as it solves. Experience shows that people will forget to set the param, and their job will work its way thru the queue and fail to execute. Then they might remember the param (or more likely file an issue with us that someone will have to answer reminding them about the param), resubmit the job and wait again for it to work thru the queue.

I have no issue with this PR - I'm just pointing out that it doesn't solve the general user problem. If that is the objective, then adding infrastructure while kicking the can down the road for someone else to tackle seems rather self-defeating.

@jjhursey (Member) commented:

My proposal is that we take this PR since it addresses an issue by providing a capability that we didn't have before, then iterate on a Singularity-based discovery solution in a separate ticket/action instead of expanding the scope of this PR for something Singularity-specific (that would leverage this change).

@awlauria (Contributor) commented:

Good to merge this one in? Might be good to create that additional issue, @jjhursey, and link that discussion here.

@jjhursey (Member) commented:

Yes. This is good to merge into master. @sam6258 Please PR this to the v4.x branch for consideration there.

I filed Issue #7123 to track the secondary discussion that came from this PR.

@jjhursey merged commit 312c55e into open-mpi:master on Oct 29, 2019
@sam6258 deleted the smiller_rsh_chdir branch on October 29, 2019 at 19:53