Add examples of bootstrapping under sbatch #90

Open
SteVwonder opened this issue Feb 2, 2021 · 8 comments
Comments

@SteVwonder
Member

SteVwonder commented Feb 2, 2021

As @Larofeticus pointed out, most (all?) of our examples involve working with Flux in an interactive manner. In particular, they all use salloc to grab a set of nodes and then invoke Flux commands interactively. It would be instructive to have an example where we create a script that bootstraps Flux (and invokes a Flux initial program) and we submit that script with sbatch to show how the whole workflow would work in batch mode.

@dongahn
Member

dongahn commented Feb 2, 2021

Perhaps we can repurpose our batch job examples for the workflow examples section: https://flux-framework.readthedocs.io/en/latest/batch.html

@SteVwonder
Member Author

Thanks @dongahn! That's almost exactly what was needed. As a starter, we should give the batch script in that example a name and include the sbatch ./scriptname.sh line explicitly. That way, if someone searches for sbatch, the example will pop up. @Larofeticus searched for sbatch but the search unfortunately didn't match on the #SBATCH pragma.
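A minimal sketch of the named-script workflow described above. The file names `flux_bootstrap.sh` and `flux_workload.sh`, the node count, and the time limit are placeholders chosen for illustration and do not come from the existing docs. The sketch assumes `flux` is on the `PATH` of the compute nodes, and it deliberately leaves out launcher flags that may be site-specific.

```shell
#!/bin/sh
# Write the Flux initial program: the commands that run inside the Flux instance.
# (flux_workload.sh is a hypothetical name.)
cat > flux_workload.sh <<'EOF'
#!/bin/sh
# Show what Flux discovered, then run a trivial job.
flux resource list
# "flux mini run" is the run command in Flux releases from the time of this
# thread; newer releases spell it "flux run".
flux mini run -n 2 hostname
EOF
chmod +x flux_workload.sh

# Write the batch script that bootstraps Flux under Slurm.
# (flux_bootstrap.sh is a hypothetical name.)
cat > flux_bootstrap.sh <<'EOF'
#!/bin/sh
#SBATCH -N 2
#SBATCH -t 00:10:00

# Launch one Flux broker per node; the first broker runs the initial program.
srun -N "$SLURM_JOB_NUM_NODES" -n "$SLURM_JOB_NUM_NODES" flux start ./flux_workload.sh
EOF

# Submit the named script with sbatch when Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch ./flux_bootstrap.sh
else
    echo "sbatch not found; on a Slurm system, submit with: sbatch ./flux_bootstrap.sh"
fi
```

Keeping the initial program in its own file means the same workload can also be passed to `flux start` in an interactive `salloc` session.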

@dongahn
Copy link
Member

dongahn commented Feb 4, 2021

@Larofeticus and @SteVwonder:

We discussed this at one of our coffee hours. We can easily add this example, but we would also be very happy to work with the user (@Larofeticus) if an initial PR is proposed based on the existing examples at https://flux-framework.readthedocs.io/en/latest/batch.html

My opinion is that users know better than developers what they want to see on a page.

Let me know what you think.

@Larofeticus

I'm happy to help the process here.

I suppose the first specific thing I've found is that --mpibind=none is not a valid srun flag on Cori.

The generalized form of that is a gentle reminder to avoid including site-specific features in the documentation. Balsam had the same trouble, only much more of it, because it is tightly coupled to Cobalt and to specific Argonne machine configurations.

@dongahn
Member

dongahn commented Feb 5, 2021

> I'm happy to help the process here.

Great!

> I suppose the first specific thing I've found is that --mpibind=none is not a valid srun flag on Cori.

Ah... yes, that is a Livermore-specific flag!

Maybe to help get your feet wet: if you propose a small PR that covers a single topic (e.g., a site-specificity note for --mpibind=none), we will review it and help get it merged.

@Larofeticus

One piece of information I don't have is: "What is the consequence of removing that flag when using a system that does have mpibind?" Does the example still work as intended?

@dongahn
Member

dongahn commented Feb 5, 2021

No: if the site uses some other mechanism to bind Flux brokers to a subset of cores/GPUs, the examples won't work as intended. In that case, Flux will only schedule that subset of resources.

Does NERSC have an srun option to ensure that no binding happens? Or does srun not bind at all by default? If the latter, Flux should work out of the box without --mpibind=none.
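One rough way to answer the binding question on a given system is to compare the CPUs a launched process is allowed to use with the CPUs installed on the node. The sketch below assumes GNU coreutils `nproc` is available on the compute nodes; the function name `report_binding` and the script name used afterwards are hypothetical. It only looks at CPU affinity, so it says nothing about GPUs.

```shell
#!/bin/sh
# report_binding ALLOWED INSTALLED
# Print whether the process may use fewer CPUs than the node has.
report_binding() {
    if [ "$1" -lt "$2" ]; then
        echo "restricted: $1 of $2 CPUs usable"
    else
        echo "unrestricted: $1 of $2 CPUs usable"
    fi
}

if command -v nproc >/dev/null 2>&1; then
    # nproc honors the CPU affinity mask of the calling process (and the
    # OMP_NUM_THREADS variable if it is set); nproc --all ignores both.
    report_binding "$(nproc)" "$(nproc --all)"
fi
```

Saved as, say, `check_binding.sh`, it could be run under the native launcher with `srun -N1 -n1 ./check_binding.sh`. A "restricted" result suggests the launcher binds by default, so a Flux broker started the same way would likely see only that subset.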

@grondo
Contributor

grondo commented Feb 5, 2021

Suggestion: As part of the docs, include a "sanity check" command that a new user can run to verify that Flux has discovered all expected resources. For example: run flux resource list. If it appears that Flux has not discovered all expected resources, the native launcher on your system may have restricted the resources available to the flux broker processes. Check for site-specific options such as --mpibind and be sure to disable them.
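A sketch of what such a sanity check might look like as a script run inside the Flux instance. The function name `compare_cores` and the expected-core-count argument are hypothetical. The `-n`, `-s up`, and `-o '{ncores}'` options to `flux resource list` are an assumption about the installed Flux version; confirm them with `flux resource list --help` before relying on this.

```shell
#!/bin/sh
# Usage (inside a Flux instance): ./flux_sanity.sh EXPECTED_CORES

# compare_cores EXPECTED DISCOVERED
# Print a verdict and return nonzero when Flux sees fewer cores than expected.
compare_cores() {
    if [ "$2" -lt "$1" ]; then
        echo "WARNING: Flux sees $2 cores, expected $1; check for launcher binding options such as --mpibind"
        return 1
    fi
    echo "OK: Flux sees $2 cores (expected $1)"
}

if command -v flux >/dev/null 2>&1 && [ -n "$1" ]; then
    # Human-readable summary first.
    flux resource list
    # Assumed options: no header, only resources that are up, core count only.
    discovered=$(flux resource list -n -s up -o '{ncores}' | tr -d '[:space:]')
    compare_cores "$1" "$discovered"
fi
```

For a two-node allocation with 32 cores per node, the script would be invoked with `64` as its argument, for instance as the first command of the initial program.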

4 participants