Add examples of bootstrapping under sbatch #90
Comments
Perhaps we can repurpose our batch job examples for the workflow examples section: https://flux-framework.readthedocs.io/en/latest/batch.html
Thanks @dongahn! That's almost exactly what was needed. As a starter, we should give the batch script in that example a name and include the |
@Larofeticus and @SteVwonder: We discussed this at one of our coffee hours. We can easily add this example, but we would also be very happy to work with the user (@Larofeticus) if an initial PR is proposed based on the existing examples at https://flux-framework.readthedocs.io/en/latest/batch.html. My opinion is that users know better than developers what they want to see on a page. Let me know what you think.
I'm happy to help the process here. I suppose the first specific thing I've found is that The generalized form of that is a gentle reminder to avoid including site-specific features in the documentation. Balsam had the same trouble, only much more of it, because it was tightly coupled to Cobalt and to specific Argonne machine configurations.
Great!
Ah... yes that is a Livermore specific flag! Maybe to help get your feet wet: if you propose a small PR that includes your proposed fix for a single topic (i.e., site specificity note for |
One piece of information I don't have is: "What is the consequence of removing that flag when using a system that does have mpibind?" Does the example still work as intended?
If the site uses other ways to bind Flux brokers to a subset of cores/GPUs, then no, the examples won't work. In that case, Flux will only schedule that subset of resources. Does NERSC have an srun option to ensure that no binding happens? Or does srun not bind at all by default? If the latter, Flux should work out of the box without |
Suggestion: As part of the docs include a "sanity check" command that a new user can run to verify that Flux has discovered all expected resources (e.g. run |
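A sanity check along those lines could be sketched as follows. This is only an illustration, not something from the Flux docs: the helper `check_count` and the `EXPECTED_NODES` / `EXPECTED_CORES` variables are hypothetical names, and the sketch assumes a flux-core release whose `flux resource list` accepts `-s up`, `-n` (no header) and `-o` with the `{nnodes}` and `{ncores}` fields. The query only runs when `flux` is on the `PATH` and the expected counts have been set.

```shell
#!/bin/bash
# Hypothetical helper: compare an expected resource count with what Flux reports.
# Usage: check_count NAME EXPECTED ACTUAL
check_count() {
    local name=$1 expected=$2 actual=$3
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $name = $actual"
    else
        echo "MISMATCH: $name expected $expected, Flux sees $actual" >&2
        return 1
    fi
}

# Query Flux only inside a running instance, and only when the user has
# exported the counts they expect (EXPECTED_NODES, EXPECTED_CORES are
# hypothetical variable names).
if command -v flux >/dev/null 2>&1 && [ -n "${EXPECTED_NODES:-}" ]; then
    # Assumption: these format fields exist in the installed flux-core.
    nnodes=$(flux resource list -s up -n -o '{nnodes}')
    ncores=$(flux resource list -s up -n -o '{ncores}')
    check_count nodes "$EXPECTED_NODES" "$nnodes"
    check_count cores "${EXPECTED_CORES:-0}" "$ncores"
fi
```

A core count lower than expected would point to the situation described above, where the launcher has bound the brokers to a subset of each node.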
As @Larofeticus pointed out, most (all?) of our examples involve working with Flux in an interactive manner. In particular, they all use `salloc` to grab a set of nodes and then invoke Flux commands interactively. It would be instructive to have an example where we create a script that bootstraps Flux (and invokes a Flux initial program) and submit that script with `sbatch` to show how the whole workflow would work in batch mode.
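As a starting point for discussion, the requested example might look roughly like the sketch below. It is not taken from the existing docs. The file names `workflow.sh` and `flux_batch.sh`, the node count, and the time limit are all placeholders, and it assumes `srun` does not bind the brokers to a subset of each node by default. Site-specific binding flags (such as the mpibind-related one discussed in this thread) are deliberately left out. The snippet only writes the two files; nothing is submitted.

```shell
#!/bin/bash
# Sketch: generate the two files a batch-mode Flux run needs.
# All file names and resource requests here are placeholders.

# 1. The Flux initial program: runs once the Flux instance has bootstrapped.
cat > workflow.sh <<'EOF'
#!/bin/bash
# Show what the instance discovered (assumes a flux-core with `flux resource`).
flux resource list
# Launch one task per node; older releases spell this `flux mini run`.
flux run -N 2 -n 2 hostname
EOF
chmod +x workflow.sh

# 2. The Slurm batch script that bootstraps Flux under sbatch.
cat > flux_batch.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=flux-bootstrap
#SBATCH --nodes=2
#SBATCH --time=00:10:00

# Start one Flux broker per allocated node. The rank 0 broker runs
# workflow.sh as the initial program, and the instance exits when it does.
srun -N "$SLURM_NNODES" -n "$SLURM_NNODES" flux start ./workflow.sh
EOF

echo "Submit with: sbatch flux_batch.sh"
```

Submitting is then a single `sbatch flux_batch.sh`, with no interactive `salloc` session involved; the output of `workflow.sh` ends up in the usual Slurm output file.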