tutorials: add flux mini alloc/batch tutorial #208
Comments below, and preview for others! https://flux-framework--208.org.readthedocs.build/en/208/tutorials/commands/flux-mini-alloc.html
We can see with ``flux resource list`` that our Flux subinstance contains 4 nodes, a subset of
the resources of the parent instance. In addition, we can see that we are now at an instance level of 1.

We can submit a job to our subinstance via ``flux mini submit``. Lets name the job "Level1" using
"Let us" I think.
We can submit a job to our subinstance via ``flux mini submit``. Lets name the job "Level1" using
We can submit a job to our subinstance via ``flux mini submit``. Let's name the job "Level1" using

ƒgpD9HY9BsM/ƒge9VDjD:
ƒb63wFg3 achu Level2 R 1 1 1.903m corona174
Can we add a note here about how to exit from a subinstance (e.g., I think just `exit`, right?)
Oh no, I'm trapped in a subinstance! 😱
good idea

However, the majority of the time you wouldn't want this. Most often, you want
to launch a subinstance, perhaps launch a number of jobs within those
subinstances, and just wait for them to complete. The most common way to do
Can you give a workflow / real world use case of wanting to do this (I'm trying to understand myself still).
Dunno if you're asking specifically about `batch` vs `alloc` for this question, but basically most of the time users don't want to be dropped into a shell, they just want to launch their job and walk away. I can perhaps clear that up more here.
Why do I want to launch a subinstance? I'm dense, I don't get it. Am I running a scientific workflow that is going to launch to GPU nodes? Am I writing a python script? A bash script? I don't get the use case - the alloc sort of demonstrates the concept of a subinstance, but in practice I don't know why I'd want to then use batch. What am I doing?
``flux mini batch`` takes a script instead of a command, so lets write two
scripts that will do the exact same thing as we did above with ``flux mini
alloc``.
okay so conceptually here we are launching a flux job that literally is just submitting again? Is that best practice, and what's an actual example of wanting to do this? It seems kind of klunky.
flux job status ${id1} ${id2}

In this first script we are doing exactly what we did in the first
example when we were our level 1 instance. We first launch a sleep
example when we were our level 1 instance. We first launch a sleep
example when we were using our level 1 instance. We first launch a sleep

id=`flux mini submit --job-name=Level2 -N1 sleep 60`
flux job status ${id}

In the second script (which we ran via ``flux mini batch`` in the first script),
High level comment - this seems really hard, user interface-wise. If I were reading this tutorial I really wouldn't be excited to have to do this, if I had a use case. Why should I need to write nested scripts for flux, including flux commands, when I might be able to write one file with a clean logic to do it?

- ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
- ``flux mini alloc/flux mini batch`` (:ref:`flux-mini-alloc`): "Create Flux subinstances"
Here we need to have more of a "This is why I should care about this command." as opposed to defining it based on a term the user probably isn't familiar with. Why would I want to create subinstances? That's probably the question to ask to start.
So trying to answer a lot of your questions in one answer. I view these command tutorials as going through the basics of how to do something. Certainly we can go over lightly why you'd want to do it, but I think the meaty "why you should do this" should come elsewhere. But because some of those meaty tutorials don't exist yet, maybe that's why this tutorial seems out of place? Or do we need to do something else in the docs to justify why this should exist?

I do go into a bit of the meaty "why this is useful" with my "fast job submission" writeup in #195, where I go into scaling job submissions by creating several subinstances. Perhaps once that's merged, we could point to it? And then as time goes on point to other advanced tutorials?

But to answer some of your questions at a high level (perhaps my bullets at the top of the tutorial need to be dispersed around the tutorial better? or are not that good?), creating a subinstance is simply about distribution of work. If you submit 1 million jobs to the Flux system instance, that's bad for you and for all of the other users on the system. You're just making the system instance slower dealing with all of your crap. If you create 1 subinstance, you submit 1 million jobs to your own instance, and spare every other user from what you just did. So that's better. If you create two subinstances, you only submit 500K jobs to each subinstance, dividing up job submission throughput and scheduling processing. If you create 4 subinstances, etc.

The simple examples here are clearly simple, and no user would ever want to do it for just 1 job.

Edit: I've added an "advanced / medium complex" workflow to #197
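To put rough numbers on the "distribution of work" point in the comment above, here is a minimal sketch of the arithmetic only. It does not call Flux at all; `split_jobs` is a hypothetical helper invented for illustration, and the even split is an assumption about how a user might divide submissions among subinstances.

```python
def split_jobs(total_jobs: int, num_subinstances: int) -> list[int]:
    """Divide total_jobs as evenly as possible across subinstances.

    Hypothetical helper for illustration; not part of Flux.
    """
    if num_subinstances < 1:
        raise ValueError("need at least one subinstance")
    # Every subinstance gets `base` jobs; the first `extra` get one more.
    base, extra = divmod(total_jobs, num_subinstances)
    return [base + 1 if i < extra else base for i in range(num_subinstances)]


if __name__ == "__main__":
    # The 1 million job example, split over 1, 2, and 4 subinstances.
    for n in (1, 2, 4):
        print(n, split_jobs(1_000_000, n))
```

With two subinstances each one handles 500K submissions, matching the figure quoted above; the system instance itself only sees the handful of jobs that create the subinstances.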
I disagree. As a reader of documentation, if I don't know why I'd want to do something I'm not going to bother reading it. These command tutorials are close to "getting started" with commands and the first thing the reader needs to see is a strong "Why should I even care" clause. If I'm not convinced I should care, I won't. I don't know what a subinstance is and if the documentation doesn't tell me why I should care I'm not going to read further. The distinction you are making is more appropriate for a developer oriented tutorial, where the developer knows already they want to do something (and why) and are wanting to walk through it. If we are lucky enough to get a reader to click a tutorial that doesn't open with connecting the command with something they care about doing, at best they are then going to look for this in the tutorial itself, and then leave disappointed when they don't find it.
We can definitely point to other resources to reduce redundancy, but we still need to nail the point / use cases at the beginning of a tutorial like this. We can safely assume only a portion of users are going to want to follow a link to another link, to another link, etc.
Yes! That would be a great opening, to tell them that it's about a distribution of work. And as an example, let's say you have...
I don't know that.
If I'm a user I probably don't care about that, but if you tell them the first point (it's bad for them) they might care.
Maybe it would make sense to open with some kind of picture that goes alongside your steps? E.g., "Here is the entire flux cluster, if you submit directly this is what it looks like" (show it overwhelming all resources / other users), "but now, let's create our own subinstance - a portion of this graph that is completely owned by us!" (and then show the flux subinstance neatly parceled off). I think as a reader I would get this, in addition to the description/commands that you have, because it would illustrate them if that makes sense.
You bring up some good points. Just thinking for a bit out loud ...
I disagree with this point a bit. We are only at the beginning of the command tutorials and it will probably continue to get more complex/advanced/specific. It will reach a point when some tutorials will simply cover topics that probably 95+% of users will never care about. In fact, easily half of Flux users probably wouldn't care about

So I think as time goes on, people aren't just going to wander into a number of tutorials. They will likely wander into them because of something else they read that pointed them to it. A lot of times it will probably be user support and we say "go look here".

To push some of these tutorials along into something that can be merged, perhaps just a short list of bullets of "why I should care about this command" at the top of some of these tutorials will be enough. And maybe just adding a word of caution like "advanced topic, many users may not need to use these" or something would suffice?
I'm on board with that - a basic "why should I care" bullet to go along with the main tutorial link (and maybe repeated with a bit more detail at the top) would meet the goal for the time being!
Thinking about this a bit more given flux-framework/flux-core#4942, I'm just going to drop this PR. I was aiming for a middle ground between "command tutorial" and "advanced tutorial" and I think I just whiffed. A future "flux mini batch" tutorial should go into command directives and probably standard I/O. And it should be written within the context of "it's obvious what a subinstance is" given a different tutorial that exists.
What about if we had an advanced section?
I hope you consider re-opening, it's really well written. I'm thinking what if we do:

    .. _command-tutorials:

    Command Tutorials
    =================

    Welcome to the Command Tutorials! These tutorials should help you to map specific Flux commands
    with your use case, and then see detailed usage.

    --------
    Basic
    --------

    - ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
    - ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"

    --------------
    Advanced
    --------------

    - ``flux mini alloc/flux mini batch`` (:ref:`flux-mini-alloc`): "Create Flux subinstances"

    This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!

    .. toctree::
       :maxdepth: 2
       :caption: Command Tutorials

       flux-mini-submit
       flux-mini-alloc
       ssh-across-clusters

And we'd need to figure out the sidebar too.
The main reason I decided to drop is b/c I didn't go into Note that some users transitioning from So I eventually came to the conclusion the skeleton of this tutorial is good, but maybe not quite the messaging. That it needs to be re-worked in some way so that the user will realize "when I want to create a subinstance, here's how I can go about doing it".

As a complete aside, I'm wondering if the "Jobs" section needs to be spliced up eventually. At some point it's this lingering side thing with information. I imagine some of it could migrate into "FAQs" and some of it could become new tutorials. And we could just (in general) have some "advanced" tutorials section. That's where "why/when create a subinstance" could go, and the "fast job submission" could go in there as well.