tutorials: add flux mini alloc/batch tutorial #208

Closed
wants to merge 2 commits into from

@chu11 chu11 commented Feb 16, 2023

No description provided.

@chu11 chu11 mentioned this pull request Feb 16, 2023
16 tasks

@vsoch vsoch left a comment

We can see with ``flux resource list`` that our Flux subinstance contains 4 nodes, a subset of
the resources of the parent instance. In addition, we can see that we are now at an instance level of 1.

We can submit a job to our subinstance via ``flux mini submit``. Lets name the job "Level1" using
Member

"Let us" I think.

Suggested change
- We can submit a job to our subinstance via ``flux mini submit``. Lets name the job "Level1" using
+ We can submit a job to our subinstance via ``flux mini submit``. Let's name the job "Level1" using


ƒgpD9HY9BsM/ƒge9VDjD:
ƒb63wFg3 achu Level2 R 1 1 1.903m corona174

Member

Can we add a note here about how to exit from a subinstance (e.g., I think just exit right?)

Oh no, I'm trapped in a subinstance! 😱

Member Author

good idea


However, the majority of the time you wouldn't want this. Most often, you want
to launch a subinstance, perhaps launch a number of jobs within those
subinstances, and just wait for them to complete. The most common way to do
Member

Can you give a workflow / real world use case of wanting to do this (I'm trying to understand myself still).

Member Author

Dunno if you're asking specifically about batch vs alloc for this question, but basically most of the time users don't want to be dropped into a shell, they just want to launch their job and walk away. I can perhaps clear that up more here.

Member

Why do I want to launch a subinstance? I'm dense, I don't get it. Am I running a scientific workflow that is going to launch to GPU nodes? Am I writing a python script? A bash script? I don't get the use case - the alloc sort of demonstrates the concept of a subinstance, but in practice I don't know why I'd want to then use batch. What am I doing?

``flux mini batch`` takes a script instead of a command, so let's write two
scripts that will do the exact same thing as we did above with ``flux mini
alloc``.

Member

Okay, so conceptually here we are launching a flux job that literally is just submitting again? Is that best practice, and what's an actual example of wanting to do this? It seems kind of clunky.

flux job status ${id1} ${id2}

In this first script we are doing exactly what we did in the first
example when we were our level 1 instance. We first launch a sleep
Member

Suggested change
- example when we were our level 1 instance. We first launch a sleep
+ example when we were using our level 1 instance. We first launch a sleep

id=`flux mini submit --job-name=Level2 -N1 sleep 60`
flux job status ${id}

In the second script (which we ran via ``flux mini batch`` in the first script),
Member

High level comment - this seems really hard, user interface-wise. If I were reading this tutorial I really wouldn't be excited to have to do this, if I had a use case. Why should I need to write nested scripts for flux, including flux commands, when I might be able to write one file with a clean logic to do it?
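
For readers following along without the diff, the nested-script pattern under discussion can be sketched roughly as below. This is a reconstruction, not the PR's actual files: the file names `level1.sh` and `level2.sh`, the `Level1` submit arguments, the `-N2` node count, and the use of `$(...)` instead of backticks are all assumptions. Only the `flux job status` lines and the `Level2` submit line come from the excerpts quoted in this review. The sketch only writes the two scripts to a scratch directory; it does not invoke Flux.

```shell
#!/bin/sh
# Write two nested batch scripts into a scratch directory (flux is NOT invoked here).
workdir=$(mktemp -d)

# Outer script: would run inside the subinstance created by `flux mini batch`.
cat > "$workdir/level1.sh" <<'EOF'
#!/bin/sh
# Submit a job in this (level 1) instance, then start a nested subinstance.
id1=$(flux mini submit --job-name=Level1 -N1 sleep 60)   # arguments assumed
id2=$(flux mini batch -N2 ./level2.sh)                   # -N2 is an assumed node count
# Wait for both jobs to finish.
flux job status ${id1} ${id2}
EOF

# Inner script: would run inside the nested (level 2) subinstance.
cat > "$workdir/level2.sh" <<'EOF'
#!/bin/sh
id=$(flux mini submit --job-name=Level2 -N1 sleep 60)
flux job status ${id}
EOF

chmod +x "$workdir/level1.sh" "$workdir/level2.sh"
echo "scripts written to $workdir"
```

On a system with Flux, one would presumably submit the outer script from that directory with something like `flux mini batch -N4 ./level1.sh`, where the 4-node size is chosen to match the subinstance described in the tutorial text.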


- ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
- ``flux mini alloc/flux mini batch`` (:ref:`flux-mini-alloc`): "Create Flux subinstances"
Member

Here we need to have more of a "This is why I should care about this command." as opposed to defining it based on a term the user probably isn't familiar with. Why would I want to create subinstances? That's probably the question to ask to start.

chu11 commented Feb 16, 2023

So trying to answer a lot of your questions in one answer.

I view these command tutorials as going through the basics of how to do something. Certainly we can go over lightly why you'd want to do it, but I think the meaty "why you should do this" should come elsewhere. But because some of those meaty tutorials don't exist yet, maybe that's why this tutorial seems out of place? Or do we need to do something else in the docs to justify why this should exist?

I do go into a bit of the meaty "why this is useful" with my "fast job submission" writeup in #195 where I go into scaling job submissions by creating several subinstances. Perhaps once that's merged, we could point to it? And then as time goes on point to other advanced tutorials?

But to answer some of your questions at a high level (perhaps my bullets at the top of the tutorial need to be dispersed around the tutorial better? or are not that good?), creating a subinstance is simply about distribution of work.

If you submit 1 million jobs to the Flux system instance that's bad for you and for all of the other users on the system. You're just making the system instance slower dealing with all of your crap. If you create 1 subinstance, you submit 1 million jobs to your own instance, and spare every other user from what you just did. So that's better. If you create two subinstances, you only submit 500K jobs to each sub-instance, dividing up job submission throughput and scheduling processing. If you create 4 subinstances, etc.

The examples here are clearly simple, and no user would ever want to do it for just 1 job.

Edit: I've added an "advanced / medium complex" workflow to #197
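
The division of work described above is plain arithmetic, and a small sketch may make it concrete. `split_jobs` is a hypothetical helper written for illustration, not a Flux command. It prints how many jobs each subinstance would receive if a total were divided as evenly as possible.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flux): print the number of jobs each of
# n_instances subinstances receives when total_jobs is split as evenly as
# possible. One count is printed per line; the first (total % n) instances
# receive one extra job.
split_jobs() {
    total_jobs=$1
    n_instances=$2
    base=$((total_jobs / n_instances))
    extra=$((total_jobs % n_instances))
    i=0
    while [ "$i" -lt "$n_instances" ]; do
        if [ "$i" -lt "$extra" ]; then
            echo $((base + 1))
        else
            echo "$base"
        fi
        i=$((i + 1))
    done
}

# Two subinstances sharing 1 million jobs: 500000 each.
split_jobs 1000000 2
```

Each printed count is the submission load a single subinstance would handle instead of the system instance handling the full total.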

vsoch commented Feb 16, 2023

> I view these command tutorials as going through the basics of how to do something. Certainly we can go over lightly why you'd want to do it, but I think the meaty "why you should do this" should come elsewhere. But because some of those meaty tutorials don't exist yet, maybe that's why this tutorial seems out of place? Or do we need to do something else in the docs to justify why this should exist?

I disagree. As a reader of documentation, if I don't know why I'd want to do something I'm not going to bother reading it. These command tutorials are close to "getting started" with commands and the first thing the reader needs to see is a strong "Why should I even care" clause. If I'm not convinced I should care, I won't. I don't know what a subinstance is and if the documentation doesn't tell me why I should care I'm not going to read further.

The distinction you are making is more appropriate for a developer oriented tutorial, where the developer knows already they want to do something (and why) and are wanting to walk through it. If we are lucky enough to get a reader to click a tutorial that doesn't open with connecting the command with something they care about doing, at best they are then going to look for this in the tutorial itself, and then leave disappointed when they don't find it.

> I do go into a bit of the meaty "why this is useful" with my "fast job submission" writeup in #195 where I go into scaling job submissions by creating several subinstances. Perhaps once that's merged, we could point to it? And then as time goes on point to other advanced tutorials?

We can definitely point to other resources to reduce redundancy, but we still need to nail the point / use cases at the beginning of a tutorial like this. We can safely assume only a portion of users are going to want to follow a link to another link, to another link, etc.

> But to answer some of your questions at a high level (perhaps my bullets at the top of the tutorial need to be dispersed around the tutorial better? or are not that good?), creating a subinstance is simply about distribution of work.

Yes! That would be a great opening, to tell them that it's about a distribution of work. And as an example, let's say you have... <insert use case they can connect to / map to what they know>

> If you submit 1 million jobs to the Flux system instance that's bad for you and for all of the other users on the system.

I don't know that.

> You're just making the system instance slower dealing with all of your crap.

If I'm a user I probably don't care about that, but if you tell them the first point (it's bad for them) they might care.

> If you create 1 subinstance, you submit 1 million jobs to your own instance, and spare every other user from what you just did. So that's better. If you create two subinstances, you only submit 500K jobs to each sub-instance, dividing up job submission throughput and scheduling processing. If you create 4 subinstances, etc.

Maybe it would make sense to open with some kind of picture that goes alongside your steps? E.g., "Here is the entire flux cluster, if you submit directly this is what it looks like (show it overwhelming all resources / other users) but now, let's create our own subinstance - a portion of this graph that is completely owned by us!" (and then show the flux subinstance neatly parceled off). I think as a reader I would get this, in addition to the description/commands that you have, because it would illustrate them if that makes sense.

chu11 commented Feb 16, 2023

You bring up some good points. Just thinking out loud for a bit ...

  • The primary reason I wrote this tutorial was b/c of your confusion on the flux proxy tutorial, that perhaps we needed a "start here" tutorial before flux proxy and tell readers to read that first.

  • But given your points above I think we're clearly crossing the line into an "advanced tutorial" if we begin to discuss distribution of resources and adding pictures of how subinstances are dividing up resources. A "what is a subinstance" and "why should I care" is probably just an advanced tutorial / topic that should be written instead. Edit: added to Advanced Tutorials #197

  • If such an advanced tutorial exists, this tutorial could point to the advanced tutorial to discuss why you should care about subinstances and we could say "look here first".

> The distinction you are making is more appropriate for a developer oriented tutorial, where the developer knows already they want to do something (and why) and are wanting to walk through it. If we are lucky enough to get a reader to click a tutorial that doesn't open with connecting the command with something they care about doing, at best they are then going to look for this in the tutorial itself, and then leave disappointed when they don't find it.

I disagree with this point a bit. We are only at the beginning of the command tutorials and it will probably continue to get more complex/advanced/specific. It will reach a point when some tutorials will simply cover topics that probably 95+% of users will never care about. In fact, easily half of Flux users probably wouldn't care about flux mini alloc/batch at all, probably more.

So I think as time goes on, people aren't just going to wander into a number of tutorials. They will likely wander into them because of something else they read that pointed them to it. A lot of times it will probably be user support and we say "go look here".

To push some of these tutorials along into something that can be merged, perhaps just a short list of bullets of "why I should care about this command" at the top of some of these tutorials will be enough. And maybe just add a word of caution that "advanced topic, many users may not need to use these" or something would suffice?

vsoch commented Feb 17, 2023

> To push some of these tutorials along into something that can be merged, perhaps just a short list of bullets of "why I should care about this command" at the top of some of these tutorials will be enough. And maybe just add a word of caution that "advanced topic, many users may not need to use these" or something would suffice?

I'm on board with that - a basic "why should I care" bullet to go along with the main tutorial link (and maybe repeated with a bit more detail at the top) would meet the goal for the time being!

chu11 commented Feb 17, 2023

Thinking about this a bit more given flux-framework/flux-core#4942, I'm just going to drop this PR. I was aiming for a middle ground between "command tutorial" and "advanced tutorial" and I think I just whiffed. A future "flux mini batch" tutorial should go into command directives and probably standard I/O. And it should be written within the context of "it's obvious what a subinstance is" given a different tutorial that exists.

@chu11 chu11 closed this Feb 17, 2023

vsoch commented Feb 17, 2023

What about if we had an advanced section?

vsoch commented Feb 17, 2023

I hope you consider re-opening, it's really well written. I'm thinking what if we do:

.. _command-tutorials:

Command Tutorials
=================

Welcome to the Command Tutorials! These tutorials should help you to map specific Flux commands
with your use case, and then see detailed usage.

--------
Basic
--------

 - ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
 - ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"

--------------
Advanced
--------------

 - ``flux mini alloc/flux mini batch`` (:ref:`flux-mini-alloc`): "Create Flux subinstances"

This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!


.. toctree::
   :maxdepth: 2
   :caption: Command Tutorials

   flux-mini-submit
   flux-mini-alloc
   ssh-across-clusters

And we'd need to figure out the sidebar too.

chu11 commented Feb 17, 2023

The main reason I decided to drop is b/c I didn't go into stdio or (soon to be) command directives. So I think in time, there should be a flux mini batch tutorial to basically go over how to use it and some of the most common options/usage. Such a tutorial would just assume users know why they would want to use it.

Note that some users transitioning from sbatch to flux mini batch will just do it "because." They may not know it creates a subinstance, and quite frankly they may not care nor need to know. So I think the messaging/context of such a flux mini batch-basics tutorial needs to be a tad different.

So I eventually came to the conclusion the skeleton of this tutorial is good, but maybe not quite the messaging. That it needs to be re-worked in some way that the user will realize "when I want to create a subinstance, here's how I can go about doing it".

As a complete aside, I'm wondering if the "Jobs" section needs to be spliced up eventually. At some point it's this lingering side thing with information. I imagine some of it could migrate into "FAQs" and some of it could become new tutorials. And we could just (in general) have some "advanced" tutorials section. That's where "why/when create a subinstance" could go and the "fast job submission" could go in there as well.
