tutorials: add a tutorial on submitting jobs to Flux #194

Merged on Feb 10, 2023 (1 commit)

Member

@cmoussa1 cmoussa1 commented Feb 8, 2023

This is a small [WIP] PR built on top of #192 that adds a tutorial on how to submit jobs to Flux. It leverages the steps outlined in the job-submit-cli workflow example. I ran the commands outlined in an updated Docker container to make sure they still worked (I could always use another set of eyes to double-check me, though 😉).

Future expansion of this specific tutorial could include a chapter on how to submit jobs to Flux using its job submission API. But for the purposes of getting some short tutorials out there, I've just included the command-line portion.

@cmoussa1 cmoussa1 added the "enhancement" (New feature or request) label Feb 8, 2023
@cmoussa1 cmoussa1 changed the title from "[WIP] add a tutorial on submitting jobs to Flux" to "[WIP] tutorials: add a tutorial on submitting jobs to Flux" Feb 8, 2023
@cmoussa1 cmoussa1 mentioned this pull request Feb 8, 2023
16 tasks
@chu11
Member

chu11 commented Feb 8, 2023

immediate thought is if we should cover lots of variants and common options:

  • --tasks-per-node and --tasks-per-core?
  • --wait option?
  • flux mini run - run until completion, stdout/stderr live
  • batch / alloc
  • launch to get a shell (alloc)
  • flux job attach??
  • flux job cancel?
  • flux job kill?
  • job dependencies (although this might be under its own tutorial)?

so this might become more "job submission and management basics"??
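A rough sketch of a few of the variants listed above. This is an illustration only, not text from the tutorial under review: the `FLUX` variable and its print-only default are assumptions added here so the script can be tried without a running Flux instance, and the job ID is a placeholder borrowed from the tutorial's sample output.

```shell
#!/bin/sh
# Dry-run idiom (an assumption of this sketch, not a Flux feature):
# by default each line only prints the command it would run.
# Export FLUX=flux to execute the commands in a real Flux instance.
FLUX="${FLUX:-echo flux}"
JOBID="ƒM5k8m7m"   # placeholder job ID taken from the tutorial's sample output

# Run a job to completion with stdout/stderr streamed to the terminal.
$FLUX mini run --ntasks=4 ./my_compute_script.lua 120

# Submit a job and block until it has finished.
$FLUX mini submit --wait ./my_compute_script.lua 120

# Attach to an already submitted job to watch its output.
$FLUX job attach "$JOBID"

# Cancel a pending or running job.
$FLUX job cancel "$JOBID"
```

Keeping the default at print-only makes it easy to review the exact command lines before running them for real.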

@garlick
Member

garlick commented Feb 8, 2023

Maybe this could be one of those documents like you proposed @chu11 where it starts simple and adds complexity? If the topic is "job submission", maybe it should try to stay focused on getting jobs into the system?

@chu11
Member

chu11 commented Feb 8, 2023

Maybe this could be one of those documents like you proposed @chu11 where it starts simple and adds complexity? If the topic is "job submission", maybe it should try to stay focused on getting jobs into the system?

Good point, perhaps this specific doc should be simpler then, no need to explain getting R and stuff at the bottom.

@grondo
Contributor

grondo commented Feb 8, 2023

This one is under "command tutorials". So maybe flux mini tutorial or flux mini submit tutorial is the correct title. Though it is admittedly very difficult to keep to that specific topic in a tutorial (kind of feel like that is what command man pages should be?)

Member

@vsoch vsoch left a comment

some comments! And I guess this depends on merging the other ssh PR?

with your use case, and then see detailed usage.

- ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a flux instance across clusters using ssh"
- ``job-submit`` (:ref:`job-submit`): "Submit a job in a Flux instance"
Member

Should this be flux mini submit? I think here we want to start helping readers associate the actual command to be run with the use case.

Member

Also why is this showing up as a new file? Do we just need to merge the other PR? Ping @grondo (but when you have time I know there are issues with Flux atm on a cluster!)

Contributor

If I understand what you mean, I think the other PR needs to be merged first. This PR just contains the other PR's commits which is why the file is new.

Member

Yep, that is what I was trying to say, poorly.

@@ -0,0 +1,55 @@
.. _job-submit:
Member

Suggested change
.. _job-submit:
.. _flux-mini-submit:


$ flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./my_compute_script.lua 120
ƒM5k8m7m
$ flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./my_other_script.lua 120
Member

When we show them a new command, it would be good below it to say "in the above, we are asking for " so the reader starts to make sense of the options/args too.
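As one possible way to spell out what the first command is asking for, the arithmetic behind `--nodes=2 --ntasks=4 --cores-per-task=2` can be sketched as below. The variable names are made up for this illustration, and the per-node figure assumes tasks are spread evenly; the scheduler decides the actual placement. The result is consistent with the `R` output quoted in this diff (ranks 0-1, cores 0-3 on each).

```shell
#!/bin/sh
# Values from: flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ...
nodes=2
ntasks=4
cores_per_task=2

total_cores=$((ntasks * cores_per_task))   # 4 tasks x 2 cores each
cores_per_node=$((total_cores / nodes))    # assumes an even spread over the nodes

echo "Requesting $ntasks tasks x $cores_per_task cores = $total_cores cores on $nodes nodes"
echo "If spread evenly: $cores_per_node cores per node"
```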

$ flux mini submit --nodes=1 --ntasks=1 --cores-per-task=2 ./my_other_script.lua 120
ƒSUEFPDH

A jobID is returned for every job submitted. You can view the status of your
Member

Suggested change
A jobID is returned for every job submitted. You can view the status of your
A jobID (e.g., ``ƒSUEFPDH``) is returned for every job submitted. You can view the status of your

And do we have a term for this in the new terms guide? If yes - let's link!


.. code-block:: sh

    $ flux job info ƒM5k8m7m R
Member

Is there a way to get this in a table (non json) - this was a question I had the other day and I couldn't figure out from the command line.

Contributor

I think actually we should not advertise flux job info in a high-level tutorial. It is more of a "plumbing" command. Instead we should focus on flux jobs, the main interface users will use to get information about their jobs.
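A minimal sketch of the `flux jobs` usage suggested here, which prints a table rather than JSON. The `FLUX` variable and its print-only default are assumptions of this sketch so it can be run without a Flux instance, and the job ID is the placeholder from the tutorial's sample output.

```shell
#!/bin/sh
# Print-only by default (sketch assumption); export FLUX=flux to run for real.
FLUX="${FLUX:-echo flux}"
JOBID="ƒM5k8m7m"   # placeholder job ID from the tutorial's sample output

# List your pending and running jobs.
$FLUX jobs

# Also include jobs that have already finished.
$FLUX jobs -a

# Show only the job with the given ID.
$FLUX jobs "$JOBID"
```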

Member Author

Oops! Sorry for including it in this tutorial. I'll go ahead and just remove this section then and only include flux jobs.

$ flux job info ƒM5k8m7m R
{"version":1,"execution":{"R_lite":[{"rank":"0-1","children":{"core":"0-3"}}]}}

There are a number of keys you can pass to get various information about your job:
Member

I don't understand what the listing below is from, or used for? It would be good to add the context, and then description for what each of the below actually means!

Contributor

This is part of the reason why it is probably best to leave flux job info out of user tutorials. An advanced tutorial could perhaps explain this command.

Member

flux job info is a kind of plumbing command, not very user friendly, and probably shouldn't be here IMHO

with your use case, and then see detailed usage.

- ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a flux instance across clusters using ssh"
- ``job-submit`` (:ref:`job-submit`): "Submit a job in a Flux instance"
Contributor

Since this is a "command tutorial", should the bullet reference the flux mini submit command?

Suggested change
- ``job-submit`` (:ref:`job-submit`): "Submit a job in a Flux instance"
- ``flux mini submit`` (:ref:`job-submit`): "Submit a job in a Flux instance"


.. code-block:: sh

    $ flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./my_compute_script.lua 120
Member

was thinking if there are any other important options to mention, the only one I could think of is --queue.

IMO, that's more important than --cores-per-task. You could probably just mention that there are many advanced ways to request resources besides --nodes and --ntasks, and to see the manpage for details.

Member

If you are open to it, I like having (toward the end) a big block of just examples with description, that sort of show all the options that a flux command can provide. E.g., I think I linked this before, but this example comes to mind! https://rse-ops.github.io/knowledge/docs/schedulers/slurm.html#command-quick-reference. Sometimes people's eyes will glaze over the text and they just want to find the right thing to copy paste :) (guilty!)

Member

oh that's a good idea
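One way the quick-reference idea discussed in this thread might look, sketched as a script that prints a two-column table of "what you want" and "what to type". The `ref` helper is hypothetical, `foo` is a placeholder queue name, and the selection of options is only a guess at what readers would want first; the man page remains the full list.

```shell
#!/bin/sh
# Hypothetical helper: print a description and the matching command side by side.
ref() { printf '%-32s %s\n' "$1" "$2"; }

ref "2 nodes, 4 tasks"          "flux mini submit --nodes=2 --ntasks=4 ./my_compute_script.lua 120"
ref "2 cores for every task"    "flux mini submit --ntasks=4 --cores-per-task=2 ./my_compute_script.lua 120"
ref "submit to the 'foo' queue" "flux mini submit --queue=foo ./my_compute_script.lua 120"
ref "full list of options"      "man flux-mini"
```

Because the table is plain text, the right-hand column can be copied and pasted directly.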

@cmoussa1 cmoussa1 force-pushed the add.job.submit.tutorial branch from 79a3f3a to 637e9f3 Compare February 9, 2023 17:25
@cmoussa1
Member Author

cmoussa1 commented Feb 9, 2023

Thanks for all the great feedback and suggestions everybody! I've just force pushed some changes to the tutorial based on the suggestions above. To summarize, I've made the following changes:

  1. I've changed the name of the document and its title to flux-mini-submit
  2. I've included a short description of the two job submission examples at the top of the tutorial and provided links to both the definition of a Flux job ID (to our new glossary) and the flux mini man page.
  3. I've removed references to flux job info and just included flux jobs.
  4. I've added a couple more examples of submitting jobs using flux mini at the bottom of the tutorial that use some different options. Specifically, I've added an example submitting a job to a specific queue and an example submitting a job with --dry-run to get the jobspec for the job (and linking to the definition of jobspec in our glossary).

Thanks again for the feedback. I think I might as well take this out of [WIP], but note that we should probably land #192 before this one, since this PR is built on top of #192.

@cmoussa1 cmoussa1 changed the title from "[WIP] tutorials: add a tutorial on submitting jobs to Flux" to "tutorials: add a tutorial on submitting jobs to Flux" Feb 9, 2023
@cmoussa1 cmoussa1 marked this pull request as ready for review February 9, 2023 17:33
Comment on lines 46 to 59
-------------------------------------
More Examples of Submitting Flux Jobs
-------------------------------------

.. code-block:: sh

    $ flux mini submit --nodes=2 --queue=foo --name=my_special_job ./my_job.lua

This submits a job to the `foo` queue across two nodes, and sets a custom name
to the job.

.. code-block:: sh

    $ flux mini submit --dry-run ./my_cool_job.lua
Member

I like this. I'd suggest adding one example with the --output option, as I imagine many would want that. It would perhaps be wise to illustrate the use of {{id}} in the output option too.

Member Author

Oh that's a good idea, thanks. I'll add an example that includes this!

Member Author

Just force-pushed a commit that adds an example including --output and {{id}} when submitting a job.
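The example added in that commit is not quoted in this thread, so the following is only a guess at its shape. The `FLUX` print-only default, the file name pattern `job-{{id}}.out`, and the script name are assumptions of this sketch. The second half merely imitates the template substitution with `sed` to show the idea; Flux performs the real expansion itself, and the exact form of the ID in the file name may depend on the Flux version.

```shell
#!/bin/sh
# Print-only by default (sketch assumption); export FLUX=flux to run for real.
FLUX="${FLUX:-echo flux}"

# Single quotes keep the shell from interpreting the braces.
$FLUX mini submit --ntasks=1 --output='job-{{id}}.out' ./my_job.lua

# Illustration only: imitate the {{id}} substitution with a placeholder job ID.
jobid="ƒM5k8m7m"
template='job-{{id}}.out'
expanded=$(printf '%s\n' "$template" | sed "s/{{id}}/$jobid/")
echo "$expanded"
```

Including the job ID in the file name keeps the output of separate jobs from overwriting each other.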


.. code-block:: sh

    $ flux mini submit --nodes=2 --ntasks=4 --cores-per-task=2 ./my_compute_script.lua 120
Member

Minor nit: should we use .py everywhere instead of .lua, since Python is more popular?

Member Author

Oh, sure! Good call. Just force-pushed a fix to use .py instead of .lua

@cmoussa1 cmoussa1 force-pushed the add.job.submit.tutorial branch from 1bb28a5 to 69ab2ca Compare February 9, 2023 22:19
@vsoch
Member

vsoch commented Feb 10, 2023

@cmoussa1 the ssh tutorial is merged! You should be able to rebase locally and then we can finish up review here and get the tutorial in.

Member

@chu11 chu11 left a comment

LGTM!

Add a small tutorial on submitting jobs to Flux via "flux mini submit". To
start, add a couple simple examples of submitting jobs to Flux using different
options.
@cmoussa1 cmoussa1 force-pushed the add.job.submit.tutorial branch from 69ab2ca to 2ebcf82 Compare February 10, 2023 16:06
@cmoussa1
Member Author

Thanks @vsoch and @chu11! 😃 I've just rebased off current master and will set MWP on this

@cmoussa1 cmoussa1 added the "merge-when-passing" (mark PR for auto-merging by mergify.io bot) label Feb 10, 2023
@mergify mergify bot merged commit 316293c into flux-framework:master Feb 10, 2023