add: flux radiuss tutorial 2024 #31

Merged
merged 9 commits into from
Aug 10, 2024

Conversation

vsoch
Member

@vsoch vsoch commented Apr 12, 2024

This preserves the state of the original changes for flux RADIUSS 2024, mostly to calm my own anxiety.

This adds the new directory for the Flux Riken tutorial, with the following additions:

  1. flux-tree was removed from flux-sched and is added here
  2. tutorial files were kept in rse-ops, are now moved here
  3. New tutorial content: flux tree and hierarchy section/examples
  4. New tutorial content: flux archive (previously flux filemap)
  5. images that show a dummy example of job throughout
  6. update of names in login page / directory to be more general
  7. automated builds updated for riken

@vsoch
Member Author

vsoch commented Jul 5, 2024

I am starting preliminary changes for RADIUSS 2024. Builds explicitly cancelled to not waste resources. I'll include some notes, and screenshots of work so far. For the container, I've updated the base and changed a few of the orderings of commands to be more logical (COPY directives should come later, for example).

For the notebook, the launcher is fixed to load the tutorials via buttons. This means it opens and all content is front and center, with working buttons for Tutorials and resources (at the bottom).

image

The launcher took a hot minute to get working again (it broke between this and last year). Also note that the nesting is removed - I think it made the tutorial more complex to require the user to navigate a few directories deep to find the content. Now everything is there for the tutorial user when the notebook opens, no need to navigate into many subdirectories. I am still hoping to do the following:

  • Get the icon working (no luck so far)
  • Move the "flux core" content to be separate from supplementary (e.g., Dyad and other) that will be below it
  • Remove catalog sections not used (so it should be flux core, supplementary, and resources)

As stated above, I am moving "core" flux tutorial content separate from what we can consider external or plugins, like dyad. The reason is that these are experimental and change from year to year, and having them alongside core content makes a promise about consistency that I am not sure we can keep. This means we have three chapters, and then supplementary material (which can be extended, removed, added, and doesn't make a promise about consistency). For example, DLIO doesn't even have that name anymore.

image

(Update: this is done.) I have separated the files (images and notebooks) for dyad cleanly as well. They were all mixed together, which would make it challenging from an organizational standpoint to add, remove, or reorganize entire supplementary sections (chapters). Now when you click the Dyad tutorial button, it's cleanly its own thing (I like this better):

image

For the terminal, we've also been requiring the user to navigate to File -> Open -> Terminal to do that. That's really hard and annoying. I figured out how to make a button instead:

image

This opens a terminal immediately. Finally, I have started marginal content changes. This will be a multi-step process. I also expect the DLIO/DYAD content to be entirely different, and we will need a round to work on that. The build takes a long time, and I remember getting it working was challenging. I'll engage this team when I'm done re-organizing. There is also some bug with flux-tree and awk that needs debugging. Cheers.

vsoch added 3 commits July 4, 2024 23:28
This adds the new directory for the Flux Riken tutorial, with
the following additions:
1. flux-tree was removed from flux-sched and is added here
2. tutorial files were kept in rse-ops, are now moved here
3. New tutorial content: flux tree and hierarchy section/examples
4. New tutorial content: flux archive (previously flux filemap)
5. images that show a dummy example of job throughout
6. update of names in login page / directory to be more general
7. automated builds updated for riken

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
I am starting preliminary changes for RADIUSS 2024. So far:
- the launcher is fixed to load the tutorials via buttons
- I am moving "core" flux tutorial content separate from
  what we can consider external or plugins, like dyad. The
  reason is because these are experimental and change from
  year to year, and having it alongside core content makes
  a promise about consistency that I am not sure we can
  keep. I have not done this yet, but I am going to separate
  the files (images and notebooks) cleanly as well.
- top level: everything is there for the tutorial user when
  the notebook opens, no need to navigate into many sub-
  directories.
- terminal button: instead of "click this long list of
  annoying paths to open a terminal" I figured out how to
  make a button in the notebook directly.
- marginal content changes: I am starting to tweak / update
  content, this will be a multi-step process.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
…tary section

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@vsoch vsoch force-pushed the 2024-radiuss-aws branch from 0953704 to 5a94458 Compare July 5, 2024 05:29
vsoch added 3 commits July 16, 2024 18:46
The notebook structure should primarily be organized by
command, since this is what the new user will interact with.
This change set better does that, and flattens / organizes
things a bit better overall.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Improvement on sections and table for flux commands, and
addition of flux accounting to container (likely will not
easily work).

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@jacobtkeio

Hello! I am writing to provide some feedback for this version of the tutorial. Please note that I have just learned how to use both Flux and Slurm in the last three weeks, so I am much less experienced with the subject than your target audience.

Overall Impression

This tutorial is easy to set up, informative, well-organized, and useful. I love the feel of running through the cells and trying things in the terminal. I learned a lot and I got a good sense of why Flux is useful. All of the visuals are helpful and cohesive, as is the style. This is something I would reference for flux help and related links. Finally, it was a little intimidating in its size/scope, especially within Chapter 1.

Logistics

There were some things that were broken or out of place:

Chapter 1

  • Broken links:
  • Typos:
    • You might want to request an allocation for a set of resources (an allocation) and then attach to the[m] interactively.

    • sub_job2.sh: Is going to be submit[ted] by sub_job1.sh.

  • Cells:
    • Extra blank cell under #flux bulksubmit > carbon copy
    • For the four jobs under "Here are some "carbon copy" jobs to try in the [JupyterLab terminal]:"
      3. flux submit --output=job-{{id}}.out echo "This is job {cc}" ~ is missing its --cc option
      4. flux bulksubmit --dry-run --cc={0} echo {1} ::: a b c ::: 0-1 0-3 0-7 ~ has {0} and {1} backwards or the "a b c" and "0-1 0-3 0-7" parts backwards
    • The #flux top section is missing a [JupyterLab terminal] button (I love the JupyterLab terminal buttons)
    • In #flux proxy the tutorial suggests, "Then from the [JupyterLab terminal] run the commands below!" but there are no commands below
    • In the python API section in the "flux.job.get_job(handle, jobid) to get job info" example, there's a cell that just says "### flux jobs"
    • Also in this section, there are two cells with "!flux jobs -a | grep compute" but only the first is necessary
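
To make the `{0}` / `{1}` mix-up in item 4 concrete, here is a small stand-alone sketch in plain bash. It is not Flux itself and does not reproduce what `flux bulksubmit --dry-run` prints; the helper name `expand_inputs` is hypothetical, and the loop only assumes that `{0}` refers to the first `:::` input list and `{1}` to the second, with every combination of the two lists being generated.

```shell
#!/usr/bin/env bash
# Illustration only: mimic how positional placeholders pair with ":::" input
# lists. This is NOT flux bulksubmit, just a loop that prints the command
# each combination of inputs would produce.
expand_inputs() {
    local template="$1" first="$2" second="$3"
    local a b line
    for a in $first; do
        for b in $second; do
            # Substitute {0} with the item from the first list,
            # and {1} with the item from the second list.
            line="${template//\{0\}/$a}"
            line="${line//\{1\}/$b}"
            printf '%s\n' "$line"
        done
    done
}

# {0} takes values from "a b c"; {1} takes the idsets, which is what --cc needs.
expand_inputs 'flux submit --cc={1} echo {0}' 'a b c' '0-1 0-3 0-7'
```

With the placeholders swapped (`--cc={0} echo {1}`), the same loop would hand the letters to `--cc` and echo the idsets, which is the inconsistency reported above.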

Chapter 2:

Chapter 3:

DYAD/DLIO:

  • Broken links:
    • Design of DYAD (not sure why :/)
    • here (under Integrating DYAD into PyTorch)
    • Module 4
    • For some reason the dl-training-io.png image is not displaying under #Distributed DL Training
  • Typos:
    • One key difference between distributed DL training and many conventional HPC applications (e.g., MPI-based simulations) is the asynchronous loading of data by workers during training. In many conventional HPC applications, data loading and computation are performed one after the one. On the other hand, as shown in Figure X, the loading of data in distributed DL training is asynchronous. In other words, while the GPU is training the DL model for epoch N, the worker [is] reading and creating the batch for epoch N+1.

    • Next, we start the DYAD service. This involves two steps. First, we need to create a namespace withing the Flux key-value store.

    • This concludes Module 3. [maybe Supplementary Chapter 1?]

    • All figure references are "figure X."
  • Cells:
    • The first cell does not run:

ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_236/1884463090.py in <cell line: 11>()
9 sys.path.insert(0, os.path.abspath("../dlio_extensions/"))
10
---> 11 from dyad_torch_data_loader import DYADTorchDataset
ModuleNotFoundError: No module named 'dyad_torch_data_loader'

  • The flux broker also crashes on "!flux kvs namespace remove {kvs_namespace}" if you run all cells in order despite the errors. I can provide the crash log but I figure this won't happen once the other cells are working. Indeed, the notebook doesn't crash unless you run the broken cells first.

Suggestions

I thought some things could change:

Chapter 1

  • There are two headers with the text, "I'm ready! How do I do this tutorial? 😁️." The first, smaller section covers why and how to run cells. I think that this information is well-covered by the second "I'm ready!" section, the instructions to play the Flux video, and the "What does the terminal prompt mean?" sections. I would suggest removing the first "I'm ready!" section to make the introduction flow smoother.
  • In a similar vein, in the "Getting started with Flux section," there are two pieces of text that explain getting help for a specific command, the first paragraph and the Tip box. I would suggest removing one of these.
  • I thought starting out with "Creating Flux Instances" was pretty difficult because there isn't really a Slurm equivalent to starting a flux instance and the explanation has a lot of tough (at least for me) conceptual information. I think that moving "Creating Flux Instances" to after "The Flux Hierarchy 🍇️" would ease a lot of the difficulty and it would fit right in after talking about how flux can start its own sub-instances. This would make the order of the intro "Getting Started with Flux" -> "Flux Resources" -> "Flux Commands," which I think makes for a smooth introduction.
  • When I was running the flux submit commands in the #flux submit and #flux bulksubmit sections, I often wished to see the output of the submitted job but only got back the job ID (for example, !flux submit hostname gave me ƒ3VqNqo3Qs, but I was curious about the hostname!). There are a lot of ways to go about getting the output back to the user, but I would suggest adding an --output switch on relevant commands with brief instructions to check that file for the output. What I like about this approach is that it emphasizes the asynchronicity of submission (as opposed to run or --watch). It's still a judgement call to provide the output or not, but I just wanted to comment on this because I noticed it while going through.
  • Regarding the example command under #flux bulksubmit, "flux submit --cc="1-10" echo "Hello I am job {cc}"", I think it would be good to have scripts 1-10 ready to go in the repository.
  • Similarly, in #flux proxy, the tutorial suggests editing sleep_batch.sh to sleep for 60 or 120 seconds instead of 30, and it would be good if sleep_batch.sh had a 120 second timer by default. Then you could also remove that warning sentence and keep the reader focused on the flux proxy command.
  • The #flux archive section mentions the flux KVS before it gets introduced: "At a high level, flux archive allows us to save named pieces of data (e.g., files) to the Flux KVS for later retrieval." I suggest removing this mention to prevent confusion and because I think this sentence works after removing it ("At a high level, flux archive allows us to save named pieces of data (e.g., files) to Flux for later retrieval", or you could also remove "to Flux"). It's also covered perfectly in Chapter 2, which ties back the flux archive command.
  • Finally, I would suggest changing the conclusion's fourth summary topic to something like "Python Submission API" instead of "Deeper Dive into Flux Internals," which I don't feel is representative of the last section.
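
As a rough sketch of the `--output` suggestion above: the helper below submits a job with an output file, waits for the job to finish, and then prints the file. It assumes a running Flux instance; the function name `submit_and_show` and the file name `hostname.out` are made up for illustration, and when `flux` is not installed it only prints the command it would have run.

```shell
#!/usr/bin/env bash
# Sketch only, assuming a running Flux instance. The function name and the
# output file name are illustrative, not taken from the tutorial.
submit_and_show() {
    local outfile="$1"; shift
    if ! command -v flux >/dev/null 2>&1; then
        echo "flux not found; would run: flux submit --output=$outfile $*"
        return 0
    fi
    local jobid
    # flux submit returns as soon as the job is enqueued and prints the job ID.
    jobid=$(flux submit --output="$outfile" "$@")
    echo "submitted $jobid"
    # Block until the job has been cleaned up, then read its output file.
    flux job wait-event "$jobid" clean >/dev/null
    cat "$outfile"
}

submit_and_show hostname.out hostname
```

Keeping the wait as a separate step is deliberate: it leaves the submission itself asynchronous, which is the point the feedback wanted to emphasize.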

Assorted Opinions

  • The description line under each command is really helpful and a great way to remember what they do. I did notice that half are -ing form and half are command form, and I kind of wish they were all the same form. E.g. "Watching jobs" "Querying the status of jobs" vs. "Show how long this flux instance has been running" "Show a table of real-time Flux processes." This is obviously very inconsequential but it could be nice for them to all match with one or the other.
  • I really like this part of #flux watch: "So what makes flux watch different from flux job attach? Aside from the fact that flux watch is read-only, flux watch can watch many (or even all (flux watch --all) jobs at once!" You read my mind and answered my question real-time as I read this.
  • I feel like the meme under "#flux submit from within a batch" is out of place and the section could do without it. The preceding paragraph is already a great conclusion.
  • I learned a whole lot in the #Process, Monitoring, and Job Utilities ⚙️ section and I love that it's color-coded.
  • Chapter 2 is also incredibly solid. I learned a lot, the visuals are particularly effective, it's a great length and the conclusion wraps it up perfectly.
  • I feel like a bit more white-space between different sections (between the last sentence and the next header) could help the tutorial flow better visually.
  • Finally, I think it could be nice to split Chapter 1 into three different chapters. Namely, the introduction and Getting Started with Flux (all the blue), Process, Monitoring, and Job Utilities ⚙️ (all the green), and the Python Submission API. I've put this in my opinion section because I think it would be a pain to actually split these up and I didn't want to make a flat-out suggestion. However, I did really feel like Chapter 1 was very long and even looking at the size of the scroll-bar is kind of rough when you're just starting out at the introduction. I also think having the extra conclusion/summary sections at the end of these chapters would help me recall more information afterwards. And lastly it's kind of hard to scroll through Chapter 1 and find a given command at this size. Chapter 1 is already so well divided into these three sections, but I definitely wouldn't want to have to manually rename all the references to specific chapters, so I'm leaving this one completely up to you.

All of that being said, I think this tutorial will already be very successful in its current state given its wealth of information, fun delivery, and engaging structure. It's clear to me that a ton of effort went into writing and updating this, and I am just hoping that some of these comments are helpful in the development process.

Jacob Tkeio (via @ trws and @ milroy)

@vsoch
Member Author

vsoch commented Jul 26, 2024

@jacobtkeio this is the most extensive feedback I've ever received! I will go through this carefully and address the points - it might take me some of the weekend, but I'll post an update here. Thank you so much for doing this! Your perspective as a new user is immensely valuable.

@vsoch
Member Author

vsoch commented Jul 27, 2024

Okay, I've turned your feedback into a TODO list, and am going to comment (and update) here as I fix or change things. I will ping once more in a new comment when I'm done, and I'll have a new branch (to PR on top of this) to show changes (and a new container to quickly test).

Completed

  • Admin guide link fixed to here
  • Typo (fixed) You might want to request an allocation for a set of resources (an allocation) and then attach to the[m] interactively.
  • Typo (fixed) sub_job2.sh: Is going to be submit[ted] by sub_job1.sh.
  • Per your comment about it being intimidating, I'm going to remove resources from the top (and keep them in the final chapter) so there's less content (and hopefully it's less intimidating).
  • (fixed) Extra blank cell under #flux bulksubmit > carbon copy
  • (fixed) flux submit --output=job-{{id}}.out echo "This is job {cc}" ~ is missing its --cc option
  • (fixed) flux bulksubmit --dry-run --cc={0} echo {1} ::: a b c ::: 0-1 0-3 0-7 ~ has {0} and {1} backwards or the "a b c" and "0-1 0-3 0-7" parts backwards
  • (added, and thank you)! The #flux top section is missing a [JupyterLab terminal] button (I love the JupyterLab terminal buttons)
  • (fixed) In #flux proxy the tutorial suggests, "Then from the [JupyterLab terminal] run the commands below!" but there are no commands below
  • (fixed) In the python API section in the "flux.job.get_job(handle, jobid) to get job info" example, there's a cell that just says "### flux jobs"
  • (fixed) Also in this section, there are two cells with "!flux jobs -a | grep compute" but only the first is necessary
  • Chapter 2 broken links all fixed
  • Chapter 3 broken links all fixed
  • All DYAD/DLIO work is under care of @ilumsden and @hariharan-devarajan, they can respond to your feedback / comments and update the tutorial as needed (a PR to this branch would be preferred)! If that can't be done in time, we can disable DYAD for this round.
  • (done) There are two headers with the text, "I'm ready! How do I do this tutorial? 😁️." The first, smaller section covers why and how to run cells. I think that this information is well-covered by the second "I'm ready!" section, the instructions to play the Flux video, and the "What does the terminal prompt mean?" sections. I would suggest removing the first "I'm ready!" section to make the introduction flow smoother
  • (done) In a similar vein, in the "Getting started with Flux section," there are two pieces of text that explain getting help for a specific command, the first paragraph and the Tip box. I would suggest removing one of these.
  • (done - and this is a really good observation! I've moved flux start to be entirely in a separate section, as a command. You are right it's confusing at the top. The tutorial user doesn't need to know that we start their tutorial that way (but we can tell them later)) I thought starting out with "Creating Flux Instances" was pretty difficult because there isn't really a Slurm equivalent to starting a flux instance and the explanation has a lot of tough (at least for me) conceptual information. I think that moving "Creating Flux Instances" to after "The Flux Hierarchy 🍇️" would ease a lot of the difficulty and it would fit right in after talking about how flux can start its own sub-instances. This would make the order of the intro "Getting Started with Flux" -> "Flux Resources" -> "Flux Commands," which I think makes for a smooth introduction.
  • (Added comments to show getting output) When I was running the flux submit commands in the #flux submit and #flux bulksubmit sections, I often wished to see the output of the submitted job but only got back the job ID (for example, !flux submit hostname gave me ƒ3VqNqo3Qs, but I was curious about the hostname!). There are a lot of ways to go about getting the output back to the user, but I would suggest adding an --output switch on relevant commands with brief instructions to check that file for the output. What I like about this approach is that it emphasizes the asynchronicity of submission (as opposed to run or --watch). It's still a judgement call to provide the output or not, but I just wanted to comment on this because I noticed it while going through.

Added! And with a cute example too 🐻‍❄️

image

  • Regarding the example command under #flux bulksubmit, "flux submit --cc="1-10" echo "Hello I am job {cc}"", I think it would be good to have scripts 1-10 ready to go in the repository.

oh man, so this just got really good 😆

image

  • (note - for this one, the copy paste is so quick that it actually works OK with 30 seconds, no editing needed. I'm going to remove that note). Similarly, in #flux proxy, the tutorial suggests editing sleep_batch.sh to sleep for 60 or 120 seconds instead of 30, and it would be good if sleep_batch.sh had a 120 second timer by default. Then you could also remove that warning sentence and keep the reader focused on the flux proxy command.

  • (done) The #flux archive section mentions the flux KVS before it gets introduced: "At a high level, flux archive allows us to save named pieces of data (e.g., files) to the Flux KVS for later retrieval." I suggest removing this mention to prevent confusion and because I think this sentence works after removing it ("At a high level, flux archive allows us to save named pieces of data (e.g., files) to Flux for later retrieval", or you could also remove "to Flux"). It's also covered perfectly in Chapter 2, which ties back the flux archive command.

  • (done, and good catch! That was an older section) Finally, I would suggest changing the conclusion's fourth summary topic to something like "Python Submission API" instead of "Deeper Dive into Flux Internals," which I don't feel is representative of the last section.

  • (agree, and done) The description line under each command is really helpful and a great way to remember what they do. I did notice that half are -ing form and half are command form, and I kind of wish they were all the same form. E.g. "Watching jobs" "Querying the status of jobs" vs. "Show how long this flux instance has been running" "Show a table of real-time Flux processes." This is obviously very inconsequential but it could be nice for them to all match with one or the other.

I really like this part of #flux watch: "So what makes flux watch different from flux job attach? Aside from the fact that flux watch is read-only, flux watch can watch many (or even all (flux watch --all) jobs at once!" You read my mind and answered my question real-time as I read this

Awesome! 😎

  • (haha, ok, removed. I'll leave in the repository as an easter egg) I feel like the meme under "#flux submit from within a batch" is out of place and the section could do without it. The preceding paragraph is already a great conclusion.

I learned a whole lot in the #Process, Monitoring, and Job Utilities ⚙️ section and I love that it's color-coded.

I love the colors too - added them newly this year 🟢 🟣

Chapter 2 is also incredibly solid. I learned a lot, the visuals are particularly effective, it's a great length and the conclusion wraps it up perfectly.

Nice!

  • (done, and good feedback) I feel like a bit more white-space between different sections (between the last sentence and the next header) could help the tutorial flow better visually.

Finally, I think it could be nice to split Chapter 1 into three different chapters. Namely, the introduction and Getting Started with Flux (all the blue), Process, Monitoring, and Job Utilities ⚙️ (all the green), and the Python Submission API. I've put this in my opinion section because I think it would be a pain to actually split these up and I didn't want to make a flat-out suggestion. However, I did really feel like Chapter 1 was very long and even looking at the size of the scroll-bar is kind of rough when you're just starting out at the introduction. I also think having the extra conclusion/summary sections at the end of these chapters would help me recall more information afterwards. And lastly it's kind of hard to scroll through Chapter 1 and find a given command at this size. Chapter 1 is already so well divided into these three sections, but I definitely wouldn't want to have to manually rename all the references to specific chapters, so I'm leaving this one completely up to you.

This is good feedback - I actually intentionally put it into one notebook because of people's attention - I assumed they would have attention for one notebook and then drop off. That said, I see your point about length, and that we have sort of created internal chapters. I think for this year (since there are a lot of changes, and this is the first time trying this change set) we should try to do it with one notebook, and then perhaps ask for feedback about it. We can definitely split the notebook up if people seem to agree, which they definitely might. Ping @milroy - let's make sure we discuss some kind of feedback form (with a specific set of questions for this).

All of that being said, I think this tutorial will already be very successful in its current state given its wealth of information, fun delivery, and engaging structure. It's clear to me that a ton of effort went into writing and updating this, and I am just hoping that some of these comments are helpful in the development process.

That's fantastic! We had a lot of work to do with the automation bit last year, and this year I wanted to focus a lot more energy on the tutorial. I am going to build you a new container and I'll have a new testing container to post here (and in slack soon).

Still TODO

Thank you again for this feedback @jacobtkeio! I have been at this a long time and have never received such a solid set of very good suggestions and fixes.

@hariharan-devarajan
Collaborator

@ilumsden would u be looking into this? If not let me know.

vsoch and others added 2 commits July 26, 2024 19:37
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@ilumsden
Collaborator

@hariharan-devarajan I may be able to work on this a little, but not a ton. I learned today that I've got to prepare a substantial chunk of the results for a paper due in 2 weeks. Plus, I have to prepare my 2 page extended abstract for SC. And I've got to start working on end of internship stuff since I (somehow) only have 2 weeks left.

Any help you can provide on this would be appreciated.

@vsoch
Member Author

vsoch commented Jul 27, 2024

@jacobtkeio the updates are done, the testing should be the same as before:

docker pull vanessa/flux-tutorial:radiuss-aws-2024
docker network create jupyterhub
docker run --rm -it --entrypoint...-v /var/run/docker.sock:/var/run/docker.sock --net jupyterhub --name jupyterhub -p 8888:8888 vanessa/flux-tutorial:radiuss-aws-2024

And you can see full edits here:

a6a2e14

Have a good weekend!

@hariharan-devarajan
Collaborator

@hariharan-devarajan I may be able to work on this a little, but not a ton. I learned today that I've got to prepare a substantial chunk of the results for a paper due in 2 weeks. Plus, I have to prepare my 2 page extended abstract for SC. And I've got to start working on end of internship stuff since I (somehow) only have 2 weeks left.

Any help you can provide on this would be appreciated.

The main road block I have is the testing environment. I remember the notebook had some differences between AWS and my local. I don't have access to AWS yet and so I may not be able to address and test the changes.

@ilumsden
Collaborator

@hariharan-devarajan you can run locally with Docker. There should be instructions in the README in the folder for this version of the tutorial

@hariharan-devarajan
Collaborator

@vsoch what is the deadline for this. Maybe I can prioritize this over other things depending on the urgency. I would love to get all comments addressed so that we can keep this integrated.

@vsoch
Member Author

vsoch commented Jul 27, 2024

The main road block I have is the testing environment. I remember the notebook had some differences between AWS and my local. I don't have access to AWS yet and so I may not be able to address and test the changes.

You won't need AWS! Just docker locally. It works really well now, no changing directory once you open the notebook! And @hariharan-devarajan I've posted the instructions above, and in slack. To use Ctrl+S in the container and have it save locally, just run:

docker run --rm -it --entrypoint /start.sh -v $PWD/tutorial:/home/jovyan -v /var/run/docker.sock:/var/run/docker.sock --net jupyterhub --name jupyterhub -p 8888:8888 flux-tutorial

That particular instruction in the README here needs to be fixed (and I'll update it next time around, the volume path is off so it won't persist when you save from the container to the host).
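
For reference, here is a minimal sketch of the same `docker run` command with the host directory passed explicitly, so the bind mount to `/home/jovyan` is easy to check before starting the container. The function name `tutorial_run_cmd` and its defaults are hypothetical and not part of the repository; the flags are copied from the command above, and the function only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: compose the docker run command from the comment above with an
# explicit host path for the notebook volume. The function name and the
# defaults are illustrative, not part of the repository.
tutorial_run_cmd() {
    local host_dir="${1:-$PWD/tutorial}"
    local image="${2:-flux-tutorial}"
    printf 'docker run --rm -it --entrypoint /start.sh -v %s:/home/jovyan -v /var/run/docker.sock:/var/run/docker.sock --net jupyterhub --name jupyterhub -p 8888:8888 %s\n' \
        "$host_dir" "$image"
}

# Print the command for review; files saved in the notebook should then land
# in the host directory given as the first argument.
tutorial_run_cmd "$PWD/tutorial"
```

Printing instead of executing keeps the sketch safe to run anywhere; copy the printed line into a terminal once the volume path looks right.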

@vsoch
Member Author

vsoch commented Jul 27, 2024

@vsoch what is the deadline for this. Maybe I can prioritize this over other things depending on the urgency. I would love to get all comments addressed so that we can keep this integrated.

The deadline for changes is Friday, August 23rd, and the tutorial is the following link. Here is the slack post with all the details.

@ilumsden
Collaborator

ilumsden commented Jul 27, 2024

Oh, is that the deadline? I can definitely help out then. I just won't be able to do a ton before I'm back in Knoxville on August 11. But, once I'm back, I could easily put this at the top of my to-do list

@vsoch
Member Author

vsoch commented Jul 27, 2024

Oh, is that the deadline? I can definitely help out then. I just won't be able to do a ton before I'm back in Knoxville on August 11. But, once I'm back, I could easily put this at the top of my to-do list

Yes, there is absolutely time! I don't plan a lot of deadlines (I mostly just do things when it feels right) but when one is needed, I like to be early so I can feel relaxed myself, expect the unexpected, and (in the case of wanting feedback from many people) give them ample weeks to do that.

Have a good trip to Knoxville!

@jacobtkeio

@vsoch, thank you! :) I have never received such a thoughtful response to my feedback, and I really appreciate it.

I've run through the updated tutorial and I love all the new examples and explanations. Pizzaquack is my favorite <3 and the #flux start section is excellent. The explanation is stellar and I think it works perfectly where you put it.
I am not sure that I downloaded the new notebook correctly because all my old cell outputs were still there, so ignore this if it's probably an issue on my end. Otherwise, my three remaining comments are purely logistical:

  1. (ch.1) I think that the end of the #flux submit section might have gotten moved to the end of the #flux bulksubmit section. The #flux submit section ends by talking about the -N, -n, and --cores-per-task options, and then the "!flux submit -N1 -n2 sleep inf" example, which comes a little out of left field. However, this text that comes at the end of the #flux bulksubmit section,

Note: in this tutorial, we cannot assume that the host you are running on has multiple cores, thus the examples below only vary the number of nodes per job. Varying the cores-per-task is also possible on Flux when the underlying hardware supports it (e.g., a multi-core node). Let's run the middle example - it's a fun one, I promise!

!flux submit --nodes=2 --ntasks=2 --cores-per-task=1 --job-name simulation sleep inf
!flux submit --nodes=1 --ntasks=1 --cores-per-task=1 --job-name analysis sleep inf

ƒ8ke6RSvB
ƒ8kqvqiPq

seems to fit quite well as a replacement for the "!flux submit -N1 -n2 sleep inf" command. And pizzaquack seems like a perfect ending for #flux bulksubmit. What do you think about this? Lastly, I was confused about what "the middle example" was, and I noticed that, in its current state, the tutorial says that two examples in a row are fun ones.

  2. (ch.1) On a similar note, I think that this command from the end of #the flux hierarchy, "!flux exec -r all -x 0 flux archive extract --name myarchive --directory $(pwd) shared-file.txt" belongs in #flux archive as the penultimate command following, "Now that the directory has been created on all our nodes, we can extract the archive onto those nodes by combining flux exec and flux archive extract." And the cell demonstrating the -X option of pstree at the end of #the flux hierarchy is missing! (Assuming I'm remembering right and there used to be one)
  3. (ch.2) The text of the very first link back to Chapter 1 says "Chapter 2" instead of "Chapter 1" (but it does link to Chapter 1).

Again, thank you so much for the swift and thorough response. I hope you had a good weekend!
Jacob Tkeio

conduit still does not compile, and a note
was added about that.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@vsoch
Member Author

vsoch commented Aug 10, 2024

@hariharan-devarajan and @ilumsden - the main tutorial and workflows are done so I'm going to merge into main, mainly so that we can build the images and start preparing the last two bullets. This should also make it easier (and cleaner) for you to PR to the main branch, so let's pick up there (when @ilumsden is back from his trip). Have a good weekend both!

@vsoch vsoch merged commit 05b0c50 into master Aug 10, 2024
3 checks passed
4 participants