first shot at re-organizing sections #185

Merged 1 commit on Jan 17, 2023.

3 changes: 2 additions & 1 deletion .github/workflows/main.yml
@@ -1,3 +1,4 @@
name: flux-docs build and check
on: pull_request
jobs:
check-pr:
@@ -24,4 +25,4 @@ jobs:
- name: Check Spelling
uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
with:
files: "*.rst"
files: "*.rst */*.rst */*/*.rst"
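
The new ``files`` value widens the spell check from top-level ``.rst`` files to files one and two directories deep, which matters now that pages move into ``guides/``, ``jobs/`` and ``tutorials/``. The Python sketch below shows which files each setting would cover. It assumes the action splits the string on whitespace and expands each pattern like an ordinary shell glob; ``matched_rst_files`` is a hypothetical helper, not part of the action.

.. code-block:: python

   import glob
   import os

   OLD_PATTERNS = ["*.rst"]
   NEW_PATTERNS = "*.rst */*.rst */*/*.rst".split()


   def matched_rst_files(root, patterns):
       """Return root-relative paths matched by any of the glob patterns.

       Hypothetical helper: ``*`` never crosses a directory separator,
       so each pattern covers exactly one directory depth.
       """
       found = set()
       for pattern in patterns:
           # Escape root so glob metacharacters in its path are literal.
           for path in glob.glob(os.path.join(glob.escape(root), pattern)):
               found.add(os.path.relpath(path, root).replace(os.sep, "/"))
       return sorted(found)


   if __name__ == "__main__":
       print("old:", matched_rst_files(".", OLD_PATTERNS))
       print("new:", matched_rst_files(".", NEW_PATTERNS))

Run from the repository root, a page such as ``tutorials/lab/index.rst`` appears only in the second list; anything three or more directories deep would still be skipped.
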
6 changes: 3 additions & 3 deletions contributing.rst
@@ -1,8 +1,8 @@
.. _contributing:

================================
Contributing to Flux Development
================================
============
Contributing
============

The Flux Framework team welcomes all contributors for bug fixes, code improvements, new features, simplifications, documentation, and more. Please do not hesitate to `contact us <https://github.com/orgs/flux-framework/people>`_ with any questions or concerns.

129 changes: 86 additions & 43 deletions faqs.rst
@@ -4,7 +4,40 @@
FAQs
####

Some frequently asked questions about flux and their answers.
Some frequently asked questions about flux and their answers! 🤔️

.. _background_faq:


**********
Background
**********

What is Flux Framework?
=======================

Flux is a flexible framework for resource management, built for your site.
The core framework consists of a suite of projects, tools, and libraries that can be used to build site-customized resource managers for
High Performance Computing (HPC) centers. The core projects are described in this documentation, and our
larger family of projects is listed on `our portal page <https://flux-framework.org>`_.

What does it mean for a cluster to deploy Flux?
===============================================

Most of the time, when someone talks about Flux, they are describing the combined installation
of several of the projects here, which together provide a full cluster to which users submit workflows.
In this sense Flux is comparable to other job managers such as Slurm or SGE: it can be installed
as the main workload manager for a site.

Where does Flux work?
=====================

You are likely associating Flux with high performance computing, since it is comparable
to other job managers. However, Flux has a unique ability to nest, meaning you (as a user) can
launch a Flux instance inside a Slurm allocation, for example. Beyond nesting under other schedulers,
you can easily demo Flux in a container, or even run it in Kubernetes with the
`Flux Operator <https://flux-framework.org/flux-operator>`_. Our vision is for Flux to
enable converged computing, which means making it easy to move between traditional HPC and cloud environments.


*****************
@@ -50,7 +83,7 @@ If your `Konsole <https://konsole.kde.org/>`_ terminal displays ``ƒ`` as ``Æ``
check that Settings → Edit → Profile → Advanced → Encoding: Default
Character Encoding is set to ``UTF-8``, not ``ISO8859-1``.
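
The symptom follows from how the bytes are decoded: ``ƒ`` (U+0192) is sent as the two UTF-8 bytes ``0xC6 0x92``, and a terminal set to ISO 8859-1 maps ``0xC6`` to ``Æ`` and treats ``0x92`` as an invisible control character. A minimal Python sketch reproducing this; ``misdecode`` is an illustrative helper only:

.. code-block:: python

   FLUX_F = "\u0192"  # "ƒ", LATIN SMALL LETTER F WITH HOOK


   def misdecode(text, terminal_encoding="iso8859-1"):
       """Encode text as UTF-8, then decode it as a misconfigured terminal would."""
       return text.encode("utf-8").decode(terminal_encoding)


   if __name__ == "__main__":
       print(FLUX_F.encode("utf-8").hex())  # c692
       print(repr(misdecode(FLUX_F)))       # 'Æ\x92'
       print(misdecode(FLUX_F, "utf-8"))    # ƒ (correct setting, round-trips)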

Does flux run on a mac?
Does flux run on a Mac?
=======================

Not yet. We have an open `issue <https://github.com/flux-framework/flux-core/issues/2892>`_
@@ -66,6 +99,10 @@ You can read up on reporting bugs here: :ref:`contributing` or report one
directly for flux `core <https://github.com/flux-framework/flux-core/issues>`_
or `sched <https://github.com/flux-framework/flux-sched/issues>`_.

******************
Resource Questions
******************

.. _not_managing_all_resources:

Why is Flux ignoring my Nvidia GPUs?
@@ -107,47 +144,6 @@ core and GPU bindings, so if resources are missing, affinity and binding
from the parent resource manager should be checked. In Slurm, try
``--mpibind=off``, in LSF jsrun, try ``--bind=none``.
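
One quick way to see what the parent resource manager has bound you to is to run a small probe with the same launcher options you would use to start Flux (for example, inside the same ``srun`` or ``jsrun`` step). The Python sketch below is Linux-only, since ``os.sched_getaffinity`` is not available on every platform. It also prints ``CUDA_VISIBLE_DEVICES`` if set, on the assumption that your site's launcher may use it to restrict GPUs; ``to_ranges`` is a hypothetical helper.

.. code-block:: python

   import os


   def visible_cpus():
       """Return the sorted CPU ids this process may run on (Linux only)."""
       return sorted(os.sched_getaffinity(0))


   def to_ranges(ids):
       """Collapse sorted ints into a compact string, e.g. [0, 1, 2, 5] -> "0-2,5"."""
       parts = []
       start = prev = None
       for i in ids:
           if start is None:
               start = prev = i
           elif i == prev + 1:
               prev = i
           else:
               parts.append(f"{start}-{prev}" if start != prev else str(start))
               start = prev = i
       if start is not None:
           parts.append(f"{start}-{prev}" if start != prev else str(start))
       return ",".join(parts)


   if __name__ == "__main__":
       cpus = visible_cpus()
       print(f"bound to {len(cpus)} of {os.cpu_count()} CPUs: {to_ranges(cpus)}")
       print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))

If the bound CPU count is smaller than the node's core count, the unbinding options above are worth trying before starting Flux.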

.. _launch_large_num_jobs:

How do I efficiently launch a large number of jobs?
===================================================

If you have more than 10K fast-cycling jobs to run, here are some tips that
may help improve efficiency and throughput:

- Create a batch job or allocation to contain the jobs in a Flux sub-instance.
This improves performance over submitting them directly to the Flux system
instance and reduces the impact of your jobs on system resources and other
users. See also: :ref:`batch`.
- If scripting ``flux mini submit`` commands, avoid the pattern of one command
per job as each command invocation has a startup cost. Instead try to
combine similar job submissions with ``flux mini submit --cc=IDSET``
or `flux-mini bulksubmit <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man1/flux-mini.html#bulksubmit>`_.
- By default ``flux mini submit --cc=IDSET`` and ``flux mini bulksubmit``
will exit once all jobs have been submitted. To wait for all jobs to
complete before proceeding, use the ``--wait`` or ``--watch`` options to
these tools.
- If multiple commands must be used to submit jobs before waiting for them,
consider using ``--flags=waitable`` and ``flux job wait --all`` to wait for
jobs to complete and capture any errors.
- If the jobs to be submitted cannot be combined with the ``flux mini`` tools,
develop a workflow management script using the
`Flux Python interface <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html>`_. The
`flux-mini <https://github.com/flux-framework/flux-core/blob/master/src/cmd/flux-mini.py>`_
command itself is a Python program that can serve as a useful reference.
- If jobs produce a significant amount of standard I/O, use the
:core:man1:`flux-mini` ``--output`` option to redirect it to files. By
default, standard I/O is captured in the Flux key value store, which holds
other job metadata and may become a bottleneck if jobs generate a large
amount of output.
- When handling many fast-cycling jobs, the rank 0 Flux broker may require
significant memory and CPU. Consider excluding that node from scheduling
with ``flux resource drain 0``.

Since Flux can be launched as a parallel job within foreign resource managers
like SLURM and LSF, your efforts to develop an efficient batch or workflow
management script that runs within a Flux instance can be portable to those
systems.

.. _overcommit_resources:

@@ -245,6 +241,53 @@ Note the following:

See also: :core:man7:`flux-broker-attributes`.

*************
Job Questions
*************

.. _launch_large_num_jobs:

How do I efficiently launch a large number of jobs?
===================================================

If you have more than 10K fast-cycling jobs to run, here are some tips that
may help improve efficiency and throughput:

- Create a batch job or allocation to contain the jobs in a Flux sub-instance.
This improves performance over submitting them directly to the Flux system
instance and reduces the impact of your jobs on system resources and other
users. See also: :ref:`batch`.
- If scripting ``flux mini submit`` commands, avoid the pattern of one command
per job as each command invocation has a startup cost. Instead try to
combine similar job submissions with ``flux mini submit --cc=IDSET``
or `flux-mini bulksubmit <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man1/flux-mini.html#bulksubmit>`_.
- By default ``flux mini submit --cc=IDSET`` and ``flux mini bulksubmit``
will exit once all jobs have been submitted. To wait for all jobs to
complete before proceeding, use the ``--wait`` or ``--watch`` options to
these tools.
- If multiple commands must be used to submit jobs before waiting for them,
consider using ``--flags=waitable`` and ``flux job wait --all`` to wait for
jobs to complete and capture any errors.
- If the jobs to be submitted cannot be combined with the ``flux mini`` tools,
develop a workflow management script using the
`Flux Python interface <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html>`_. The
`flux-mini <https://github.com/flux-framework/flux-core/blob/master/src/cmd/flux-mini.py>`_
command itself is a Python program that can serve as a useful reference.
- If jobs produce a significant amount of standard I/O, use the
:core:man1:`flux-mini` ``--output`` option to redirect it to files. By
default, standard I/O is captured in the Flux key value store, which holds
other job metadata and may become a bottleneck if jobs generate a large
amount of output.
- When handling many fast-cycling jobs, the rank 0 Flux broker may require
significant memory and CPU. Consider excluding that node from scheduling
with ``flux resource drain 0``.
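
The submission tips above can be scripted without one ``flux mini`` call per job. The Python sketch below only builds argument vectors from the options named in this FAQ and runs them with ``subprocess``. The helper names are hypothetical, and it assumes your ``flux mini`` expands ``{{id}}`` in ``--output`` so that each job gets its own file (check :core:man1:`flux-mini` for your version).

.. code-block:: python

   import shlex
   import shutil
   import subprocess


   def cc_submit_argv(command, njobs, output_template):
       """Build one ``flux mini submit --cc`` call that submits njobs copies.

       Hypothetical helper: option names come from this FAQ; output_template
       is assumed to contain {{id}} so every job writes its own file.
       """
       if njobs < 1:
           raise ValueError("njobs must be at least 1")
       return [
           "flux", "mini", "submit",
           f"--cc=0-{njobs - 1}",          # IDSET: one command, many jobs
           "--wait",                       # return only after all jobs finish
           f"--output={output_template}",  # keep stdio out of the KVS
           *command,
       ]


   def waitable_submit_argv(command):
       """Hypothetical helper: submit one job that ``flux job wait`` can reap."""
       return ["flux", "mini", "submit", "--flags=waitable", *command]


   # Run once after all waitable submissions to wait for them and collect errors.
   WAIT_ALL_ARGV = ["flux", "job", "wait", "--all"]


   if __name__ == "__main__":
       argv = cc_submit_argv(["hostname"], 100, "job-{{id}}.out")
       print(shlex.join(argv))
       if shutil.which("flux"):  # only attempt to run where flux is on PATH
           subprocess.run(argv, check=True)

For the multi-command pattern, run each ``waitable_submit_argv(...)`` command, then ``WAIT_ALL_ARGV`` once to wait for the jobs and capture any errors, as described above.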

Since Flux can be launched as a parallel job within foreign resource managers
like SLURM and LSF, your efforts to develop an efficient batch or workflow
management script that runs within a Flux instance can be portable to those
systems.


.. _mimic_slurm_jobstep:

How do I run job steps?
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes
15 changes: 15 additions & 0 deletions guides/index.rst
@@ -0,0 +1,15 @@
.. _flux-guides:

Guides
======

Read more about how to administer Flux, or use our Python API.
Do you have a question? `Let us know <https://github.com/flux-framework/flux-docs/issues>`_.

.. toctree::
:maxdepth: 1
:caption: Guides

admin-guide
accounting-guide
Flux Python API <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html>
20 changes: 3 additions & 17 deletions index.rst
@@ -25,25 +25,17 @@ The framework consists of a suite of projects, tools, and libraries which may be

quickstart
faqs
debugging
batch
hierarchies
LLNL Introduction to Flux <https://hpc-tutorials.llnl.gov/flux/>
coral
coral2
adminguide
flux-accounting-guide
jobs/index
tutorials/index
guides/index
contributing
Flux Python API <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html>
stats

.. toctree::
:maxdepth: 1
:caption: Sub-Projects

projects


Contributor Relevant RFCs
-------------------------

@@ -65,9 +57,3 @@ Manual Pages
- :ref:`core:man-pages`
- :ref:`sched:man-pages`
- :ref:`security:man-pages`


Workflow Examples
-----------------

- :doc:`workflow-examples:index`
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
15 changes: 15 additions & 0 deletions jobs/index.rst
@@ -0,0 +1,15 @@
.. _flux-jobs:

Jobs
====

Here you can learn about Flux jobs, including debugging, batch jobs, and job hierarchies.
Do you have a question? `Let us know <https://github.com/flux-framework/flux-docs/issues>`_.

.. toctree::
:maxdepth: 2
:caption: Jobs

debugging
batch
hierarchies
63 changes: 37 additions & 26 deletions quickstart.rst
@@ -19,31 +19,13 @@ A quick introduction to Flux and flux-core.
Building the Code
-----------------

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Spack: Recommended for curious users
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Flux maintains an up-to-date package in the `spack
<https://github.com/spack/spack>`_ develop branch. If you’re already using
spack, just run the following to install flux and all necessary dependencies:

.. code-block:: console

$ spack install flux-sched

The above command will build and install the latest tagged version of
flux-sched and flux-core. To install the latest master branches, use the
``@master`` version specifier: ``spack install flux-sched@master``. If
you want Flux to manage and schedule Nvidia GPUs, include the ``+cuda``
variant: ``spack install flux-sched+cuda``. This builds a CUDA-aware
version of hwloc.
.. _docker_installation:

^^^^^^
Docker
^^^^^^

For instructions on installing spack, see `Spack's installation documentation <https://spack.readthedocs.io/en/latest/getting_started.html#installation>`_.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Docker: Recommended for quick, single-node deployments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Recommended for quick, single-node deployments

Flux has a continuously updated Docker image available for download on
`Docker Hub <https://hub.docker.com/u/fluxrm>`_. If you already have docker
@@ -99,11 +81,40 @@ our testsuite within a docker container, you can use our helper script:
.. note::
Both the flux-core and flux-sched repositories have the ``docker-run-checks.sh`` helper script

.. _spack_installation:

^^^^^
Spack
^^^^^

Recommended for curious users

Flux maintains an up-to-date package in the `spack
<https://github.com/spack/spack>`_ develop branch. If you’re already using
spack, just run the following to install flux and all necessary dependencies:

.. code-block:: console

$ spack install flux-sched

The above command will build and install the latest tagged version of
flux-sched and flux-core. To install the latest master branches, use the
``@master`` version specifier: ``spack install flux-sched@master``. If
you want Flux to manage and schedule Nvidia GPUs, include the ``+cuda``
variant: ``spack install flux-sched+cuda``. This builds a CUDA-aware
version of hwloc.
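
The version specifier and the variant can be combined in a single Spack spec. A small Python sketch that composes the spec and the ``spack install`` command line; ``flux_sched_spec`` and ``spack_install_argv`` are hypothetical helpers, not part of Spack:

.. code-block:: python

   import shlex


   def flux_sched_spec(version=None, cuda=False):
       """Compose a Spack spec string for flux-sched (hypothetical helper)."""
       spec = "flux-sched"
       if version:
           spec += f"@{version}"  # e.g. "master" for the latest master branches
       if cuda:
           spec += "+cuda"        # variant that builds a CUDA-aware hwloc
       return spec


   def spack_install_argv(spec):
       """Argument vector for installing the given spec with Spack."""
       return ["spack", "install", spec]


   if __name__ == "__main__":
       # prints: spack install flux-sched@master+cuda
       print(shlex.join(spack_install_argv(flux_sched_spec("master", cuda=True))))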


For instructions on installing spack, see `Spack's installation documentation <https://spack.readthedocs.io/en/latest/getting_started.html#installation>`_.

.. _manual_installation:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Manual Installation: Recommended for developers and contributors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^
Manual Installation
^^^^^^^^^^^^^^^^^^^


Recommended for developers and contributors

Ensure that the latest requirements are installed. The
current list of build requirements is detailed `here <https://github.com/flux-framework/flux-core#readme>`_.
15 changes: 15 additions & 0 deletions tutorials/index.rst
@@ -0,0 +1,15 @@
.. _tutorials:

Tutorials
=========

Flux is developed at `Lawrence Livermore National Lab <http://llnl.gov>`_, so we have tutorials specific to the clusters
there and to collaborating systems, along with other families of tutorials for your use. Explore the sections below to
find a tutorial of interest.

.. toctree::
:maxdepth: 2
:caption: Tutorials

lab/index
integrations/index
12 changes: 12 additions & 0 deletions tutorials/integrations/index.rst
@@ -0,0 +1,12 @@
.. _integration-tutorials:

Integration Tutorials
=====================

These tutorials cover plugins and other integrations with Flux.

.. toctree::
:maxdepth: 1
:caption: Integration Tutorials

stats
File renamed without changes.
File renamed without changes.
File renamed without changes.
16 changes: 16 additions & 0 deletions tutorials/lab/index.rst
@@ -0,0 +1,16 @@
.. _lab-tutorials:

Lab Tutorials
=============

Flux is developed at `Lawrence Livermore National Lab <http://llnl.gov>`_, so we have tutorials specific to the clusters
there and to collaborating systems.

.. toctree::
:maxdepth: 2
:caption: Lab Tutorials

LLNL Introduction to Flux <https://hpc-tutorials.llnl.gov/flux/>
coral
coral2