Quick Start

A quick introduction to Flux and flux-core. If you are looking to see how Flux compares to your favorite resource manager, see :ref:`comparison-table`. For reference, the :ref:`Glossary` contains definitions for a number of terms, Flux-specific concepts, and acronyms mentioned in this guide and throughout our documentation site.

Note

If your site runs Flux natively, you may wish to check with your help desk to see if you have a site-specific introduction to Flux, since Flux is configurable. For example, LLNL has published an Introduction to Flux. This document generally assumes you are exploring on your own.

Building the Code

Docker

Recommended for quick, single-node deployments

Flux has a continuously updated Docker image available for download on Docker Hub. If you already have docker installed, just run the following to download the latest Flux docker image and start a container from it:

$ docker run -ti fluxrm/flux-sched:latest
$ flux getattr size
1
$ flux run printenv FLUX_JOB_ID
2597498912768

Note

Multi-node Docker deployments of Flux are still an ongoing area of research. This installation method is recommended for developers and users curious to try single-node instances of Flux.

The default CMD for flux docker images is flux start /bin/bash. To emulate a multi-node deployment within a single container, replace the default CMD by supplying args to docker run:

$ docker run -ti fluxrm/flux-sched flux start --test-size 4
$ flux getattr size
4

Note that a more complete Docker tutorial is available at :ref:`flux-sched-container`. If you are developing Flux and want to compile the modified code and run our testsuite within a docker container, you can use our helper script:

$ git clone https://github.com/flux-framework/flux-core.git
Initialized empty Git repository in /home/fluxuser/flux-core/.git/
$ cd flux-core
$ ./src/test/docker/docker-run-checks.sh
<snip>
============================================================================
Testsuite summary for flux-core 0.16.0-371-gfe938825c
============================================================================
# TOTAL: 2389
# PASS:  2367
# SKIP:  19
# XFAIL: 3
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Note

Both the flux-core and flux-sched repositories include the docker-run-checks.sh helper script.

Spack

Recommended for curious users

Flux maintains an up-to-date package in the spack develop branch. If you’re already using spack, just run the following to install flux and all necessary dependencies:

$ spack install flux-sched

The above command will build and install the latest tagged version of flux-sched and flux-core. To install the latest master branches, use the @master version specifier: spack install flux-sched@master. If you want Flux to manage and schedule Nvidia GPUs, include the +cuda variant: spack install flux-sched+cuda. This builds a CUDA-aware version of hwloc.

For instructions on installing spack, see Spack's installation documentation.

Manual Installation

Recommended for developers and contributors

Ensure the latest build requirements are installed. The current list of build requirements is detailed here.

Clone current flux-core master:

$ git clone https://github.com/flux-framework/flux-core.git
Initialized empty Git repository in /home/fluxuser/flux-core/.git/
$ cd flux-core

Build flux-core. In order to build python bindings, ensure you have python-3.6 and python-cffi available in your current environment:

$ ./autogen.sh && ./configure --prefix=$HOME/local
Running aclocal ...
Running libtoolize ...
Running autoheader ...
...
$ make -j 8
...

Ensure all is right with the world by running the built-in make check target:

$ make check
Making check in src
...

Clone current flux-sched master:

$ git clone https://github.com/flux-framework/flux-sched.git
Initialized empty Git repository in /home/fluxuser/flux-sched/.git/
$ cd flux-sched

Build flux-sched. By default, flux-sched will attempt to configure against flux-core found in the specified --prefix using the same PYTHON_VERSION:

$ ./autogen.sh && ./configure --prefix=$HOME/local
Running aclocal ...
Running libtoolize ...
Running autoheader ...
...
$ make
...

Ensure all is right with the world by running the built-in make check target:

$ make check
Making check in src
...

Starting a Flux Instance

In order to use Flux, you first must initiate a Flux instance.

A Flux instance is composed of a group of :ref:`flux-broker<flux-broker>` processes. These are launched via any parallel launch utility that supports :ref:`PMI<pmi>` (for example, srun or mpiexec.hydra), or locally for testing via the flux start command with the -s, --test-size=N option.

To start a Flux instance with 4 brokers on the local node, use flux start:

$ flux start --test-size=4
$

A Flux :ref:`instance<flux-instance>` can also be started under Slurm using PMI. To start one with srun(1), run the flux start command without the --test-size option under a Slurm job. You will likely want to start a single broker process per node:

$ srun -N4 -n4 --pty flux start
srun: Job is in held state, pending scheduler release
srun: job 1136410 queued and waiting for resources
srun: job 1136410 has been allocated resources
$

An interactive Flux instance can also be started under Flux with :core:man1:`flux-alloc`:

$ flux alloc -n144 -N4
$

Note

flux alloc requires the -n, --nslots=N parameter, which by default allocates 1 core per slot. The command above requests 144 cores across 4 nodes (for example, on a system with 36 cores per node).

After broker wire-up is completed, the Flux instance starts an “initial program” on the rank 0 broker. By default, the initial program is an interactive shell, but an alternate program can be supplied on the flux start command line. Once the initial program terminates, the Flux instance is considered complete and the brokers exit.

To get help on any flux subcommand or API function, use the flux help command. For example, to view the man page for the flux-top(1) command, use:

$ flux help top

flux help can also be run by itself to see a list of commonly used Flux commands.

Interacting with a Flux Session

There are several low-level commands of interest to interact with a Flux instance. For example, to view the total resources available in the current instance, flux resource status may be used:

$ flux resource status
    STATUS NNODES RANKS           NODELIST
     avail      4 0-3             quartz[2306,2306,2306,2306]

To view the scheduling state of resources use flux resource list:

$ flux resource list
     STATE NNODES   NCORES    NGPUS NODELIST
      free      4      144        0 quartz[2306,2306,2306,2306]
 allocated      0        0        0
      down      0        0        0

Note

Since we are running a test instance with 4 brokers on the same host via the --test-size=4 option, the same hostname is repeated in the NODELIST above. This allows Flux to simulate a multi-node cluster on a single node.
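
The repetition in NODELIST can be made explicit by expanding the hostlist string. The following is a minimal sketch, not Flux's own hostlist implementation: expand_hostlist is a hypothetical helper that handles only a single bracketed group at the end of the name.

```python
import re


def expand_hostlist(hostlist):
    """Expand a simple hostlist such as "quartz[2306,2308-2310]" into hostnames.

    Hypothetical helper: only one bracketed group at the end of the
    name is handled, which is enough for the NODELIST shown above.
    """
    m = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", hostlist)
    if m is None:
        return [hostlist]  # plain hostname, nothing to expand
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single ID is a range of length one
        width = len(lo)  # preserve zero padding such as "001"
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


# NODELIST value copied from the flux resource list output above.
hosts = expand_hostlist("quartz[2306,2306,2306,2306]")
print(len(hosts))       # one entry per broker rank
print(len(set(hosts)))  # number of distinct physical hosts
```

For the test instance above, the first count equals the number of brokers, while the second shows that they all share one physical host.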

The size, broker rank, URIs, logging levels, and other instance parameters are termed “broker attributes” and can be viewed and manipulated with the lsattr, getattr, and setattr commands. For a description of all attributes, see :core:man7:`flux-broker-attributes`. For example:

$ flux getattr rank
0
$ flux getattr size
4

The current log level is also an attribute and can be modified at runtime:

$ flux getattr log-level
6
$ flux setattr log-level 4  # Make flux quieter
$ flux getattr log-level
4
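
To illustrate what lowering log-level from 6 to 4 means, here is a minimal sketch. It assumes that log-level uses the syslog(3) severity numbering (0 is the most severe, 7 is debug) and that messages at or below the configured level are kept. The is_kept function and the severity table are illustrative, not part of Flux.

```python
# syslog(3) severities in numeric order: index 0 is the most severe.
SYSLOG_SEVERITIES = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]


def is_kept(severity, log_level):
    """Return True if a message of this severity passes the given log-level.

    Assumption: a message is kept when its numeric severity is less
    than or equal to log_level.
    """
    return SYSLOG_SEVERITIES.index(severity) <= log_level


for level in (6, 4):
    kept = [s for s in SYSLOG_SEVERITIES if is_kept(s, level)]
    print(level, kept)
```

Under these assumptions, info messages pass at level 6 but are dropped at level 4, which is why the setting makes Flux quieter.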

Attributes are per-broker, so to set or get a value on a different broker rank or across the entire instance, run flux getattr or flux setattr via :core:man1:`flux-exec`.

To see a list of all attributes and their values, use flux lsattr -v.

Log messages from each broker are kept in a local ring buffer. Recent log messages for the local rank may be dumped via the flux dmesg command:

$ flux dmesg | tail -4
2016-08-12T17:53:24.073219Z broker.info[0]: insmod cron
2016-08-12T17:53:24.073847Z cron.info[0]: synchronizing cron tasks to event hb
2016-08-12T17:53:24.075824Z broker.info[0]: Run level 1 Exited (rc=0)
2016-08-12T17:53:24.075831Z broker.info[0]: Run level 2 starting
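
Each line in the output above consists of a timestamp, a name and severity, the broker rank in brackets, and the message. The following sketch splits captured lines into those fields. The pattern is inferred from the sample output only, and parse_dmesg_line is a hypothetical helper, not a Flux API.

```python
import re

# Pattern inferred from the sample lines above:
#   <timestamp> <name>.<severity>[<rank>]: <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) "
    r"(?P<name>\S+)\.(?P<severity>[a-z]+)"
    r"\[(?P<rank>\d+)\]: "
    r"(?P<message>.*)$"
)


def parse_dmesg_line(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["rank"] = int(record["rank"])
    return record


# Sample copied from the flux dmesg output above.
SAMPLE = """\
2016-08-12T17:53:24.073219Z broker.info[0]: insmod cron
2016-08-12T17:53:24.073847Z cron.info[0]: synchronizing cron tasks to event hb
2016-08-12T17:53:24.075824Z broker.info[0]: Run level 1 Exited (rc=0)
2016-08-12T17:53:24.075831Z broker.info[0]: Run level 2 starting
"""

records = [parse_dmesg_line(line) for line in SAMPLE.splitlines()]
for rec in records:
    print(rec["name"], rec["severity"], rec["rank"], rec["message"])
```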

Services within a Flux instance may be implemented by modules loaded in the flux-broker process on one or more ranks of the instance. To query and manage broker modules, Flux provides a flux module command:

$ flux module list
Module                   Size Digest  Idle  S Service
job-exec              1274936 D83AE37    4  S
job-manager           1331496 1F432DD    4  S
kvs-watch             1299400 AA90CE6    4  S
kvs                   1558712 7D8432C    0  S
sched-simple          1241744 AA85006    4  S sched
job-info              1348608 CA590E9    4  S
barrier               1124360 DDA1A3A    4  S
cron                  1202792 1B2DFD1    0  S
connector-local       1110736 5AE480D    0  R
job-ingest            1214040 19306CA    4  S
userdb                1122432 0AA8778    4  S
content-sqlite        1126920 EB0D5E9    4  S content-backing
aggregator            1141184 5E1E0B6    4  S

The most basic functionality of these service modules can be tested with the :core:man1:`flux-ping` utility, which targets a builtin *.ping handler registered by default with each module.

$ flux ping --count=2 kvs
kvs.ping pad=0 seq=0 time=0.402 ms (2da0be18!301c7e16!3e4f235f!9cea08f1)
kvs.ping pad=0 seq=1 time=0.307 ms (2da0be18!301c7e16!3e4f235f!9cea08f1)

Flux KVS

The :ref:`kvs<kvs>` (Key-Value Store) is a core component of a Flux instance. The flux kvs command provides a utility to list and manipulate values of the KVS. For example, resource information for the current instance is loaded into the KVS by the resource module at instance startup and is available under the KVS key resource.R. The cores available on each broker rank, including rank 0, can be read from this key:

$ flux kvs get resource.R
{"version": 1, "execution": {"R_lite": [{"rank": "0-3", "children": {"core": "0-35"}}], "starttime": 0.0, "expiration": 0.0, "nodelist": ["quartz[2306,2306,2306,2306]"]}}

See flux help kvs for more information.
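
The resource.R value is JSON, so core counts can be derived from it with a short script. The sketch below works on the sample output shown above. The expand_idset and count_cores helpers are hypothetical and handle only the simple comma and range forms that appear in that sample.

```python
import json


def expand_idset(idset):
    """Expand a simple idset such as "0-3" or "0,2-4" into a list of integers."""
    ids = []
    for part in idset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def count_cores(r_json):
    """Return a {rank: core count} mapping from the R_lite entries of an R document."""
    r = json.loads(r_json)
    cores = {}
    for entry in r["execution"]["R_lite"]:
        ncores = len(expand_idset(entry["children"]["core"]))
        for rank in expand_idset(entry["rank"]):
            cores[rank] = cores.get(rank, 0) + ncores
    return cores


# Sample copied from the flux kvs get resource.R output above.
SAMPLE_R = (
    '{"version": 1, "execution": {"R_lite": [{"rank": "0-3", '
    '"children": {"core": "0-35"}}], "starttime": 0.0, "expiration": 0.0, '
    '"nodelist": ["quartz[2306,2306,2306,2306]"]}}'
)

per_rank = count_cores(SAMPLE_R)
print(per_rank[0])             # cores on rank 0
print(sum(per_rank.values()))  # cores in the whole instance
```

For the sample, this gives 36 cores on rank 0 and 144 in total, matching the NCORES column of flux resource list.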

Launching Work in a Flux Session

Flux has two methods to launch “remote” tasks and parallel work within an instance. The flux exec utility is a low-level remote execution framework which depends on as few other services as possible and is used primarily for testing. By default, flux exec runs a single copy of the provided COMMAND on each rank in an instance:

$ flux exec flux getattr rank
0
3
2
1

Individual ranks may also be targeted:

$ flux exec -r 3 flux getattr rank
3

The second method for launching work is using one of the Flux job submission tools:

  • Run 4 copies of hostname.
$ flux run -n4 --label-io hostname
3: quartz15
2: quartz15
1: quartz15
0: quartz15
  • Run an MPI job (for MPI that supports PMI).
$ flux run -n128 ./hello
completed MPI_Init in 0.944s.  There are 128 tasks
completed first barrier
completed MPI_Finalize
  • Run a job and immediately detach. (Since jobs are KVS based, jobs can run completely detached from any “front end” command.)
$ flux submit -n128 ./hello
ƒA6oPHNjh

Here, the allocated ID for the job is immediately echoed to stdout.

The flux job command also includes many useful subcommands, including:

  • View output of a job.
$ flux job attach ƒA6oPHNjh
completed MPI_Init in 0.932s.  There are 128 tasks
completed first barrier
completed MPI_Finalize
  • Cancel a pending or running job, or send a signal to a running job
$ flux job cancel ƒMjstRfzF

or

$ flux job kill ƒMjstRfzF
  • List pending and running jobs for the current user.
$ flux jobs
     JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
 ƒPugMu2Ty fluxuser sleep       R      1      1   1.564s quartz2306
 ƒPugLR3Bd fluxuser sleep       R      1      1   1.565s quartz2306
  • To include jobs that have completed for the current user, add the -a option.
$ flux jobs -a
     JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
 ƒPugMu2Ty fluxuser sleep       R      1      1   1.564s quartz2306
 ƒPugLR3Bd fluxuser sleep       R      1      1   1.565s quartz2306
  ƒP55Ntdd fluxuser sleep      CD      1      1   4.052s quartz2306
  ƒ8QzNhZh fluxuser hostname   CD      1      1   0.053s quartz2306

By default, flux jobs -a will list up to 1000 jobs. To limit output, use the -c, --count=N option.
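
Captured flux jobs output can be summarized by the ST column. The following is a rough sketch that operates on the sample table above. It splits rows on whitespace, so it assumes job names contain no spaces, and count_states is a hypothetical helper.

```python
from collections import Counter


def count_states(table):
    """Tally the ST column of a captured flux jobs table.

    Assumption: columns are whitespace-separated and no field
    (for example, NAME) contains embedded spaces.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    st_index = lines[0].split().index("ST")
    return Counter(line.split()[st_index] for line in lines[1:])


# Sample copied from the flux jobs -a output above.
SAMPLE = """\
     JOBID USER     NAME       ST NTASKS NNODES  RUNTIME NODELIST
 ƒPugMu2Ty fluxuser sleep       R      1      1   1.564s quartz2306
 ƒPugLR3Bd fluxuser sleep       R      1      1   1.565s quartz2306
  ƒP55Ntdd fluxuser sleep      CD      1      1   4.052s quartz2306
  ƒ8QzNhZh fluxuser hostname   CD      1      1   0.053s quartz2306
"""

states = count_states(SAMPLE)
print(states["R"], states["CD"])
```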