# Advanced Configuration
Initially `pysqa` was only designed to interact with the local queuing system of an HPC cluster. This functionality has recently been extended to support remote HPC clusters in addition to local HPC clusters. These two developments, the support for remote HPC clusters and the support for multiple clusters in `pysqa`, are discussed in the following. Both of these features are under active development, so this part of the interface might change more frequently than the rest.

## Remote HPC Configuration
Remote clusters can be defined in the `queue.yaml` file by setting the `queue_type` to `REMOTE`:
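As an illustrative sketch only (the SSH connection keys shown here, such as `ssh_host`, `ssh_username` and `ssh_key`, are assumed names and values - check the full `pysqa` documentation for the exact keywords), a remote `queue.yaml` might look roughly like:
```
queue_type: REMOTE
queue_primary: remote_slurm
ssh_host: hpc-login.university.edu
ssh_username: hpcuser
ssh_key: ~/.ssh/id_rsa
ssh_port: 22
ssh_delete_file_on_remote: True
queues:
  remote_slurm: {cores_max: 128, cores_min: 1, run_time_max: 1440}
```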
In addition to the `queue_type`, `queue_primary` and `queues` parameters, this also requires the keywords for the SSH connection to the remote HPC cluster. Optional keywords are:
* `ssh_delete_file_on_remote` specifies whether files on the remote HPC should be deleted after they are transferred back to the local system - defaults to `True`
* `ssh_port` the port used for the SSH connection on the remote HPC cluster - defaults to `22`

A definition of the `queues` in the local system is required to enable the parameter checks locally. Still, it is sufficient to store the individual submission script templates only on the remote HPC.

## Access to Multiple HPCs
To support multiple remote HPC clusters, additional functionality was added to `pysqa`.

Namely, a `clusters.yaml` file can be defined in the configuration directory, which defines multiple `queue.yaml` files for different clusters:
```
cluster_primary: local_slurm
cluster: {
  local_slurm: local_slurm_queues.yaml,
  remote_slurm: remote_queues.yaml
}
```
These `queue.yaml` files can again include all the functionality defined previously, including the configuration for remote connection using SSH.

Furthermore, the `QueueAdapter` class was extended with the following two functions:
```
qa.list_clusters()
```
To list the available clusters in the configuration and:
```
qa.switch_cluster(cluster_name)
```
To switch from one cluster to another, with the `cluster_name` providing the name of the cluster, like `local_slurm` and `remote_slurm` in the configuration above.
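
For illustration, a minimal sketch of switching between the clusters defined above, assuming the `QueueAdapter` is initialized from the configuration directory containing the `clusters.yaml` file:
```
from pysqa import QueueAdapter

# load the configuration directory containing clusters.yaml
qa = QueueAdapter(directory="~/.queues")

# switch to the remote cluster defined in the configuration above
qa.switch_cluster(cluster_name="remote_slurm")

# subsequent calls, e.g. submitting a job, address the selected cluster
qa.submit_job(command="hostname")
```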
# Command Line Interface
The command line interface implements a subset of the functionality of the python interface. While it can be used locally to check the status of your calculation, the primary use case is accessing the `pysqa` installation on a remote HPC cluster from your local `pysqa` installation. Still, the local execution of the commands is discussed here.

The available options are the submission of new jobs to the queuing system using the submit option `--submit`, enabling a reservation for a job already submitted using the `--reservation` option, listing jobs on the queuing system using the status option `--status`, deleting a job from the queuing system using the delete option `--delete`, listing files in the working directory using the list option `--list` and the help option `--help` to print a summary of the available options.

## Submit job
Submission of jobs to the queuing system with the submit option `--submit` is similar to the submit job function `QueueAdapter().submit_job()`. Example call to submit the `hostname` command to the default queue:
```
python -m pysqa --submit --command hostname
```
Additional options for the submission of the job with their short forms are listed below; a combined example follows the list:
* `-f`, `--config_directory` the directory which contains the `pysqa` configuration, by default `~/.queues`.
* `-q`, `--queue` the queue the job is submitted to. If this option is not defined, the `queue_primary` defined in the configuration is used.
* `-j`, `--job_name` the name of the job submitted to the queuing system.
* `-w`, `--working_directory` the working directory the job submitted to the queuing system is executed in.
* `-n`, `--cores` the number of cores used for the calculation. If the cores are not defined, the minimum number of cores defined for the selected queue is used.
* `-m`, `--memory` the memory used for the calculation.
* `-t`, `--run_time` the run time for the calculation. If the run time is not defined, the maximum run time defined for the selected queue is used.
* `-b`, `--dependency` other jobs the calculation depends on.
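
For illustration, a submission call combining several of these options; the queue name, job name, working directory and core count are placeholder values:
```
python -m pysqa --submit --command hostname --queue slurm --job_name test_job --working_directory . --cores 2
```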
## Enable reservation
Enabling a reservation for a job already submitted to the queuing system with the reservation option `--reservation` is similar to the enable reservation function `QueueAdapter().enable_reservation()`. Example call to enable the reservation for a job with the id `123`:
```
python -m pysqa --reservation --id 123
```
Additional options for enabling the reservation with their short forms are:
* `-f`, `--config_directory` the directory which contains the `pysqa` configuration, by default `~/.queues`.

## List jobs
Listing jobs on the queuing system with the status option `--status` lists the calculations currently running and waiting on the queuing system for all users on the HPC cluster:
```
python -m pysqa --status
```
The options used and their short forms are:
* `-s`, `--status` the status option lists the status of all calculations currently running and waiting on the queuing system.

Additional options for listing jobs on the queuing system with their short forms are:
* `-f`, `--config_directory` the directory which contains the `pysqa` configuration, by default `~/.queues`.

# Debugging

The configuration of a queuing system adapter, in particular in a remote configuration with a local installation of `pysqa` communicating with a remote installation on your HPC cluster, can be tricky.

## Local Queuing System
To simplify the process `pysqa` provides a series of steps for debugging:

* When `pysqa` submits a calculation to a queuing system it creates a `run_queue.sh` script. You can submit this script using your batch command, e.g. `sbatch` for `SLURM`, and take a look at the error message.
* The error message the queuing system returns when submitting the job is also stored in the `pysqa.err` file.
* Finally, if the `run_queue.sh` script does not match the variables you provided, then you can test your template using `jinja2`: `Template(open("~/.queues/queue.sh", "r").read()).render(**kwargs)`, where `"~/.queues/queue.sh"` is the path to the queuing system submit script you want to use and `**kwargs` are the arguments you provide to the `submit_job()` function; see the sketch below.
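
A minimal sketch of such a template test, assuming the template uses placeholders like `job_name`, `working_directory` and `cores` (replace them with the arguments you actually pass to `submit_job()`):
```
import os
from jinja2 import Template

# read the submission script template used by pysqa
with open(os.path.expanduser("~/.queues/queue.sh"), "r") as f:
    template = Template(f.read())

# render the template with the same arguments passed to submit_job()
print(template.render(job_name="test_job", working_directory=".", cores=2))
```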
## Remote HPC
The failure to submit to a remote HPC cluster can be related to an issue with the local `pysqa` configuration or an issue with the remote `pysqa` configuration. To identify which part is causing the issue, it is recommended to first test the remote `pysqa` installation on the remote HPC cluster:

* Log in to the remote HPC cluster and import `pysqa` in a python shell.
* Validate the queue configuration by importing the queue adapter using `from pysqa import QueueAdapter`, then initialize the object from the configuration directory with `qa = QueueAdapter(directory="~/.queues")`. The current configuration can be printed using `qa.config`.
* Try to submit a calculation to print the hostname from the python shell on the remote HPC cluster using `qa.submit_job(command="hostname")`.
* If this works successfully, then the next step is to try the same on the command line using `python -m pysqa --submit --command hostname`; these steps are combined in the sketch below.
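
As a combined sketch of the python steps above, run in a python shell on the remote HPC cluster (assuming `submit_job()` returns the queuing system job id):
```
from pysqa import QueueAdapter

qa = QueueAdapter(directory="~/.queues")     # load the remote configuration
print(qa.config)                             # inspect the parsed configuration
print(qa.submit_job(command="hostname"))     # submit a test job and print the job id
```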

This is the same command the local `pysqa` instance calls on the `pysqa` instance on the remote HPC cluster, so if the steps above were executed successfully, then the remote HPC configuration seems to be correct. The final step is validating the local configuration to verify that the SSH connection is successfully established and maintained.

# Installation
The `pysqa` package can be installed either via `pip` or `conda`. While most HPC systems use Linux these days, the `pysqa` package can be installed on all major operating systems. In particular for connections to remote HPC clusters, it is required to install `pysqa` on both the local system as well as the remote HPC cluster. In this case it is highly recommended to use the same version of `pysqa` on both systems.
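
One way to check this is to compare the reported package version on both systems, assuming `pysqa` exposes the standard `__version__` attribute:
```
python -c "import pysqa; print(pysqa.__version__)"
```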
## pypi-based installation
`pysqa` can be installed from the python package index (pypi) using the following command:
```
pip install pysqa
```
On `pypi` the `pysqa` package exists in three different versions:
* `pip install pysqa` - base version - with minimal requirements only depends on `jinja2`, `pandas` and `pyyaml`.
* `pip install pysqa[sge]` - sun grid engine (SGE) version - in addition to the base dependencies this installs `defusedxml`, which is required to parse the `xml` files from `qstat`.
* `pip install pysqa[remote]` - remote version - in addition to the base dependencies this installs `paramiko` and `tqdm`, to connect to remote HPC clusters using SSH and report the progress of the data transfer visually.
## conda-based installation
The `conda` package combines all dependencies in one package:
```
conda install -c conda-forge pysqa
```
When resolving the dependencies with `conda` gets slow, it is recommended to use `mamba` instead of `conda`. So you can also install `pysqa` using:
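```
mamba install -c conda-forge pysqa
```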