Merged
42 commits
1cb2886
-h flag as a shorthand for --help
Ladme Nov 10, 2025
bd3fec5
Bumped version
Ladme Nov 10, 2025
a4be85a
qq autocomplete
Ladme Nov 10, 2025
8138d9c
Bumped installation scripts versions
Ladme Nov 10, 2025
81af8f9
Fixed typos
Ladme Nov 10, 2025
65b1ee8
Docstring for a function
Ladme Nov 10, 2025
9165e2b
Support for LUMI and bug fix for loop jobs with extensions
Ladme Nov 11, 2025
c2f561b
qq wipe
Ladme Nov 11, 2025
b754184
Documentation inconsistency
Ladme Nov 11, 2025
523b4a2
Doc strings for Slurm
Ladme Nov 11, 2025
ca5319a
Exit code to qqout
Ladme Nov 13, 2025
6a008b2
Tests for runner finalize logging
Ladme Nov 13, 2025
a2bbc9a
Better truncating of estimated nodes
Ladme Nov 13, 2025
4623afe
More methods of BatchJobInterface should have an optional return
Ladme Nov 13, 2025
e08c116
More methods of BatchNodeInterface should have an optional return
Ladme Nov 13, 2025
2aba9b0
More methods of BatchQueueInterface should have an optional return
Ladme Nov 13, 2025
5a7320d
Fixed showing GPUs
Ladme Nov 13, 2025
b6e41ab
Recent internal changes in changelog
Ladme Nov 13, 2025
40b27a7
Fixed incorrect archive placement in non-local loop jobs
Ladme Nov 13, 2025
67e7951
Fixed a PermissionError bug
Ladme Nov 13, 2025
f24f92c
Fixing input dir for jobs
Ladme Nov 14, 2025
27c26d4
Making Slurm qq jobs/stat faster & fixing parsing of ngpus
Ladme Nov 14, 2025
12d5c67
PBS: mpiprocs specification
Ladme Nov 14, 2025
9066242
--excluded is now resolved correctly even if the submitted script is …
Ladme Nov 14, 2025
d30640b
Fixed resubmitting jobs not originally submitted from cwd
Ladme Nov 14, 2025
52eedfb
Slurm steps to qq info
Ladme Nov 14, 2025
f273c8c
Copying runtime files from failed and killed jobs
Ladme Nov 14, 2025
3bf72fe
Hiding the column comment in qq queues
Ladme Nov 14, 2025
d43fed7
Max Nodes column in qq queues
Ladme Nov 14, 2025
a9626cb
More dynamic output of qq jobs and qq stat
Ladme Nov 15, 2025
993f1ae
Making operations with job ids faster
Ladme Nov 15, 2025
cb91b55
Typo
Ladme Nov 15, 2025
e06b432
Comment to qq jobs -e and qq stat -e
Ladme Nov 15, 2025
9ce708e
Help changes
Ladme Nov 15, 2025
8898177
--include option
Ladme Nov 15, 2025
4ef892c
Added some missing tests
Ladme Nov 15, 2025
ddb785d
Updated dev status
Ladme Nov 15, 2025
3bdd088
Changelog rewrite
Ladme Nov 15, 2025
c9de0cd
More reliable creation of scratch directories on Karolina and LUMI
Ladme Nov 16, 2025
cdd23e5
Always submitting from input dir
Ladme Nov 16, 2025
1a469a8
Bumped version
Ladme Nov 16, 2025
4251abf
Making run scripts compatible with LUMI
Ladme Nov 16, 2025
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,43 @@
## Version 0.5.0

### Support for LUMI
- qq is now fully compatible with the **LUMI** supercomputer.

### Handling of failed and killed jobs
- The `.err` and `.out` runtime files are now copied from the working directory to the input directory even when a job fails or is killed.
This makes it easier to inspect what went wrong while keeping the input directory in a consistent state — all other files remain in the working directory.

### New command: `qq wipe`
- Added the `qq wipe` command for safely deleting the working directories of failed or killed jobs.

### Slurm step information
- `qq info` now displays the status of individual Slurm job steps when multiple steps exist and the information is available from the batch system.

### Updates to `qq queues`
- The *Comment* column is now hidden when no queues include a comment.
- Added a new *Max Nodes* column showing the maximum number of nodes that can be requested in each queue. This column is hidden if no queue sets a maximum number of nodes.
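
The column-hiding rule described above (drop a column when no row has a value for it) can be sketched roughly as follows. This is a minimal illustration, not qq's actual presenter code: `visible_columns` and the sample rows are hypothetical names introduced only for this example.

```python
from typing import Optional


def visible_columns(
    rows: list[dict[str, Optional[str]]], columns: list[str]
) -> list[str]:
    """Return the columns that hold a value in at least one row, keeping order."""
    return [
        col for col in columns if any(row.get(col) is not None for row in rows)
    ]


# Hypothetical queue data: no queue has a comment, one queue limits nodes.
queues = [
    {"Name": "default", "Comment": None, "Max Nodes": None},
    {"Name": "gpu", "Comment": None, "Max Nodes": "4"},
]

shown = visible_columns(queues, ["Name", "Comment", "Max Nodes"])
print(shown)
```

With this data, *Comment* is dropped and *Max Nodes* is kept because one queue sets it.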

### New option: `--include` in `qq submit`
- You can now use the `--include` option to specify additional files or directories outside the job's input directory. These will be copied into the working directory upon submission.

### Bug fixes and minor improvements
- Added support for the `-h` flag as a shorthand for `--help`.
- Added shell autocomplete for qq commands.
- Fixed incorrect naming of loop jobs when the job script had a file extension.
- Made it possible to submit qq jobs from directories other than the current working directory.
- `get_info_files_from_job_id_or_dir` now properly catches `PermissionError` when reading restricted info files.
- Retrieving job lists from Slurm is now significantly faster (still limited by Slurm performance).
- Fixed an issue preventing jobs from using multiple MPI ranks on some PBS clusters.
- Improved the dynamic output of `qq jobs` and `qq stat`: unused columns are now hidden.
- Operations on job IDs are now faster.
- Job comments are now shown in the output of `qq jobs -e` and `qq stat -e` (if available).
- `qq sync` now correctly synchronizes contents of selected directories when using the `-f` option.
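
For the `PermissionError` item in the list above, one plausible way of handling restricted files is to skip them and carry on. The sketch below assumes that behavior; the changelog does not say what qq does after catching the error. `collect_readable`, `fake_read`, and the file names are hypothetical.

```python
from pathlib import Path
from typing import Callable


def collect_readable(
    paths: list[Path], read: Callable[[Path], str]
) -> dict[Path, str]:
    """Read each file, skipping any that raise PermissionError."""
    contents: dict[Path, str] = {}
    for path in paths:
        try:
            contents[path] = read(path)
        except PermissionError:
            # Assumption: a restricted file is skipped instead of aborting.
            continue
    return contents


def fake_read(path: Path) -> str:
    """Stand-in reader that simulates a restricted file."""
    if path.name.startswith("restricted"):
        raise PermissionError(str(path))
    return "ok"


result = collect_readable([Path("job1.info"), Path("restricted.info")], fake_read)
print(sorted(p.name for p in result))
```

Passing the reader as a parameter keeps the example testable without changing real file permissions.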

### Internal changes
- Most methods in `BatchJobInterface`, `BatchQueueInterface`, and `BatchNodeInterface` now have optional return values.
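
An optional return value means callers have to handle a missing value explicitly. The sketch below shows the general pattern only; `JobLike`, `get_ncpus`, and `format_ncpus` are hypothetical and do not reproduce the signatures in `BatchJobInterface`.

```python
from abc import ABC, abstractmethod
from typing import Optional


class JobLike(ABC):
    """Hypothetical interface whose getter may have nothing to report."""

    @abstractmethod
    def get_ncpus(self) -> Optional[int]:
        """Return the number of CPUs, or None if the batch system does not provide it."""


class KnownJob(JobLike):
    def get_ncpus(self) -> Optional[int]:
        return 8


class UnknownJob(JobLike):
    def get_ncpus(self) -> Optional[int]:
        return None


def format_ncpus(job: JobLike) -> str:
    """Render the CPU count, falling back to a placeholder when it is unknown."""
    ncpus = job.get_ncpus()
    return str(ncpus) if ncpus is not None else "?"


print(format_ncpus(KnownJob()), format_ncpus(UnknownJob()))
```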

***

## Version 0.4.0

### Support for Slurm
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,14 +1,14 @@
[project]
name = "qq"
- version = "0.4.0"
+ version = "0.5.0"
description = "A friendly interface to batch processing"
readme = "README.md"
requires-python = ">=3.12"
license = { text = "MIT" }
authors = [{ name = "Ladislav Bartos", email = "ladmeb@gmail.com" }]
keywords = ["batch", "processing", "cli", "automation"]
classifiers = [
- "Development Status :: 3 - Alpha",
+ "Development Status :: 4 - Beta",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
"Operating System :: POSIX :: Linux",
4 changes: 3 additions & 1 deletion scripts/installation_scripts/qq-install.sh
@@ -1,7 +1,7 @@
#!/bin/bash
# Installs qq from a GitHub release into a specific home directory
# and updates that home directory's .bashrc accordingly.
- # Script version: 0.2.0
+ # Script version: 0.3.0

set -euo pipefail

@@ -107,6 +107,8 @@ qq() {
command qq "\$@"
fi
}
+ # This makes qq autocomplete work.
+ eval "$(_QQ_COMPLETE=bash_source qq)"
$BLOCK_END
EOF
}
6 changes: 3 additions & 3 deletions scripts/installation_scripts/qq-karolina-install.sh
@@ -1,6 +1,6 @@
#!/bin/bash
- # Installs qq on IT4Innovation's Karolina.
- # Script version: 0.1.0
+ # Installs qq on IT4Innovations' Karolina.
+ # Script version: 0.2.0

set -euo pipefail

@@ -38,7 +38,7 @@ for HOME_DIR in "${TARGET_HOMES[@]}"; do
if [ -d "$HOME_DIR" ]; then
"$TMP_INSTALLER" "$HOME_DIR" "$RELEASE_URL"
else
- echo "WARN [qq sokar installer] Skipping $HOME_DIR (directory not found)"
+ echo "WARN [qq karolina installer] Skipping $HOME_DIR (directory not found)"
fi
done

50 changes: 50 additions & 0 deletions scripts/installation_scripts/qq-lumi-install.sh
@@ -0,0 +1,50 @@
#!/bin/bash
# Installs qq on the LUMI supercomputer.
# Script version: 0.1.0

set -euo pipefail

# -----------------------
# Configuration
# -----------------------

# qq version to install
QQ_VERSION="v__VERSION__"

# GitHub release assets
INSTALL_SCRIPT_URL="https://github.com/Ladme/qq/releases/download/${QQ_VERSION}/qq-install.sh"
RELEASE_URL="https://github.com/Ladme/qq/releases/download/${QQ_VERSION}/qq-release.tar.gz"

# list of target home directories
TARGET_HOMES=(
"${HOME}"
)

# -----------------------
# Main logic
# -----------------------

TMP_INSTALLER="$(mktemp)"

echo "INFO [qq lumi installer] Downloading qq installer from ${INSTALL_SCRIPT_URL}..."
curl -fsSL -o "$TMP_INSTALLER" "$INSTALL_SCRIPT_URL"
chmod +x "$TMP_INSTALLER"

echo "INFO [qq lumi installer] Installing qq ${QQ_VERSION} from ${RELEASE_URL}"

for HOME_DIR in "${TARGET_HOMES[@]}"; do
echo "--------------------------------------------"
echo "INFO [qq lumi installer] Installing qq into $HOME_DIR ..."
if [ -d "$HOME_DIR" ]; then
"$TMP_INSTALLER" "$HOME_DIR" "$RELEASE_URL"
else
echo "WARN [qq lumi installer] Skipping $HOME_DIR (directory not found)"
fi
done

echo "--------------------------------------------"
echo "INFO [qq lumi installer] qq installation completed for all target home directories."
echo "INFO [qq lumi installer] Run 'source ${HOME}/.bashrc' to make qq available on the current machine."

# Cleanup
rm -f "$TMP_INSTALLER"
4 changes: 3 additions & 1 deletion scripts/installation_scripts/qq-set-scratch.sh
@@ -1,7 +1,7 @@
#!/bin/bash
# Sets up .bashrc to sync qq from a specified directory to /scratch/${USER}
# and add both locations to PATH (with /scratch/${USER}/qq prioritized).
- # Script version: 0.1.0
+ # Script version: 0.2.0

set -euo pipefail

@@ -76,6 +76,8 @@ qq() {
command qq "\$@"
fi
}
+ # This makes qq autocomplete work.
+ eval "$(_QQ_COMPLETE=bash_source qq)"
$BLOCK_END
EOF
}
3 changes: 1 addition & 2 deletions scripts/run_scripts/qq_flex_md
@@ -2,7 +2,7 @@
########################################
# Script for running Gromacs #
# flexible-length loop jobs using qq #
- # script version: 0.7 #
+ # script version: 0.8 #
# support: ladmeb@gmail.com #
########################################

@@ -216,7 +216,6 @@ fi
# run the simulation
if [[ "${QQ_BATCH_SYSTEM}" == *"Slurm"* ]]; then
srun \
- --mpi=pmix \
--ntasks=${MPI} \
--cpus-per-task=${NTOMP} \
gmx_mpi mdrun -v -deffnm ${CURR} ${CPI} \
3 changes: 1 addition & 2 deletions scripts/run_scripts/qq_flex_re
@@ -3,7 +3,7 @@
# Script for running Gromacs #
# flexible-length replica exchange #
# using qq #
- # script version: 0.2 #
+ # script version: 0.3 #
# support: ladmeb@gmail.com #
########################################

@@ -272,7 +272,6 @@ IFS=" "
# run the simulation
if [[ "${QQ_BATCH_SYSTEM}" == *"Slurm"* ]]; then
srun \
- --mpi=pmix \
--ntasks=${MPI} \
--cpus-per-task=${NTOMP} \
gmx_mpi mdrun -v -deffnm ${CURR} ${CPI} \
3 changes: 1 addition & 2 deletions scripts/run_scripts/qq_loop_md
@@ -2,7 +2,7 @@
########################################
# Script for running Gromacs #
# loop jobs using qq #
- # script version: 0.7 #
+ # script version: 0.8 #
# support: ladmeb@gmail.com #
########################################

@@ -207,7 +207,6 @@ fi
# run the simulation
if [[ "${QQ_BATCH_SYSTEM}" == *"Slurm"* ]]; then
srun \
- --mpi=pmix \
--ntasks=${MPI} \
--cpus-per-task=${NTOMP} \
gmx_mpi mdrun -v -deffnm ${CURR} ${CPI} \
3 changes: 1 addition & 2 deletions scripts/run_scripts/qq_loop_re
@@ -2,7 +2,7 @@
########################################
# Script for running Gromacs #
# multidir loop jobs using qq #
- # script version: 0.1 #
+ # script version: 0.2 #
# support: ladmeb@gmail.com #
########################################

@@ -262,7 +262,6 @@ IFS=" "
# run the simulation
if [[ "${QQ_BATCH_SYSTEM}" == *"Slurm"* ]]; then
srun \
- --mpi=pmix \
--ntasks=${MPI} \
--cpus-per-task=${NTOMP} \
gmx_mpi mdrun -v -deffnm ${CURR} ${CPI} \
6 changes: 4 additions & 2 deletions src/qq_lib/archive/archiver.py
@@ -145,7 +145,7 @@ def archiveRunTimeFiles(self, job_name: str, cycle: int) -> None:

Args:
job_name (str): The name of the job.
- cycle (int): Current cycle number (for archiving).
+ cycle (int): Cycle number for which the files should be archived.

Raises:
QQError: If moving the runtime files fails.
@@ -154,7 +154,9 @@ def archiveRunTimeFiles(self, job_name: str, cycle: int) -> None:
files := self._getFiles(
self._input_dir,
self._input_machine,
- job_name,
+ # only use the stem of the job name, the extension will not be matched
+ job_name.split(".", maxsplit=1)[0],
+ # we do not need to use the cycle number here since the job_name should already be expanded
cycle=None,
include_qq_files=True,
)
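
The stem extraction in this hunk cuts the name at the first dot, which differs from `pathlib.Path.stem` (that strips only the last suffix). A small sketch of the difference, where `job_stem` is a hypothetical helper wrapping the expression from the diff:

```python
from pathlib import Path


def job_stem(job_name: str) -> str:
    """Return everything before the first dot, as the diff's expression does."""
    return job_name.split(".", maxsplit=1)[0]


# A name without a dot is returned unchanged.
plain = job_stem("md0003")
# A single extension is dropped.
single = job_stem("md0003.sh")
# Multiple extensions are all dropped, unlike Path.stem.
double = job_stem("md0003.tar.gz")
path_stem = Path("md0003.tar.gz").stem

print(plain, single, double, path_stem)
```

A job name that itself contains a dot before the extension would be truncated at that first dot, which may or may not be intended here.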
1 change: 1 addition & 0 deletions src/qq_lib/batch/__init__.py
@@ -7,3 +7,4 @@
from .pbs import PBS, PBSJob, PBSNode, PBSQueue
from .slurm import Slurm, SlurmJob, SlurmNode, SlurmQueue
from .slurmit4i import SlurmIT4I
from .slurmlumi import SlurmLumi
71 changes: 71 additions & 0 deletions src/qq_lib/batch/interface/interface.py
@@ -480,6 +480,43 @@ def listRemoteDir(cls, host: str, directory: Path) -> list[Path]:
if line.strip()
]

@classmethod
def deleteRemoteDir(cls, host: str, directory: Path) -> None:
"""
Delete a directory on a remote host.

The default implementation uses SSH to run `rm -r` on the remote host.
This approach may be inefficient on shared storage or high-latency networks.
Note that the timeout for the SSH connection is set to `CFG.timeouts.ssh` seconds.

Subclasses should override this method to provide a more efficient implementation
if possible.

Args:
host (str): The hostname of the remote machine where the directory resides.
directory (Path): The remote directory to delete.

Raises:
QQError: If the directory cannot be deleted or the SSH command fails.
"""
result = subprocess.run(
[
"ssh",
"-o PasswordAuthentication=no",
"-o GSSAPIAuthentication=yes",
f"-o ConnectTimeout={CFG.timeouts.ssh}",
host,
f"yes | rm -r {directory}",
],
capture_output=True,
text=True,
)

if result.returncode != 0:
raise QQError(
f"Could not delete remote directory '{directory}' on '{host}': {result.stderr.strip()}."
)
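
`deleteRemoteDir` interpolates `directory` straight into the remote shell command, so a path containing spaces or shell metacharacters would be split or interpreted by the remote shell. One possible hardening, sketched under assumptions, is to quote the path with `shlex.quote`. `build_delete_command` is a hypothetical helper, not part of qq, and its `timeout` parameter stands in for `CFG.timeouts.ssh`. It only builds the argument list and does not run anything.

```python
import shlex
from pathlib import Path


def build_delete_command(host: str, directory: Path, timeout: int) -> list[str]:
    """Build an ssh argument list that removes `directory` on `host`."""
    return [
        "ssh",
        "-o", "PasswordAuthentication=no",
        "-o", "GSSAPIAuthentication=yes",
        "-o", f"ConnectTimeout={timeout}",
        host,
        # Quote the path so the remote shell treats it as a single word.
        f"yes | rm -r {shlex.quote(str(directory))}",
    ]


cmd = build_delete_command("node1", Path("/scratch/user/job 1"), 60)
print(cmd[-1])
```

Passing each `-o` and its value as separate list items is a stylistic choice in this sketch. The quoting is the part that changes behavior.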

@classmethod
def moveRemoteFiles(
cls, host: str, files: list[Path], moved_files: list[Path]
@@ -721,6 +758,36 @@ def sortJobs(cls, jobs: list[TBatchJob]) -> None:
"""
jobs.sort(key=lambda job: job.getId())

@classmethod
def jobsPresenterColumnsToShow(cls) -> set[str]:
"""
Get a set of columns that should be shown in the output of JobsPresenter (`qq jobs`)
for this batch system.

In the default implementation, all columns are shown.

Note that the 'Exit' column is not shown when printing queued and running jobs,
even if you specify it here.

Returns:
set[str]: Set of column titles that should be shown.
"""
return {
"S",
"Job ID",
"User",
"Job Name",
"Queue",
"NCPUs",
"NGPUs",
"NNodes",
"Times",
"Node",
"%CPU",
"%Mem",
"Exit",
}
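
A batch-system subclass could narrow this set to hide columns it cannot fill. The sketch below illustrates that pattern. `BaseBatch`, `NoUsageBatch`, `header`, and the snake_case method name are hypothetical, and the real presenter may consume the set differently. Because a set has no order, the sketch keeps a separate ordered list for rendering.

```python
# Ordered list of column titles, taken from the set returned in the diff.
ALL_COLUMNS = [
    "S", "Job ID", "User", "Job Name", "Queue", "NCPUs", "NGPUs",
    "NNodes", "Times", "Node", "%CPU", "%Mem", "Exit",
]


class BaseBatch:
    @classmethod
    def jobs_presenter_columns_to_show(cls) -> set[str]:
        """Default: show every column."""
        return set(ALL_COLUMNS)


class NoUsageBatch(BaseBatch):
    @classmethod
    def jobs_presenter_columns_to_show(cls) -> set[str]:
        """Hypothetical batch system that cannot report CPU and memory usage."""
        return super().jobs_presenter_columns_to_show() - {"%CPU", "%Mem"}


def header(batch: type[BaseBatch]) -> list[str]:
    """Return the visible column titles in display order."""
    shown = batch.jobs_presenter_columns_to_show()
    return [col for col in ALL_COLUMNS if col in shown]


print(header(NoUsageBatch))
```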

@classmethod
def _translateSSHCommand(cls, host: str, directory: Path) -> list[str]:
"""
@@ -887,7 +954,11 @@ def _translateRsyncIncludedCommand(
"-rltD",
]
for file in relative_included:
# if `file` is a file
command.extend(["--include", str(file)])
# if `file` is a directory
# it's okay to include both patterns - if it is invalid, it's ignored
command.extend(["--include", f"{str(file)}/***"])
# exclude all files not specifically included
command.extend(["--exclude", "*"])
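
The filter-building loop in this hunk can be read in isolation as the sketch below. `include_args` is a hypothetical standalone function that mirrors the loop, not the method itself. rsync applies the first matching rule, so the `--include` patterns have to come before the final `--exclude "*"`. The `name/***` form matches a directory together with everything beneath it.

```python
from pathlib import Path


def include_args(relative_included: list[Path]) -> list[str]:
    """Build rsync filter arguments that transfer only the listed paths."""
    args: list[str] = []
    for file in relative_included:
        # Matches the entry itself when it is a regular file.
        args.extend(["--include", str(file)])
        # Matches the entry and all of its contents when it is a directory.
        args.extend(["--include", f"{file}/***"])
    # Everything not included above is excluded.
    args.extend(["--exclude", "*"])
    return args


args = include_args([Path("topol.top"), Path("data")])
print(args)
```

For nested paths such as `a/b`, rsync generally also needs the parent directory `a/` to be included, which this sketch does not handle.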
