Releases: JaneliaSciComp/py-cluster-api
0.3.0
Release Notes — v0.3.0
Breaking Changes
job_name_prefix is now truly optional. Previously, omitting the prefix caused a random 5-character prefix to be auto-generated. Now, when no prefix is configured, job names are submitted without a prefix and bjobs queries are not filtered by name. Clients that relied on the implicit random prefix for isolation between concurrent sessions should now set an explicit job_name_prefix in their config. reconnect() still requires a prefix to be set.
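Here is a minimal sketch of the new "prefix only if configured" rule. It assumes that names are joined with a "-" separator and that the name filter is applied with bjobs -J and a trailing * wildcard. build_job_name and build_bjobs_query are hypothetical helpers, not the library's API.

```python
from typing import Optional


def build_job_name(name: str, prefix: Optional[str] = None) -> str:
    """Hypothetical: prepend the prefix only when one is configured."""
    # The "-" separator is an assumption made for illustration.
    return f"{prefix}-{name}" if prefix else name


def build_bjobs_query(prefix: Optional[str] = None) -> list[str]:
    """Hypothetical: filter bjobs by job name only when a prefix is set."""
    args = ["bjobs", "-a"]
    if prefix:
        # Assumed filter: -J with a trailing wildcard matches every name with this prefix.
        args += ["-J", f"{prefix}*"]
    return args
```

Without a prefix the query returns every job visible to the user. That is why concurrent sessions that need isolation should now set job_name_prefix explicitly.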
Unknown config profiles now raise ValueError. Requesting a profile name that doesn't exist in the YAML file previously failed silently (the profile was just ignored). It now raises with a message listing available profiles. Clients passing speculative profile names will need to handle this or ensure the profile exists.
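The lookup behavior can be sketched as a plain dictionary lookup that converts a missing key into a ValueError naming the available profiles. select_profile and the exact message wording are hypothetical. The library's real message may differ.

```python
from typing import Any


def select_profile(profiles: dict[str, dict[str, Any]], name: str) -> dict[str, Any]:
    """Hypothetical lookup mirroring the 0.3.0 behavior: unknown names raise."""
    try:
        return profiles[name]
    except KeyError:
        available = ", ".join(sorted(profiles)) or "(none)"
        raise ValueError(
            f"Unknown config profile {name!r}; available profiles: {available}"
        ) from None
```

Callers that pass speculative profile names can catch ValueError and fall back to a default profile.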
Array job aggregate status: FAILED now takes precedence over KILLED. When an array job has a mix of failed and killed elements, the overall status is now FAILED rather than KILLED. This better reflects that something went wrong rather than just being cancelled.
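One way to implement this is a fixed precedence list where FAILED is checked before KILLED. This is a sketch only. The JobStatus enum here is a hypothetical stand-in, and ranking RUNNING and PENDING above the terminal states is an assumption. The release notes only specify FAILED over KILLED.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical subset of statuses for illustration.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    KILLED = "KILLED"


# Highest priority first; FAILED now outranks KILLED.
_PRECEDENCE = [
    JobStatus.RUNNING,
    JobStatus.PENDING,
    JobStatus.FAILED,
    JobStatus.KILLED,
    JobStatus.DONE,
]


def aggregate_status(elements: list[JobStatus]) -> JobStatus:
    """Return the overall status of an array job from its element statuses."""
    for status in _PRECEDENCE:
        if status in elements:
            return status
    raise ValueError("array job has no elements")
```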
New Features
done flag on cancel() and cancel_all(). Pass done=True to mark cancelled jobs as DONE instead of KILLED. On LSF this translates to bkill -d. Useful for gracefully retiring jobs you don't want flagged as killed.
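On LSF, the flag adds a single option to the kill command. The following sketch builds that command line. build_bkill_args is a hypothetical helper, not the library's API.

```python
def build_bkill_args(job_ids: list[str], done: bool = False) -> list[str]:
    """Hypothetical: build the bkill command line used to cancel jobs."""
    args = ["bkill"]
    if done:
        # bkill -d kills the job and records it as DONE instead of EXIT.
        args.append("-d")
    return args + list(job_ids)
```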
Bug Fixes
- Paths with spaces no longer break LSF submission. Stdout, stderr, and cwd paths in #BSUB directives are now double-quoted. The bsub invocation itself switched from shell execution (create_subprocess_shell) to create_subprocess_exec with stdin file redirection, eliminating shell-escaping issues.
- cancel_by_name() no longer raises when no jobs match. LSF's "No matching job" / "No unfinished job" errors are now caught and logged at debug level instead of propagating.
- reconnect() skips completed jobs. Terminal (DONE/EXIT) jobs returned by bjobs -a are now filtered out, so reconnect only picks up jobs that are still running or pending.
- Zombie processes reaped after local cancel. The local executor now properly awaits killed child processes to avoid zombies.
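The path-quoting fix and the shell-free submission can be sketched together. render_path_directives and submit_script are hypothetical names, not the library's API. The argv parameter exists so that a harmless command such as cat can stand in for bsub when you try the sketch locally.

```python
import asyncio


def render_path_directives(stdout: str, stderr: str, cwd: str) -> list[str]:
    """Double-quote paths so that spaces survive #BSUB directive parsing."""
    return [
        f'#BSUB -o "{stdout}"',
        f'#BSUB -e "{stderr}"',
        f'#BSUB -cwd "{cwd}"',
    ]


async def submit_script(script_path: str, argv: list[str]) -> str:
    """Run argv with the script file on stdin, without going through a shell.

    With argv == ["bsub"] this behaves like `bsub < script.sh`, but no shell
    parses script_path, so spaces or quotes in it need no escaping.
    """
    with open(script_path, "rb") as script:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdin=script,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        out, err = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed: {err.decode().strip()}")
    return out.decode()
```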
Improvements
- Faster local array job cancellation. SIGTERM is now sent to all element processes concurrently, with a single collective wait, instead of terminating and waiting on each one sequentially.
- Warning when using max_concurrent with LocalExecutor. Since the local executor doesn't support concurrency throttling, it now logs a warning instead of silently ignoring the parameter.
- Cancel logic refactored into _cancel_job(). Cancellation is now a proper overridable method on the base Executor, making it easier to customize in subclasses.
- Comprehensive docstrings added to Executor, ResourceSpec, and internal methods. New docs/API.md reference document.
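The "signal everyone, then wait once" pattern looks like this in asyncio. It is a sketch under the assumption that each array element is an asyncio subprocess. terminate_all is a hypothetical name. Awaiting wait() also reaps each child, which is what prevents zombies.

```python
import asyncio


async def terminate_all(procs: list[asyncio.subprocess.Process]) -> list[int]:
    """Send SIGTERM to every element first, then wait for all of them together."""
    for proc in procs:
        if proc.returncode is None:  # skip elements that have already exited
            try:
                proc.terminate()  # SIGTERM on POSIX
            except ProcessLookupError:
                pass  # exited between the check and the signal
    # A single collective wait; awaiting wait() also reaps each child process.
    return list(await asyncio.gather(*(proc.wait() for proc in procs)))
```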
0.2.4
Fix to convert timestamps to UTC.
Full Changelog: 0.2.3...0.2.4
0.2.3
- Sanitize job names
Full Changelog: 0.2.2...0.2.3
0.2.2
New Features
- Job reconnection: LSFExecutor.reconnect() queries bjobs by name prefix and reconstructs JobRecord instances so monitoring can resume after a process restart. Supports both single jobs and array jobs. Requires job_name_prefix to be set in config.
- Debug logging: Added logger.debug() calls for all scheduler CLI invocations (bsub, bkill, bjobs) so command lines are visible at DEBUG level. Added info-level logging for submit, cancel, and reconnect outcomes.
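The core of reconnection is finding the caller's still-live jobs in bjobs output. The sketch below assumes whitespace-separated "JOBID JOB_NAME STAT" rows with no header, for example from bjobs -a -noheader -o "jobid job_name stat", and job names without spaces. jobs_to_reconnect is a hypothetical helper. It also drops DONE/EXIT rows, matching the 0.3.0 reconnect fix.

```python
from collections import defaultdict

TERMINAL = {"DONE", "EXIT"}


def jobs_to_reconnect(bjobs_output: str, prefix: str) -> dict[str, list[str]]:
    """Map job ID to the names of live jobs whose name starts with prefix."""
    found: dict[str, list[str]] = defaultdict(list)
    for line in bjobs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        job_id, name, stat = fields
        if name.startswith(prefix) and stat not in TERMINAL:
            found[job_id].append(name)  # array elements share one job ID
    return dict(found)
```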
Improvements
- Simplified bsub submission: Removed the use_stdin code path; script submission now always uses shell redirection (bsub < script.sh), which correctly enables #BSUB directive parsing.
Cleanup
- Removed unused lsf_detect_units() function (dead code, never called).
- Removed unused register_executor() function from the executor registry.
- Deduplicated _ARRAY_ELEMENT_RE regex: LocalExecutor now imports the pattern from core.py instead of defining its own copy.
Full Changelog: 0.2.1...0.2.2
0.2.1
New Features
- extra_args on config and ResourceSpec: extra CLI arguments appended to the submit command (e.g. bsub -P myproject script.sh). Useful for flags that must appear on the command line rather than as script directives. Supports both config-level (all jobs) and per-job via ResourceSpec.
- Real-time log output: LocalExecutor now streams stdout/stderr directly to log files during execution instead of buffering until completion.
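The difference between extra_args and directives can be sketched as two small functions. This sketch assumes that config-level arguments come before per-job ones. build_submit_argv, render_directives and DIRECTIVE_PREFIX are hypothetical names, not the library's API. The prepended #BSUB follows the 0.2.1 change that makes extra_directives take raw flags.

```python
DIRECTIVE_PREFIX = "#BSUB"


def build_submit_argv(config_extra_args: list[str], job_extra_args: list[str]) -> list[str]:
    """Hypothetical: put extra CLI arguments on the bsub command line itself."""
    # Config-level arguments apply to every job; per-job ones follow them.
    return ["bsub", *config_extra_args, *job_extra_args]


def render_directives(extra_directives: list[str]) -> list[str]:
    """Hypothetical: turn raw flags into script directives by prepending #BSUB."""
    return [f"{DIRECTIVE_PREFIX} {flag}" for flag in extra_directives]
```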
Breaking Changes
- ResourceSpec.account removed: use extra_args=["-P", "myproject"] or extra_directives=["-P myproject"] instead.
- ResourceSpec.cluster_options renamed to extra_directives: aligns with the same field on ClusterConfig. Both now auto-prepend the directive prefix (#BSUB), so values should be raw flags (e.g. "-G mygroup"), not full directives.
- ClusterConfig.account removed: same as above.
- ClusterConfig.extra_directives no longer requires the #BSUB prefix: the executor prepends it automatically. Change "#BSUB -env 'all'" to "-env 'all'".
0.2.0
New Features
- LocalExecutor array job support: submit_array() now spawns one subprocess per array element, each with an ARRAY_INDEX environment variable. Poll, cancel, and output file writing all handle array elements correctly. Per-element log files use the pattern stdout.{job_id}.{index}.log / stderr.{job_id}.{index}.log.
Breaking Changes
- ResourceSpec.work_dir is now required (non-nullable): changed from str | None (default None) to str (default os.getcwd()). Code passing work_dir=None explicitly will break; code omitting it will get cwd automatically.
- ResourceSpec gained stdout_path / stderr_path fields: new optional fields for overriding default log file paths. Not breaking per se, but changes the dataclass shape.
- ClusterConfig.log_directory removed: log files are now written to ResourceSpec.work_dir instead of a separate global log directory. Config files with log_directory will trigger an "unknown config keys" warning.
- Executor._submit_job() signature changed: now takes (command, name, resources, prologue, epilogue, env, *, cwd) and returns tuple[str, str | None] instead of taking (script_path, name, env) and returning str. Custom executor subclasses must be updated.
- Executor._submit_array_job() signature changed: same pattern as _submit_job(), now receives full submission parameters and returns tuple[str, str | None].
- Executor.render_script() and Executor.build_header() removed from base class: script rendering moved to standalone render_script() / write_script() functions in the new cluster_api.script module. build_header() is now only on executor subclasses, not abstract on the base.
- Executor._write_script() removed: replaced by cluster_api.script.write_script(). Scripts are now written to work_dir instead of log_directory.
- Executor.directive_prefix removed from base class: only defined on subclasses that need it.
- Log file naming changed: LSF logs changed from {name}.out/{name}.err to stdout.%J.log/stderr.%J.log; local logs changed from {name}.out/{name}.err to stdout.{job_id}.log/stderr.{job_id}.log.
Bug Fixes
- Race condition in JobMonitor.wait_for(): completion events are now registered before checking is_terminal, preventing a hang when a job finishes between the check and registration.
- Non-atomic script counter: LSFExecutor and LocalExecutor now use itertools.count() instead of a manual += 1 increment, avoiding counter collisions on concurrent submits.
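The register-then-check ordering can be sketched with asyncio events. This is a simplified, hypothetical monitor, not the library's JobMonitor. The point is that once the event is registered, a completion is either already visible to the check or is delivered to the event later. It cannot fall into the gap between the two steps.

```python
import asyncio


class SketchMonitor:
    """Hypothetical, simplified monitor showing register-then-check ordering."""

    def __init__(self) -> None:
        self._terminal: set[str] = set()
        self._waiters: dict[str, list[asyncio.Event]] = {}

    def mark_terminal(self, job_id: str) -> None:
        """Called by the polling loop when a job reaches a final state."""
        self._terminal.add(job_id)
        for event in self._waiters.pop(job_id, []):
            event.set()

    async def wait_for(self, job_id: str) -> None:
        event = asyncio.Event()
        waiters = self._waiters.setdefault(job_id, [])
        waiters.append(event)         # 1. register the completion event first
        if job_id in self._terminal:  # 2. then check the current state
            waiters.remove(event)     # already finished: nothing to wait for
            return
        await event.wait()
```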
Improvements
- Unknown config keys now logged: load_config() emits a warning for unrecognized keys instead of silently ignoring them.
- Unmapped LSF statuses now logged: unknown bjobs status strings trigger a warning instead of silently mapping to UNKNOWN. Added mappings for UNKWN, WAIT, and PROV.
- Better poll error logging: failed status queries now include the exception message and traceback in the warning.
- Script rendering extracted to cluster_api.script module: render_script() and write_script() are now standalone functions, decoupled from the executor class hierarchy.
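Here is a minimal sketch of warning on unknown config keys, using dataclasses.fields() to find the accepted names. DemoConfig and config_from_dict are hypothetical stand-ins for the library's ClusterConfig and load_config(). The fields shown are only a small subset chosen for illustration. A leftover log_directory key, removed in 0.2.0, would produce the warning.

```python
import logging
from dataclasses import dataclass, fields
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class DemoConfig:
    # Hypothetical subset of config fields for illustration.
    job_name_prefix: Optional[str] = None
    extra_args: Optional[list[str]] = None
    extra_directives: Optional[list[str]] = None


def config_from_dict(raw: dict[str, Any]) -> DemoConfig:
    """Build a config, warning about (and dropping) unrecognized keys."""
    known = {f.name for f in fields(DemoConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        logger.warning("Ignoring unknown config keys: %s", ", ".join(unknown))
    return DemoConfig(**{k: v for k, v in raw.items() if k in known})
```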
0.1.1
Full Changelog: 0.1.0...0.1.1