Releases: JaneliaSciComp/py-cluster-api

0.3.0

03 Mar 02:15

Release Notes — v0.3.0

Breaking Changes

job_name_prefix is now truly optional. Previously, omitting the prefix caused a random 5-character prefix to be auto-generated. Now, when no prefix is configured, job names are submitted without a prefix and bjobs queries are not filtered by name. Clients that relied on the implicit random prefix for isolation between concurrent sessions should now set an explicit job_name_prefix in their config. reconnect() still requires a prefix to be set.

Unknown config profiles now raise ValueError. Requesting a profile name that doesn't exist in the YAML file was previously ignored silently. The lookup now raises ValueError with a message listing the available profiles. Clients that pass speculative profile names must either handle the exception or make sure the profile exists.
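
A minimal sketch of the new lookup behavior, assuming the parsed profiles are held in a plain mapping. The function `select_profile`, the sample profiles, and the exact error wording are hypothetical and are not the library's actual API.

```python
def select_profile(profiles: dict[str, dict], name: str) -> dict:
    """Return the named profile, or raise ValueError listing the known ones."""
    if name not in profiles:
        available = ", ".join(sorted(profiles)) or "(none)"
        raise ValueError(
            f"Unknown profile {name!r}; available profiles: {available}"
        )
    return profiles[name]


# Hypothetical profile data standing in for the parsed YAML file.
PROFILES = {"default": {"queue": "normal"}, "gpu": {"queue": "gpu_short"}}

try:
    select_profile(PROFILES, "bigmem")
except ValueError as exc:
    # Callers that pass speculative names now need a handler like this one.
    message = str(exc)
```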

Array job aggregate status: FAILED now takes precedence over KILLED. When an array job has a mix of failed and killed elements, the overall status is now FAILED rather than KILLED. This better reflects that something went wrong, rather than suggesting the job was merely cancelled.
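
The precedence rule can be sketched as follows. Only the FAILED-over-KILLED ordering comes from these notes; the `Status` enum, the `aggregate_terminal` helper, and the fallback to DONE are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    DONE = "DONE"
    FAILED = "FAILED"
    KILLED = "KILLED"


def aggregate_terminal(element_statuses: list[Status]) -> Status:
    """Collapse terminal element statuses into one array-level status."""
    present = set(element_statuses)
    if Status.FAILED in present:  # checked first, so it wins over KILLED
        return Status.FAILED
    if Status.KILLED in present:
        return Status.KILLED
    return Status.DONE  # assumption: every element finished cleanly


mixed = aggregate_terminal([Status.DONE, Status.KILLED, Status.FAILED])
cancelled = aggregate_terminal([Status.DONE, Status.KILLED])
```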

New Features

done flag on cancel() and cancel_all(). Pass done=True to mark cancelled jobs as DONE instead of KILLED. On LSF this translates to bkill -d. Useful for gracefully retiring jobs you don't want flagged as killed.
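
A rough sketch of how the flag might map onto the LSF command line. `build_bkill_args` is a hypothetical helper, not part of the library; only the translation to `bkill -d` is taken from the note above.

```python
def build_bkill_args(job_id: str, done: bool = False) -> list[str]:
    """Build the argv for bkill, adding -d when the job should end as DONE."""
    args = ["bkill"]
    if done:
        args.append("-d")
    args.append(job_id)
    return args


killed_args = build_bkill_args("12345")
done_args = build_bkill_args("12345", done=True)
```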

Bug Fixes

  • Paths with spaces no longer break LSF submission. Stdout, stderr, and cwd paths in #BSUB directives are now double-quoted. The bsub invocation itself switched from shell execution (create_subprocess_shell) to create_subprocess_exec with stdin file redirection, eliminating shell-escaping issues.
  • cancel_by_name() no longer raises when no jobs match. LSF's "No matching job" / "No unfinished job" errors are now caught and logged at debug level instead of propagating.
  • reconnect() skips completed jobs. Terminal (DONE/EXIT) jobs returned by bjobs -a are now filtered out so reconnect only picks up jobs that are still running or pending.
  • Zombie processes reaped after local cancel. The local executor now properly awaits killed child processes to avoid zombies.
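
The switch away from shell execution can be illustrated with a small sketch. It assumes the script file is opened and handed to the child as stdin; `run_with_stdin_file` is a hypothetical helper, and a Python child process stands in for bsub so the example runs without LSF installed.

```python
import asyncio
import sys
import tempfile
from pathlib import Path


async def run_with_stdin_file(argv: list[str], script_path: Path) -> str:
    """Run argv without a shell, feeding script_path to the child's stdin."""
    with open(script_path, "rb") as script:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdin=script,  # replaces the shell's "< script.sh" redirection
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
    return stdout.decode()


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        # A path with spaces needs no escaping because no shell parses it.
        script = Path(tmp) / "dir with spaces" / "job.sh"
        script.parent.mkdir()
        script.write_text('#BSUB -o "/tmp/my logs/stdout.%J.log"\necho hello\n')
        # Stand-in for bsub: a child that echoes whatever arrives on stdin.
        child = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read())"]
        return asyncio.run(run_with_stdin_file(child, script))


output = demo()
```

Because the arguments are passed as a list, no quoting layer sits between the caller and the child process.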

Improvements

  • Faster local array job cancellation. SIGTERM is now sent to all element processes concurrently, with a single collective wait, instead of terminating and waiting on each one sequentially.
  • Warning when using max_concurrent with LocalExecutor. Since the local executor doesn't support concurrency throttling, it now logs a warning instead of silently ignoring the parameter.
  • Cancel logic refactored into _cancel_job(). Cancellation is now a proper overridable method on the base Executor, making it easier to customize in subclasses.
  • Comprehensive docstrings added to Executor, ResourceSpec, and internal methods. New docs/API.md reference document.
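
The concurrent cancellation described in the first item can be sketched as below. This is an illustration under assumptions, not the executor's code: `terminate_all` is a hypothetical helper, and sleeping Python children stand in for array elements.

```python
import asyncio
import sys


async def terminate_all(procs: list[asyncio.subprocess.Process]) -> list[int]:
    """Send SIGTERM to every live process, then wait on all of them at once."""
    for proc in procs:
        if proc.returncode is None:
            proc.terminate()
    # One collective wait; awaiting wait() also reaps each child,
    # so no zombie processes are left behind.
    return await asyncio.gather(*(proc.wait() for proc in procs))


async def demo() -> list[int]:
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    procs = [await asyncio.create_subprocess_exec(*sleeper) for _ in range(3)]
    return await terminate_all(procs)


return_codes = asyncio.run(demo())
```

With sequential terminate-and-wait, the total time is the sum of each element's shutdown time; with a collective wait it is roughly that of the slowest element.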

0.2.4

24 Feb 02:43

Fixed: timestamps are now converted to UTC.

Full Changelog: 0.2.3...0.2.4

0.2.3

14 Feb 01:46

  • Sanitize job names

Full Changelog: 0.2.2...0.2.3

0.2.2

14 Feb 01:11

New Features

  • Job reconnection — LSFExecutor.reconnect() queries bjobs by
    name prefix and reconstructs JobRecord instances so
    monitoring can resume after a process restart. Supports both
    single jobs and array jobs. Requires job_name_prefix to be
    set in config.
  • Debug logging — Added logger.debug() calls for all
    scheduler CLI invocations (bsub, bkill, bjobs) so command
    lines are visible at DEBUG level. Info-level logging for
    submit, cancel, and reconnect outcomes.

Improvements

  • Simplified bsub submission — Removed the use_stdin code
    path; script submission now always uses shell redirection
    (bsub < script.sh), which correctly enables #BSUB directive
    parsing.

Cleanup

  • Removed unused lsf_detect_units() function (dead code,
    never called).
  • Removed unused register_executor() function from the
    executor registry.
  • Deduplicated _ARRAY_ELEMENT_RE regex — LocalExecutor now
    imports the pattern from core.py instead of defining its own
    copy.

Full Changelog: 0.2.1...0.2.2

0.2.1

13 Feb 19:12

New Features

  • extra_args on config and ResourceSpec — extra CLI arguments
    appended to the submit command (e.g. bsub -P myproject
    script.sh). Useful for flags that must appear on the command
    line rather than as script directives. Supports both
    config-level (all jobs) and per-job via ResourceSpec.
  • Real-time log output — LocalExecutor now streams
    stdout/stderr directly to log files during execution instead
    of buffering until completion.

Breaking Changes

  • ResourceSpec.account removed — use extra_args=["-P",
    "myproject"] or extra_directives=["-P myproject"] instead.
  • ResourceSpec.cluster_options renamed to extra_directives —
    aligns with the same field on ClusterConfig. Both now
    auto-prepend the directive prefix (#BSUB), so values should
    be raw flags (e.g. "-G mygroup") not full directives.
  • ClusterConfig.account removed — same as above.
  • ClusterConfig.extra_directives no longer requires the #BSUB
    prefix — the executor prepends it automatically. Change
    "#BSUB -env 'all'" to "-env 'all'".
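
A small sketch of the new convention, assuming the executor simply joins the prefix and each raw flag with a space. Both helpers are hypothetical: `render_directives` mimics the auto-prepend step, and `to_raw_flag` shows one way to migrate old values that still carry the prefix.

```python
def render_directives(extra_directives: list[str], prefix: str = "#BSUB") -> list[str]:
    """Prepend the directive prefix to each raw flag."""
    return [f"{prefix} {flag}" for flag in extra_directives]


def to_raw_flag(directive: str, prefix: str = "#BSUB") -> str:
    """Strip a leading prefix from an old-style value (migration aid)."""
    return directive.removeprefix(prefix).strip()


lines = render_directives(["-env 'all'", "-G mygroup"])
migrated = to_raw_flag("#BSUB -env 'all'")
```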

0.2.0

11 Feb 15:27

New Features

  • LocalExecutor array job support — submit_array() now spawns
    one subprocess per array element, each with an ARRAY_INDEX
    environment variable. Poll, cancel, and output file writing
    all handle array elements correctly. Per-element log files
    use the pattern stdout.{job_id}.{index}.log /
    stderr.{job_id}.{index}.log.
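
The per-element behavior can be sketched roughly as follows. The ARRAY_INDEX variable and the log file pattern come from the note above; the `run_array` helper, the job id "1", and the use of asyncio are assumptions for illustration only.

```python
import asyncio
import os
import sys
import tempfile
from pathlib import Path


async def run_array(command, indices, work_dir: Path, job_id: str) -> list[int]:
    """Spawn one subprocess per array element, each with its own ARRAY_INDEX."""

    async def run_one(index: int) -> int:
        env = {**os.environ, "ARRAY_INDEX": str(index)}
        out_path = work_dir / f"stdout.{job_id}.{index}.log"
        err_path = work_dir / f"stderr.{job_id}.{index}.log"
        # Output goes straight to the per-element log files.
        with open(out_path, "wb") as out, open(err_path, "wb") as err:
            proc = await asyncio.create_subprocess_exec(
                *command, env=env, stdout=out, stderr=err
            )
            return await proc.wait()

    return await asyncio.gather(*(run_one(i) for i in indices))


def demo() -> dict[int, str]:
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        command = [sys.executable, "-c",
                   "import os; print(os.environ['ARRAY_INDEX'])"]
        asyncio.run(run_array(command, range(1, 4), work_dir, job_id="1"))
        return {
            i: (work_dir / f"stdout.1.{i}.log").read_text().strip()
            for i in range(1, 4)
        }


logs = demo()
```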

Breaking Changes

  • ResourceSpec.work_dir is now required (non-nullable) —
    changed from str | None (default None) to str (default
    os.getcwd()). Code passing work_dir=None explicitly will
    break; code omitting it will get cwd automatically.
  • ResourceSpec gained stdout_path / stderr_path fields — new
    optional fields for overriding default log file paths. Not
    breaking per se, but changes the dataclass shape.
  • ClusterConfig.log_directory removed — log files are now
    written to ResourceSpec.work_dir instead of a separate global
    log directory. Config files with log_directory will trigger
    an "unknown config keys" warning.
  • Executor._submit_job() signature changed — now takes
    (command, name, resources, prologue, epilogue, env, *, cwd)
    and returns tuple[str, str | None] instead of taking
    (script_path, name, env) and returning str. Custom executor
    subclasses must be updated.
  • Executor._submit_array_job() signature changed — same
    pattern as _submit_job(), now receives full submission
    parameters and returns tuple[str, str | None].
  • Executor.render_script() and Executor.build_header()
    removed from base class — script rendering moved to
    standalone render_script() / write_script() functions in the
    new cluster_api.script module. build_header() is now only on
    executor subclasses, not abstract on the base.
  • Executor._write_script() removed — replaced by
    cluster_api.script.write_script(). Scripts are now written to
    work_dir instead of log_directory.
  • Executor.directive_prefix removed from base class — only
    defined on subclasses that need it.
  • Log file naming changed — LSF logs changed from
    {name}.out/{name}.err to stdout.%J.log/stderr.%J.log; local
    logs changed from {name}.out/{name}.err to
    stdout.{job_id}.log/stderr.{job_id}.log.

Bug Fixes

  • Race condition in JobMonitor.wait_for() — completion events
    are now registered before checking is_terminal, preventing a
    hang when a job finishes between the check and registration.
  • Non-atomic script counter — LSFExecutor and LocalExecutor
    now use itertools.count() instead of manual += 1 increment,
    avoiding counter collisions on concurrent submits.
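
The ordering behind the wait_for() fix can be sketched as below. `JobSketch` and `wait_for_job` are hypothetical stand-ins, not the library's JobMonitor. The point is only that the event is registered before the terminal check, so a completion that lands in between still wakes the waiter.

```python
import asyncio


class JobSketch:
    """Toy job that notifies registered events when it becomes terminal."""

    def __init__(self) -> None:
        self.is_terminal = False
        self._events: list[asyncio.Event] = []

    def register(self, event: asyncio.Event) -> None:
        self._events.append(event)

    def mark_terminal(self) -> None:
        self.is_terminal = True
        for event in self._events:
            event.set()


async def wait_for_job(job: JobSketch) -> None:
    event = asyncio.Event()
    job.register(event)   # 1. register first
    if job.is_terminal:   # 2. then check, so no completion can be missed
        return
    await event.wait()


async def demo() -> bool:
    finished = JobSketch()
    finished.mark_terminal()
    await asyncio.wait_for(wait_for_job(finished), timeout=1)

    running = JobSketch()
    waiter = asyncio.create_task(wait_for_job(running))
    await asyncio.sleep(0)  # let the waiter register and block
    running.mark_terminal()
    await asyncio.wait_for(waiter, timeout=1)
    return True


completed = asyncio.run(demo())
```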

Improvements

  • Unknown config keys now logged — load_config() emits a
    warning for unrecognized keys instead of silently ignoring
    them.
  • Unmapped LSF statuses now logged — unknown bjobs status
    strings trigger a warning instead of silently mapping to
    UNKNOWN. Added mappings for UNKWN, WAIT, and PROV.
  • Better poll error logging — failed status queries now
    include the exception message and traceback in the warning.
  • Script rendering extracted to cluster_api.script module —
    render_script() and write_script() are now standalone
    functions, decoupled from the executor class hierarchy.
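
A minimal sketch of the unknown-key warning, assuming the loader compares the raw mapping against a set of recognized names. `KNOWN_KEYS` is a deliberately partial, hypothetical list built from keys mentioned in these notes, and `warn_unknown_keys` is not the library's load_config().

```python
import logging

logger = logging.getLogger("cluster_api_sketch")

# Hypothetical, partial set of recognized keys for illustration.
KNOWN_KEYS = {"job_name_prefix", "extra_directives", "extra_args"}


def warn_unknown_keys(raw_config: dict) -> list[str]:
    """Log a warning for unrecognized keys and return them, sorted."""
    unknown = sorted(set(raw_config) - KNOWN_KEYS)
    if unknown:
        logger.warning("Unknown config keys: %s", ", ".join(unknown))
    return unknown


# log_directory was removed in this release, so it is reported as unknown.
unknown = warn_unknown_keys({"job_name_prefix": "demo", "log_directory": "/tmp/logs"})
```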

0.1.1

09 Feb 04:23

Full Changelog: 0.1.0...0.1.1

0.1.0

09 Feb 04:23

Initial release