Bpipe Version 0.9.9.7

@ssadedin ssadedin released this 23 Apr 12:49
· 723 commits to master since this release

Summary

This release includes several major new features, including preliminary support for
running Bpipe pipelines on cloud providers (Google Cloud, Amazon Web Services) and a
new merge point operator that makes it easier to construct parallel pipelines
using scatter-gather parallelism. In addition, significant work has been done to
dramatically improve performance and reduce resource consumption on very highly
parallel pipelines with large numbers of input / output files.

Features

  • Preliminary support for executing pipelines on Google Cloud Services
    (Compute Engine) and mounting storage for pipelines from Google Cloud
    Storage

  • Preliminary support for executing pipelines on Amazon Web Services
    using EC2 and mounting storage for pipelines from S3

  • The 'groovy' command can now run embedded Groovy (executed outside
    Bpipe) using the Groovy runtime bundled with Bpipe

  • Support aliasing to string values in addition to outputs

  • Experimental support for beforeRun hook in command config: execute
    arbitrary Groovy code before a command executes

  • Many performance improvements, especially for large, highly
    parallel pipelines

  • Support configuration for number of retries for status
    polling of HPC jobs (statusPollRetries setting)

  • Support for 'optional' inputs in pipelines: to make an input optional,
    suffix it with 'optional'. 'flag' can also be appended to add flags
    to commands, e.g. ${input.csv.optional.flag('--csv')}

  • New operator: merge point operator (>>>) automatically configures a stage
    to merge outputs from a previous parallel split

  • Add region.bedFlag(flag) method for convenience when passing
    regions to commands

  • 'var' expressions may now be added in the main pipeline script,
    not just pipeline stages. These define optional
    variables, and provide a default.

  • JMS support now responds to 'ping' message with 'pong' reply
    if JMS 'Reply-To' is set to allow for status monitoring
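
As an illustration of how a few of the features above might fit together, here is a minimal pipeline sketch. The stage names (summarize, combine), the tool summarize_tool, the variable min_qual and the file extensions are all hypothetical, and the exact behaviour of the optional flag and of the merge point operator should be checked against the Bpipe documentation.

```groovy
// Sketch only: stage names, tool names and file extensions are
// hypothetical and not taken from the release notes.

// Optional variable with a default value, declared in the main
// pipeline script rather than inside a stage.
var min_qual : 20

summarize = {
    // input.csv is marked optional. The intent is that the --csv flag
    // is only added to the command when a CSV input is available.
    exec """
        summarize_tool --min-qual $min_qual ${input.csv.optional.flag('--csv')} $input.txt > $output.summary
    """
}

combine = {
    // Gather step: concatenate the summaries from all parallel branches.
    exec "cat $inputs.summary > $output.txt"
}

run {
    // Scatter over input files matching the pattern, then use the
    // merge point operator (>>>) to mark 'combine' as the stage that
    // merges outputs from the preceding parallel split.
    "%.txt" * [ summarize ] >>> combine
}
```

The default for min_qual could then be overridden from the command line with Bpipe's -p option, for example bpipe run -p min_qual=30 pipeline.groovy sample1.txt sample2.txt, where pipeline.groovy and the sample files are placeholder names.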

Fixes

  • Fix incorrect "abnormal termination" messages
    printed to console when pipeline stopped with 'bpipe stop'

  • Fix incorrect 'pre-existing' printed for outputs that were
    created by pipeline

  • Fix genome not being accessible in the pipeline the first time it is
    downloaded, which caused an error to be printed

  • Re-execute checks if a command in the same stage has executed

  • Synchronize initialization of dir watcher to fix sporadic
    ConcurrentModificationExceptions

  • Fix empty embedded parallel stage list causing resolution of incorrect
    downstream input

  • Fix leak of 'var' variables across branches when 'using' applied to
    pipeline stage

  • Fix error if 4 or more arguments passed to "to" in transform

  • Fix Bpipe spuriously complaining that outputs were not created on
    retry, but not on the original run

  • Fix some bugs where branch names were not being observed

  • Fix branch name sometimes inserted without separating period for transforms

  • Avoid redundantly putting branch name into files

  • Improved detail in error / log messages in a few places

  • Fix missing branch and '..' in filenames

  • Change: globally defined variables must now be held constant
    once pipeline starts

  • Fix split regions not stable between runs, set region id as branch
    name

  • Fix bed.split producing different splits if run repeatedly on same bed

  • Fix errors output if SLF4J referenced in user loaded libraries

  • Fix NPE / improve error message when filter used with mismatching output extension

  • Fix error in stage body resulting in confusing 'no associated storage'
    assertion failure

  • Add 'allowForeign' option to 'from' to let it process non-outputs

  • Lessen the retries and retry interval when file cannot be cleaned
    up