Bpipe Version 0.9.9.7
Summary
This release includes several major new features, including preliminary support for
running Bpipe pipelines on cloud providers (Google Cloud, Amazon Web Services) and a
new merge point operator that makes it easier to construct parallel pipelines using
scatter-gather parallelism. In addition, significant work has been done to
dramatically improve performance and reduce resource consumption on very highly
parallel pipelines with large numbers of input / output files.
Features
- Preliminary support for executing pipelines on Google Cloud Services
  (Compute Engine) and mounting storage for pipelines from Google Cloud
  Storage
- Preliminary support for executing pipelines on Amazon Web Services
  using EC2 and mounting storage for pipelines from S3
- The 'groovy' command can now run embedded Groovy (executed outside
  Bpipe) using the Groovy runtime bundled with Bpipe
- Support aliasing to string values in addition to outputs
- Experimental support for a beforeRun hook in command config: execute
  arbitrary Groovy code before a command executes
- Many performance improvements, especially for large, highly
  parallel pipelines
- Support configuring the number of retries for status
  polling of HPC jobs (statusPollRetries setting)
- Support for 'optional' inputs in pipelines: to make an input optional,
  suffix it with 'optional'. A 'flag' call can also be appended to add flags
  in commands, e.g. ${input.csv.optional.flag('--csv')}
- New operator: the merge point operator (>>>) automatically configures a stage
  to merge outputs from a previous parallel split
- Add region.bedFlag(flag) method for convenience when passing
  regions to commands
- 'var' expressions may now be added in the main pipeline script,
  not just in pipeline stages. These define optional
  variables and provide a default.
- JMS support now responds to a 'ping' message with a 'pong' reply
  if JMS 'Reply-To' is set, to allow for status monitoring
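
As an illustration of how three of the features above might fit together (script-level 'var', optional inputs with 'flag', and the merge point operator), here is a minimal pipeline sketch. It is not taken from the Bpipe documentation: the tool name annotate_lines, the stage names, the variable min_length and the file patterns are all hypothetical, and the placement of >>> between the parallel block and the merging stage is an assumption based on the description above.

```groovy
// Script-level 'var': an optional variable with a default value.
// Assumed to be overridable at launch, e.g. bpipe run -p min_length=20 ...
var min_length : 10

// Intended to run once per .txt input, each in its own parallel branch.
annotate = {
    // 'annotate_lines' is a hypothetical command. The optional expression is
    // taken from the release notes: when a .csv input is available it should
    // add the --csv flag for that file, and otherwise add nothing.
    exec """
        annotate_lines --min-length $min_length ${input.csv.optional.flag('--csv')} $input.txt > $output.tsv
    """
}

// Merging stage: expected to receive the outputs of all parallel branches.
combine = {
    exec "cat $inputs.tsv > $output.tsv"
}

run {
    // Split on input files matching %.txt, then merge at 'combine'.
    // Assumption: '>>>' goes directly before the stage that performs the merge.
    "%.txt" * [ annotate ] >>> combine
}
```

A script like this would be launched in the usual way, for example with bpipe run pipeline.groovy sample1.txt sample2.txt, where the script and file names are placeholders.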
Fixes
- Fix incorrect "abnormal termination" messages
  printed to console when pipeline stopped with 'bpipe stop'
- Fix incorrect 'pre-existing' printed for outputs that were
  created by the pipeline
- Fix genome not accessible in pipeline the first time it is downloaded,
  printing an error
- Re-execute checks if a command in the same stage has executed
- Synchronize initialization of dir watcher to fix sporadic
  ConcurrentModificationExceptions
- Fix empty embedded parallel stage list causing resolution of incorrect
  downstream input
- Fix leak of 'var' variables across branches when 'using' is applied to
  a pipeline stage
- Fix error if 4 or more arguments are passed to "to" in transform
- Fix Bpipe spuriously complaining that outputs were not created on retry,
  but not on the original run
- Fix some bugs where branch names were not being observed
- Fix branch name sometimes inserted without separating period for transforms
- Avoid redundantly putting branch name into files
- Improved detail in error / log messages in a few places
- Fix missing branch and '..' in filenames
- Change: globally defined variables must now be held constant
  once the pipeline starts
- Fix split regions not stable between runs; set region id as branch
  name
- Fix bed.split producing different splits if run repeatedly on the same BED file
- Fix errors output if SLF4J is referenced in user-loaded libraries
- Fix NPE / improve error message when filter is used with a mismatching output ext
- Fix error in stage body resulting in confusing 'no associated storage'
  assertion failure
- Add 'allowForeign' option to 'from' to let it process non-outputs
- Lessen the retries and retry interval when a file cannot be cleaned
  up