Releases: openpipelines-bio/openpipeline
OpenPipelines.bio v0.12.2
BUG FIXES
- dataflow/concat and dataflow/concatenate_h5mu: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).
OpenPipelines.bio v0.11.1
BUG FIXES
- dataflow/concat: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).
OpenPipelines.bio v0.12.1
BUG FIXES
- rna_singlesample: Fix the filtering parameter values min_counts, max_counts, min_genes_per_cell, max_genes_per_cell and min_cells_per_gene not being passed to the filter_with_counts component (PR #614).
- prot_singlesample: Fix the filtering parameter values min_counts, max_counts, min_proteins_per_cell, max_proteins_per_cell and min_cells_per_protein not being passed to the filter_with_counts component (PR #614).
OpenPipelines.bio v0.12.0
BREAKING CHANGES
The detection of mitochondrial genes has been revisited in order to remove the interdependency with the count filtering and the QC metric calculation.
Implementing this change involved breaking some existing functionality:
- filter/filter_with_counts: removed --var_gene_names, --mitochondrial_gene_regex, --var_name_mitochondrial_genes, --min_fraction_mito and --max_fraction_mito (PR #585).
- workflows/prot_singlesample: removed --min_fraction_mito and --max_fraction_mito because regex-based detection of mitochondrial genes is not possible (PR #585).
- The fraction of counts that originated from mitochondrial genes used to be written to an .obs column with a name that was derived from pct_ suffixed by the name of the mitochondrial gene column. The --obs_name_mitochondrial_fraction argument is introduced to change the destination column and the default prefix has changed from pct_ to fraction_ (PR #585).
NEW FUNCTIONALITY
- workflows/qc: A pipeline to add basic QC statistics to a MuData object (PR #585).
- workflows/rna_singlesample: added --obs_name_mitochondrial_fraction and made sure that the values from --max_fraction_mito and --min_fraction_mito are bound between 0 and 1 (PR #585).
- Added filter/delimit_fraction: Turns an annotation column containing values between 0 and 1 into a boolean column based on thresholds (PR #585).
- Added metadata/grep_annotation_column: Perform a regex lookup on a column from the annotation matrices .obs or .var (PR #585).
- workflows/full_pipelines: added --obs_name_mitochondrial_fraction argument (PR #585).
- workflows/prot_multisample: added --var_qc_metrics and --top_n_vars arguments (PR #585).
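The decoupled flow described in this release (regex lookup on gene names, then a mitochondrial fraction per cell, then a boolean filter from thresholds) can be pictured with the sketch below. It is not the code of metadata/grep_annotation_column or filter/delimit_fraction: the function names only mirror the component names, and the `^MT-` pattern, the inclusive thresholds and the toy counts are assumptions made for illustration.

```python
import re


def grep_annotation_column(values, pattern):
    """Return one boolean per entry: True when the regex matches that entry."""
    regex = re.compile(pattern)
    return [bool(regex.search(value)) for value in values]


def mitochondrial_fraction(counts, is_mito):
    """Fraction of one cell's counts that come from the flagged genes."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(c for c, flagged in zip(counts, is_mito) if flagged) / total


def delimit_fraction(fractions, min_fraction=0.0, max_fraction=1.0):
    """Turn fractions in [0, 1] into booleans: True when within the thresholds.

    Inclusive bounds are an assumption of this sketch.
    """
    for bound in (min_fraction, max_fraction):
        if not 0.0 <= bound <= 1.0:
            raise ValueError("thresholds must lie between 0 and 1")
    return [min_fraction <= f <= max_fraction for f in fractions]


# Toy data (hypothetical): four genes, two cells.
gene_names = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]
cell_counts = [[10, 70, 10, 10], [40, 10, 40, 10]]

is_mito = grep_annotation_column(gene_names, r"^MT-")
fractions = [mitochondrial_fraction(counts, is_mito) for counts in cell_counts]
keep = delimit_fraction(fractions, max_fraction=0.5)
```

In this toy data the first cell stays below the 0.5 threshold and the second does not, so only the first is flagged to keep.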
MINOR CHANGES
OpenPipelines.bio v0.11.0
BREAKING CHANGES
- Nextflow VDSL3: set simplifyOutput to False by default. This implies that components and workflows will output a hashmap with a sole "output" entry when there is only one output (PR #563).
- integrate/scvi: rename model_output argument to output_model in order to align with the scvi_leiden workflow. This also fixes a bug with the workflow where the argument did not function (PR #562).
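To picture what the simplifyOutput change means for downstream code, here is a minimal Python sketch of the two output shapes. It is not Viash or Nextflow code: `format_outputs` is a hypothetical helper, and the assumption that the previous default unwrapped a single output into a bare value is inferred from the note above.

```python
def format_outputs(outputs, simplify_output=False):
    """Return the outputs of a component as seen by the next step.

    With simplify_output enabled (assumed previous default), a single output
    is unwrapped to its bare value. With it disabled, the mapping is kept.
    """
    if simplify_output and len(outputs) == 1:
        return next(iter(outputs.values()))
    return dict(outputs)


single = {"output": "sample.h5mu"}  # hypothetical file name

new_default = format_outputs(single)                        # mapping is kept
old_default = format_outputs(single, simplify_output=True)  # bare value
```

Downstream steps that previously consumed the bare value would, under this assumption, need to read the "output" entry from the mapping instead.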
MINOR CHANGES
- dataflow/concat: reduce memory consumption when using --other_axis_mode move by processing only one annotation matrix (.var, .obs) at a time (PR #569).
- convert/from_h5ad_to_h5mu, convert/from_h5mu_to_h5ad, dimred/pca, dimred/umap, filter/filter_with_counts, filter/filter_with_hvg, filter/remove_modality, filter/subset_h5mu, integrate/scanorama, transform/delete_layer and transform/log1p: update python to 3.9 (PR #572).
- integrate/scarches: update base image, scvi-tools and pandas to nvcr.io/nvidia/pytorch:23.09-py3, ~=1.0.3 and ~=2.1.0 respectively (PR #572).
- integrate/totalvi: update python to 3.9 and scvi-tools to ~=1.0.3 (PR #572).
- correction/cellbender_remove_background: change base image to nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 and downgrade MuData to 0.2.1 because it is the oldest version that uses python 3.7 (PR #575).
- Several integration workflows: prevent leiden from being executed when no resolutions are provided (PR #583).
- dataflow/concat: bump pandas to ~=2.1.1 and reduce memory consumption by only reading one modality into memory at a time (PR #568).
- annotate/popv: bump jax and jaxlib to 0.4.10, scanpy to 1.9.4, scvi to 1.0.3 and pin ml-dtypes to < 0.3.0 (PR #565).
- velocity/scvelo: pin matplotlib to < 3.8.0 (PR #566).
- mapping/multi_star: pin multiqc to 1.15.0 (PR #566).
- mapping/bd_rhapsody: pin pandas version to <2 (PR #563).
- query/cellxgene_census: replaced label singlecpu with label midcpu.
- query/cellxgene_census: avoid creating MuData object in memory by writing the modality directly to disk (PR #558).
- integrate/scvi: use midcpu label instead of singlecpu (PR #561).
BUG FIXES
- transform/clr: raise an error when CLR fails to return the requested output (PR #579).
- correction/cellbender_remove_background: fix missing helper functionality when using Fusion (PR #575).
- convert/from_bdrhap_to_h5mu: Avoid TypeError: Can't implicitly convert non-string objects to strings by using categorical dtypes when a string column contains NA values (PR #563).
- qc/calculate_qc_metrics: fix calculating mitochondrial gene related QC metrics when only or no mitochondrial genes were found (PR #564).
OpenPipelines.bio v0.10.0
BREAKING CHANGES
- workflows/full_pipeline: removed --prot_min_fraction_mito and --prot_max_fraction_mito (PR #451).
- workflows/rna_multisample and workflows/prot_multisample: Removed concatenation from these pipelines. The input for these pipelines is now a single mudata file that contains data for multiple samples. If you wish to use these pipelines on multiple single-sample mudata files, you can use the dataflow/concat component on them first. This also implies that the ability to add ids to multiple single-sample mudata files prior to concatenation is no longer required, hence the removal of --add_id_to_obs, --sample_id, --add_id_obs_output, and --add_id_make_observation_keys_unique (PR #475).
- The scvi pipeline was renamed to scvi_leiden because leiden clustering was added to the pipeline (PR #499).
- Upgrade correction/cellbender_remove_background from CellBender v0.2 to CellBender v0.3.0 (PR #523). Between these versions, several arguments related to the slots of the output file have been changed.
MAJOR CHANGES
- Several components: update anndata to 0.9.3 and mudata to 0.2.3 (PR #423).
- Base resources assigned for a process without any labels is now 1 CPU and 2GB (PR #518).
- Updated to Viash 0.7.5 (PR #513).
- Removed deprecated variant: vdsl3 tags (PR #513).
- Removed unused version: dev (PR #513).
- multiomics/integration/harmony_leiden: Refactored data flow (PR #513).
- ingestion/bd_rhapsody: Refactored data flow (PR #513).
- query/cellxgene_census: increased returned metadata content, revised query option, added filtering strategy and refactored functionality (PR #520).
- Refactor loggers using setup_logger() helper function (PR #534).
- Refactor unittest tests to pytest tests (PR #534).
MINOR CHANGES
- Add resource labels to several components (PR #518).
- full_pipeline: default value for --var_qc_metrics is now the combined values specified for --mitochondrial_gene_regex and --filter_with_hvg_var_output.
- dataflow/concat: reduce memory consumption by only reading one modality at the same time (PR #474).
- Components that use CellRanger, BCL Convert or bcl2fastq: updated from Ubuntu 20.04 to Ubuntu 22.04 (PR #494).
- Components that use CellRanger: updated Picard to 2.27.5 (PR #494).
- interprete/liana: Update lianapy to 0.1.9 (PR #497).
- qc/multiqc: add unit tests (PR #502).
- reference/build_cellranger_reference: add unit tests (PR #506).
- reference/build_bd_rhapsody_reference: add unit tests (PR #504).
NEW FUNCTIONALITY
- Added compression/compress_h5mu component (PR #530).
- Resource management: when a process exits with a status code between 137 and 140, retry the process with increased memory requirements. Memory scales by multiplying the base memory assigned to the process with the attempt number (PR #518 and PR #527).
- integrate/scvi: Add --n_hidden_nodes, --n_dimensions_latent_space, --n_hidden_layers, --dropout_rate, --dispersion, --gene_likelihood, --use_layer_normalization, --use_batch_normalization, --encode_covariates, --deeply_inject_covariates and --use_observed_lib_size parameters.
- filter/filter_with_counts: add --var_name_mitochondrial_genes argument to store a boolean array corresponding to the detected mitochondrial genes.
- full_pipeline and rna_singlesample pipelines: add --var_name_mitochondrial_genes, --var_gene_names and --mitochondrial_gene_regex arguments to specify mitochondrial gene detection behaviour.
- integrate/scvi: Add --obs_labels, --obs_size_factor, --obs_categorical_covariate and --obs_continuous_covariate arguments (PR #496).
- Added var_qc_metrics_fill_na_value argument to calculate_qc_metrics (PR #477).
- Added multiomics/multisample pipeline to run multisample processing followed by the integration setup. It is considered an entrypoint into the full pipeline which skips the single-sample processing. The idea is to allow a re-run of these steps after a sample has already been processed by the full_pipeline. Keep in mind that samples that are provided as input to this pipeline are processed separately and are not concatenated. Hence, the input should be a concatenated sample (PR #475).
- Added multiomics/integration/bbknn_leiden workflow (PR #456).
- workflows/prot_multisample and workflows/full_pipelines: add basic QC statistics to prot modality (PR #485).
- mapping/cellranger_multi: Add tests for the mapping of Crispr Guide Capture data (PR #494).
- convert/from_cellranger_multi_to_h5mu: add perturbation_efficiencies_by_feature and perturbation_efficiencies_by_feature information to .uns slot of gdo modality (PR #494).
- convert/from_cellranger_multi_to_h5mu: add feature_reference information to the MuData object. Information is split between the modalities. For example, CRISPR Guide Capture information is added to the .uns slot of the gdo modality, while Antibody Capture information is added to the .uns slot of prot (PR #494).
- Added multiomics/integration/totalvi_leiden pipeline (PR #500).
- Added totalVI component (PR #386).
- workflows/full_pipeline: Add pca_overwrite argument (PR #511).
- Add main_build_viash_hub action to build, tag, and push components and docker images for viash-hub.com (PR #480).
- integration/bbknn_leiden: Update state management to fromState/toState (PR #538).
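The resource management rule in the list above (retry on exit status 137 to 140, memory equal to the base memory multiplied by the attempt number) can be pictured with the short sketch below. It is plain Python and not the actual Nextflow configuration: treating the range as inclusive, the `max_attempts` cap and the simulated exit statuses are assumptions for illustration. The 2 GB base matches the default for processes without labels noted under MAJOR CHANGES.

```python
RETRY_EXIT_CODES = range(137, 141)  # 137..140, assumed inclusive


def should_retry(exit_status):
    """True when the exit status falls in the retry range."""
    return exit_status in RETRY_EXIT_CODES


def memory_for_attempt(base_memory_gb, attempt):
    """Memory for a given attempt: base memory times the attempt number."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return base_memory_gb * attempt


def simulate(base_memory_gb, exit_statuses, max_attempts=3):
    """Replay a sequence of exit statuses and record (attempt, memory, status).

    max_attempts is a hypothetical cap, not a value taken from the release notes.
    """
    history = []
    for attempt, status in enumerate(exit_statuses, start=1):
        history.append((attempt, memory_for_attempt(base_memory_gb, attempt), status))
        if not should_retry(status) or attempt >= max_attempts:
            break
    return history


# A process with 2 GB base memory that fails twice before succeeding.
history = simulate(2, [137, 139, 0])
```

Here the process is granted 2 GB, then 4 GB, then 6 GB, and stops retrying once it exits with status 0.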
DOCUMENTATION
- images: Added images for various concepts, such as a sample, a cell, RNA, ADT, ATAC, VDJ (PR #515).
- multiomics/rna_singlesample: Add image for workflow (PR #515).
- multiomics/rna_multisample: Add image for workflow (PR #515).
- multiomics/prot_singlesample: Add image for workflow (PR #515).
- multiomics/prot_multisample: Add image for workflow (PR #515).
BUG FIXES
- Fix an issue with workflows/multiomics/scanorama_leiden where the --output argument doesn't work as expected (PR #509).
- Fix an issue with workflows/full_pipeline not correctly caching previous runs (PR #460).
- Fix incorrect namespaces of the integration pipelines (PR #464).
- Fix an issue in several workflows where the --output argument would not work (PR #476).
- integration/harmony_leiden and integration/scanorama_leiden: Fix an issue where the prefix of the columns that store the leiden clusters was hardcoded to leiden, instead of adapting to the value for --obs_cluster (PR #482).
- velocity/velocyto: Resolve symbolic link before checking whether the transcriptome is a gzip (PR #484).
- workflows/integration/scanorama_leiden: fix an issue where --obsm_input, --obs_batch, --batch_size, --sigma, --approx, --alpha and --knn were not working because they were not passed through to the scanorama component (PR #487).
- workflows/integration/scanorama_leiden: fix leiden being calculated on the wrong embedding because the --obsm_input argument was not correctly set to the output embedding of scanorama (PR #487).
- mapping/cellranger_multi: Fix an issue where modalities did not have the proper name (PR #494).
- metadata/add_uns_to_obs: Fix KeyError: 'ouput_compression' error (PR #501).
- neighbors/bbknn: Fix --input not being a required argument (PR #518).
- Create correction/cellbender_remove_background_v0.2 for legacy CellBender v0.2 format (PR #523).
- integrate/scvi: Ensure output has the same dimensionality as the input (PR #524).
- mapping/bd_rhapsody: Fix --dryrun argument not working (PR #534).
- qc/multiqc: Fix component not working for multiple inputs (PR #537). Also converted Bash script to Python scripts.
- neighbors/bbknn: Fix --uns_output, --obsp_distances and --obsp_connectivities not being processed correctly (PR #538).
OpenPipelines.bio v0.10.1
MINOR CHANGES
BUG FIXES
- integration/bbknn_leiden: Set leiden clustering parameter to multiple (#542, PR #545).
- integration/scvi_leiden: Fix component name in Viash config (PR #547).
- integration/harmony_leiden: Pass the --uns_neighbors argument to umap (PR #548).
- Add workaround for bug where resources aren't available when using Nextflow fusion by including setup_logger, subset_vars and compress_h5mu in the script itself (PR #549).
0.9.0
Openpipelines 0.9.0
BREAKING CHANGES
Running the integration in the full_pipeline was deemed impractical because a plethora of integration methods exist, which in turn interact with other functionality (like clustering). This generates a large number of possible use cases which one pipeline cannot cover in an easy manner. Instead, each integration method will be split into its own pipeline, and the full_pipeline will prepare for integration by performing steps that are required by many integration methods. Therefore, the following changes were performed:
- workflows/full_pipeline: harmony integration and leiden clustering are removed from the pipeline.
- Added initialize_integration to run calculations that output information commonly required by the integration methods. This pipeline runs PCA, nearest neighbours and UMAP. This pipeline is run as a subpipeline at the end of full_pipeline.
- Added leiden_harmony integration pipeline: run harmony integration followed by neighbour calculations and leiden clustering. Also runs umap on the result.
- Removed the integration pipeline.
The old behavior of the full_pipeline can be obtained by running full_pipeline followed by the leiden_harmony pipeline.
- The crispr and hashing modalities have been renamed to gdo and hto respectively (PR #392).
- Updated Viash to 0.7.4 (PR #390).
- cluster/leiden: Output is now stored into .obsm instead of .obs (PR #431).
NEW FUNCTIONALITY
- cluster/leiden and integration/harmony_leiden: allow running leiden multiple times with multiple resolutions (PR #431).
- workflows/full_pipeline: PCA, nearest neighbours and UMAP are now calculated for the prot modality (PR #396).
- transform/clr: added output_layer argument (PR #396).
- workflows/integration/scvi: Run scvi integration followed by neighbour calculations and run umap on the result (PR #396).
- mapping/cellranger_multi and workflows/ingestion/cellranger_multi: Added --vdj_inner_enrichment_primers argument (PR #417).
- metadata/move_obsm_to_obs: Move a matrix from an .obsm slot into .obs (PR #431).
- integrate/scvi: added validity checks for non-normalized input, obs and vars in order to proceed to training (PR #429).
- schemas: Added schema files for authors (PR #436).
- schemas: Added schema file for Viash configs (PR #436).
- schemas: Refactor author import paths (PR #436).
- schemas: Added schema file for file format specification files (PR #437).
- query/cellxgene_census: Query Cellxgene census component and save the results to a MuData file (PR #433).
MAJOR CHANGES
- report/mermaid: Now uses mermaid-cli to generate images instead of creating a request to mermaid.ink. New --output_format, --width, --height and --background_color arguments were added (PR #419).
- All components that used python as base container: use slim version to reduce container image size (PR #427).
MINOR CHANGES
- integrate/scvi: update scvi to 1.0.0 (PR #448).
- mapping/multi_star: Added --min_success_rate which causes the component to fail when the fraction of successfully processed samples falls below this rate (PR #408).
- correction/cellbender_remove_background and transform/clr: update muon to 0.1.5 (PR #428).
- ingestion/cellranger_postprocessing: split integration tests into several workflows (PR #425).
- schemas: Add schema file for author yamls (PR #436).
- mapping/multi_star, mapping/star_build_reference and mapping/star_align: update STAR from 2.7.10a to 2.7.10b (PR #441).
BUG FIXES
- annotate/popv: Fix concat issue when the input data has multiple layers (#395, PR #397).
- annotate/popv: Fix indexing issue when MuData object contains non-overlapping modalities (PR #405).
- mapping/multi_star: Fix issue where temp dir could not be created when group_id contains slashes (PR #406).
- mapping/multi_star_to_h5mu: Use glob to look for count files recursively (PR #408).
- annotate/popv: Pin PopV, jax and jaxlib versions (PR #415).
- integrate/scvi: the max_epochs is no longer required since it has a default value (PR #396).
- workflows/full_pipeline: fix make_observation_keys_unique parameter not being correctly passed to the add_id component, causing ValueError: Observations are not unique across samples during execution of the concat component (PR #422).
- annotate/popv: now sets approx to False to avoid using annoy in scanorama because it fails on processors that are missing the AVX-512 instruction sets, causing Illegal instruction (core dumped).
- workflows/full_pipeline: Avoid adding sample names to observation ids twice (PR #457).
0.8.0
openpipelines 0.8.0
BREAKING CHANGES
- workflows/full_pipeline: Renamed inconsistencies in argument naming (#372):
  - rna_min_vars_per_cell was renamed to rna_min_genes_per_cell
  - rna_max_vars_per_cell was renamed to rna_max_genes_per_cell
  - prot_min_vars_per_cell was renamed to prot_min_proteins_per_cell
  - prot_max_vars_per_cell was renamed to prot_max_proteins_per_cell
- velocity/scvelo: bump anndata from <0.8 to 0.9.
NEW FUNCTIONALITY
- Added an extra label veryhighmem mostly for cellranger_multi with a large number of samples.
- Added multiomics/prot_multisample pipeline.
- Added clr functionality to prot_multisample pipeline.
- Added interpret/lianapy: Enables the use of any combination of ligand-receptor methods and resources, and their consensus.
- filter/filter_with_scrublet: Add --allow_automatic_threshold_detection_fail: when scrublet fails to detect doublets, the component will now put NA in the output columns.
- workflows/full_pipeline: Allow not setting the sample ID to the .obs column of the MuData object.
- workflows/rna_multisample: Add the ID of the sample to the .obs column of the MuData object.
- correction/cellbender_remove_background: add obsm_latent_gene_encoding parameter to store the latent gene representation.
BUG FIXES
- transform/clr: fix anndata object instead of matrix being stored as a layer in output MuData, resulting in NoneTypeError object after reading the .layers back in.
- dataflow/concat and dataflow/merge: fixed a bug where boolean values were cast to their string representation.
- workflows/full_pipeline: fix running pipeline with -stub.
- Fixed an issue where passing a remote file URI (for example http:// or s3://) as param_list caused No such file errors.
- workflows/full_pipeline: Fix incorrectly named filtering arguments (#372).
- correction/cellbender_remove_background: add obsm_latent_gene_encoding parameter to store the latent gene representation.
MINOR CHANGES
- integrate/scarches, integrate/scvi and correction/cellbender_remove_background: Update base container to nvcr.io/nvidia/pytorch:22.12-py3.
- integrate/scvi: add gpu label for nextflow platform.
- integrate/scvi: use cuda enabled jax install.
- convert/from_cellranger_multi_to_h5mu, dataflow/concat and dataflow/merge: update pandas to 2.0.0.
- dataflow/concat and dataflow/merge: Boolean and integer columns are now represented by the BooleanArray and IntegerArray dtypes in order to allow storing NA values.
- interpret/lianapy: use the latest development release (commit 11156ddd0139a49dfebdd08ac230f0ebf008b7f8) of lianapy in order to fix compatibility with numpy 1.24.x.
- filter/filter_with_hvg: Add error when specified input layer cannot be found in input data.
- workflows/multiomics/full_pipeline: publish the output from sample merging to allow running different integrations.
0.7.1
openpipelines 0.7.1
NEW FUNCTIONALITY
- integrate/scvi: use nvcr.io/nvidia/pytorch:22.09-py3 as base container to enable GPU acceleration.
- integrate/scvi: add --model_output to save model.
- workflows/ingestion/cellranger_mapping: Added output_type to output the filtered Cell Ranger data as h5mu, not the converted raw 10xh5 output.
- Several components: added --output_compression argument to set the compression of output .h5mu files.
- workflows/full_pipeline and workflows/integration: Added leiden_resolution argument to control the coarseness of the clustering.
- Added --rna_theta and --rna_harmony_theta to full and integration pipeline respectively in order to tune the diversity clustering penalty parameter for harmony integration.
BUG FIXES
- mapping/cellranger_multi: Fix an issue where using a directory as value for --input would cause AttributeError.
- workflows/integration: init_pos is no longer set to the integration layer (e.g. X_pca_integrated).
- dimred/pca: fix variance slot containing a second copy of the variance ratio matrix and not the variances.
MINOR CHANGES
- integration and full workflows: do not run harmony integration when obs_covariates is not provided.
- Add highmem label to dimred/pca component.
- Remove disabled convert/from_csv_to_h5mu component.
- Update to Viash 0.7.1.
- Several components: update to scanpy 1.9.2.
- process_10xh5/filter_10xh5: speed up build by using eddelbuettel/r2u:22.04 base container.
MAJOR CHANGES
- dataflow/concat: Renamed --compression to --output_compression.