Releases: openpipelines-bio/openpipeline

OpenPipelines.bio v0.12.2

18 Jan 11:34
bc76a3f

BUG FIXES

  • dataflow/concat and dataflow/concatenate_h5mu: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).

OpenPipelines.bio v0.11.1

18 Jan 11:35
202a63c

BUG FIXES

  • dataflow/concat: Fix an issue where using --mode move on samples with non-overlapping features would cause var_names to become unaligned to the data (PR #653).

OpenPipelines.bio v0.12.1

14 Nov 06:10
bb0a94b

BUG FIXES

  • rna_singlesample: Fix the filtering parameter values min_counts, max_counts, min_genes_per_cell, max_genes_per_cell and min_cells_per_gene not being passed to the filter_with_counts component (PR #614).
  • prot_singlesample: Fix the filtering parameter values min_counts, max_counts, min_proteins_per_cell, max_proteins_per_cell and min_cells_per_protein not being passed to the filter_with_counts component (PR #614).

OpenPipelines.bio v0.12.0

24 Oct 11:11

BREAKING CHANGES

The detection of mitochondrial genes has been revisited in order to remove the interdependency with the count filtering and the QC metric calculation.
Implementing this change involved breaking some existing functionality:

  • filter/filter_with_counts: removed --var_gene_names, --mitochondrial_gene_regex, --var_name_mitochondrial_genes, --min_fraction_mito and --max_fraction_mito (PR #585).

  • workflows/prot_singlesample: removed --min_fraction_mito and --max_fraction_mito because regex-based detection of mitochondrial genes is not possible (PR #585).

  • The fraction of counts that originated from mitochondrial genes used to be written to an .obs column with a name that was derived from pct_ suffixed by the name of the mitochondrial gene column. The --obs_name_mitochondrial_fraction argument is introduced to change the destination column and the default prefix has changed from pct_ to fraction_ (PR #585).
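As a rough illustration of the renamed column, the sketch below computes the per-cell fraction of counts that come from mitochondrial genes on a tiny in-memory matrix. It uses plain Python lists rather than a MuData object and is not the component's implementation. The column name fraction_mitochondrial assumes the mitochondrial gene column is called mitochondrial, which is a hypothetical choice and not a documented default.

```python
# Minimal sketch: per-cell fraction of counts from mitochondrial genes.
# Plain lists stand in for the count matrix and the boolean .var column;
# all names below are illustrative assumptions, not the component's code.

def mitochondrial_fraction(counts, is_mito):
    """Return, for each cell (row), mitochondrial counts divided by total counts."""
    fractions = []
    for row in counts:
        total = sum(row)
        mito = sum(c for c, flag in zip(row, is_mito) if flag)
        # Cells without any counts get 0.0 to avoid dividing by zero.
        fractions.append(mito / total if total else 0.0)
    return fractions

counts = [
    [8, 2, 0],  # cell 1
    [5, 0, 5],  # cell 2
    [0, 0, 0],  # cell 3 (empty)
]
is_mito = [False, True, True]  # hypothetical boolean .var column

# Hypothetical .obs column name using the new "fraction_" prefix.
obs = {"fraction_mitochondrial": mitochondrial_fraction(counts, is_mito)}
```

Because the stored values are fractions rather than percentages, thresholds applied to this column are expected to lie between 0 and 1.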

NEW FUNCTIONALITY

  • workflows/qc: A pipeline to add basic qc statistics to a MuData object (PR #585).

  • workflows/rna_singlesample: added --obs_name_mitochondrial_fraction and make sure that the values from --max_fraction_mito and --min_fraction_mito are bound between 0 and 1 (PR #585).

  • Added filter/delimit_fraction: Turns an annotation column containing values between 0 and 1 into a boolean column based on thresholds (PR #585).

  • Added metadata/grep_annotation_column: Perform a regex lookup on a column from the annotation matrices .obs or .var (PR #585).

  • workflows/full_pipelines: added --obs_name_mitochondrial_fraction argument (PR #585).

  • workflows/prot_multisample: added --var_qc_metrics and --top_n_vars arguments (PR #585).
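The two new components listed above split mitochondrial filtering into a regex lookup and a threshold step. The sketch below mimics both on plain Python lists. The function names mirror the component names for readability, but the signatures, the inclusive thresholds and the meaning of True (value lies within the thresholds) are assumptions and not the components' documented behaviour.

```python
import re

def grep_annotation_column(values, pattern):
    """Flag every entry of an annotation column that matches the regex."""
    regex = re.compile(pattern)
    return [bool(regex.search(value)) for value in values]

def delimit_fraction(fractions, min_fraction=0.0, max_fraction=1.0):
    """Turn fractions in [0, 1] into booleans: True when within the thresholds."""
    for name, bound in (("min_fraction", min_fraction), ("max_fraction", max_fraction)):
        # Thresholds outside [0, 1] cannot be meaningful for a fraction.
        if not 0.0 <= bound <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {bound}")
    return [min_fraction <= f <= max_fraction for f in fractions]

# Hypothetical gene names and regex for mitochondrial genes.
gene_names = ["MT-CO1", "ACTB", "mt-Nd1", "GAPDH"]
is_mito = grep_annotation_column(gene_names, r"^[mM][tT]-")

# Hypothetical per-cell mitochondrial fractions with an upper threshold of 0.2.
keep_cell = delimit_fraction([0.05, 0.35, 0.20], max_fraction=0.2)
```

Keeping the two steps separate means the boolean gene column can feed the QC metric calculation independently of any filtering decision.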

MINOR CHANGES

  • Several components: bump scanpy to 1.9.5 (PR #594).

  • Refactored prot_multisample and prot_singlesample pipelines to use fromState and toState functionality (PR #585).

OpenPipelines.bio v0.11.0

10 Oct 11:32

BREAKING CHANGES

  • Nextflow VDSL3: set simplifyOutput to False by default. This implies that components and workflows will output a hashmap with a sole "output" entry when there is only one output (PR #563).

  • integrate/scvi: rename model_output argument to output_model in order to align with the scvi_leiden workflow. This also fixes a bug with the workflow where the argument did not function (PR #562).

MINOR CHANGES

  • dataflow/concat: reduce memory consumption when using --other_axis_mode move by processing only one annotation matrix (.var, .obs) at a time (PR #569).

  • Update viashpy and pin it to 0.5.0 (PR #572 and PR #577).

  • convert/from_h5ad_to_h5mu, convert/from_h5mu_to_h5ad, dimred/pca, dimred/umap,
    filter/filter_with_counts, filter/filter_with_hvg, filter/remove_modality, filter/subset_h5mu,
    integrate/scanorama, transform/delete_layer and transform/log1p: update python to 3.9 (PR #572).

  • integrate/scarches: update base image, scvi-tools and pandas to nvcr.io/nvidia/pytorch:23.09-py3, ~=1.0.3 and ~=2.1.0 respectively (PR #572).

  • integrate/totalvi: update python to 3.9 and scvi-tools to ~=1.0.3 (PR #572).

  • correction/cellbender_remove_background: change base image to nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 and downgrade MuData to 0.2.1 because it is the oldest version that uses python 3.7 (PR #575).

  • Several integration workflows: prevent leiden from being executed when no resolutions are provided (PR #583).

  • dataflow/concat: bump pandas to ~=2.1.1 and reduce memory consumption by only reading one modality into memory at a time (PR #568).

  • annotate/popv: bump jax and jaxlib to 0.4.10, scanpy to 1.9.4, scvi to 1.0.3 and pin ml-dtypes to < 0.3.0 (PR #565).

  • velocity/scvelo: pin matplotlib to < 3.8.0 (PR #566).

  • mapping/multi_star: pin multiqc to 1.15.0 (PR #566).

  • mapping/bd_rhapsody: pin pandas version to <2 (PR #563).

  • query/cellxgene_census: replaced label singlecpu with label midcpu.

  • query/cellxgene_census: avoid creating MuData object in memory by writing the modality directly to disk (PR #558).

  • integrate/scvi: use midcpu label instead of singlecpu (PR #561).

BUG FIXES

  • transform/clr: raise an error when CLR fails to return the requested output (PR #579).

  • correction/cellbender_remove_background: fix missing helper functionality when using Fusion (PR #575).

  • convert/from_bdrhap_to_h5mu: Avoid TypeError: Can't implicitly convert non-string objects to strings by using categorical dtypes when a string column contains NA values (PR #563).

  • qc/calculate_qc_metrics: fix the calculation of mitochondrial gene related QC metrics when either only mitochondrial genes or no mitochondrial genes were found (PR #564).

OpenPipelines.bio v0.10.0

20 Sep 08:03
616a8a8

BREAKING CHANGES

  • workflows/full_pipeline: removed --prot_min_fraction_mito and --prot_max_fraction_mito (PR #451)

  • workflows/rna_multisample and workflows/prot_multisample: Removed concatenation from these pipelines. The input for these pipelines is now a single mudata file that contains data for multiple samples. If you wish to use this pipeline on multiple single-sample mudata files, you can use the dataflow/concat components on them first. This also implies that the ability to add ids to multiple single-sample mudata files prior to concatenation is no longer required, hence the removal of --add_id_to_obs, --sample_id, --add_id_obs_output, and --add_id_make_observation_keys_unique (PR #475).

  • The scvi pipeline was renamed to scvi_leiden because leiden clustering was added to the pipeline (PR #499).

  • Upgrade correction/cellbender_remove_background from CellBender v0.2 to CellBender v0.3.0 (PR #523).
    Between these versions, several arguments related to the slots of the output file have been changed.

MAJOR CHANGES

  • Several components: update anndata to 0.9.3 and mudata to 0.2.3 (PR #423).

  • Base resources assigned to a process without any labels are now 1 CPU and 2GB (PR #518).

  • Updated to Viash 0.7.5 (PR #513).

  • Removed deprecated variant: vdsl3 tags (PR #513).

  • Removed unused version: dev (PR #513).

  • multiomics/integration/harmony_leiden: Refactored data flow (PR #513).

  • ingestion/bd_rhapsody: Refactored data flow (PR #513).

  • query/cellxgene_census: increased returned metadata content, revised query option, added filtering strategy and refactored functionality (PR #520).

  • Refactor loggers using setup_logger() helper function (PR #534).

  • Refactor unittest tests to pytest tests (PR #534).

MINOR CHANGES

  • Add resource labels to several components (PR #518).

  • full_pipeline: default value for --var_qc_metrics is now the combined values specified for --mitochondrial_gene_regex and --filter_with_hvg_var_output.

  • dataflow/concat: reduce memory consumption by only reading one modality at a time (PR #474).

  • Components that use CellRanger, BCL Convert or bcl2fastq: updated from Ubuntu 20.04 to Ubuntu 22.04 (PR #494).

  • Components that use CellRanger: updated Picard to 2.27.5 (PR #494).

  • interprete/liana: Update lianapy to 0.1.9 (PR #497).

  • qc/multiqc: add unittests (PR #502).

  • reference/build_cellranger_reference: add unit tests (PR #506).

  • reference/build_bd_rhapsody_reference: add unittests (PR #504).

NEW FUNCTIONALITY

  • Added compression/compress_h5mu component (PR #530).

  • Resource management: when a process exits with a status code between 137 and 140, retry the process with increased memory requirements. Memory scales by multiplying the base memory assigned to the process with the attempt number (PR #518 and PR #527).

  • integrate/scvi: Add --n_hidden_nodes, --n_dimensions_latent_space, --n_hidden_layers, --dropout_rate, --dispersion, --gene_likelihood, --use_layer_normalization, --use_batch_normalization, --encode_covariates, --deeply_inject_covariates and --use_observed_lib_size parameters.

  • filter/filter_with_counts: add --var_name_mitochondrial_genes argument to store a boolean array corresponding to the detected mitochondrial genes.

  • full_pipeline and rna_singlesample pipelines: add --var_name_mitochondrial_genes, --var_gene_names and --mitochondrial_gene_regex arguments to specify mitochondrial gene detection behaviour.

  • integrate/scvi: Add --obs_labels, --obs_size_factor, --obs_categorical_covariate and --obs_continuous_covariate arguments (PR #496).

  • Added var_qc_metrics_fill_na_value argument to calculate_qc_metrics (PR #477).

  • Added multiomics/multisample pipeline to run multisample processing followed by the integration setup. It is considered an entrypoint into the full pipeline which skips the single-sample processing. The idea is to allow a re-run of these steps after a sample has already been processed by the full_pipeline. Keep in mind that samples that are provided as input to this pipeline are processed separately and are not concatenated. Hence, the input should be a concatenated sample (PR #475).

  • Added multiomics/integration/bbknn_leiden workflow. (PR #456).

  • workflows/prot_multisample and workflows/full_pipelines: add basic QC statistics to prot modality (PR #485).

  • mapping/cellranger_multi: Add tests for the mapping of Crispr Guide Capture data (PR #494).

  • convert/from_cellranger_multi_to_h5mu: add perturbation_efficiencies_by_feature and perturbation_efficiencies_by_feature information to .uns slot of gdo modality (PR #494).

  • convert/from_cellranger_multi_to_h5mu: add feature_reference information to the MuData object. Information is split between the modalities. For example, CRISPR Guide Capture information is added to the .uns slot of the gdo modality, while Antibody Capture information is added to the .uns slot of prot (PR #494).

  • Added multiomics/integration/totalvi_leiden pipeline (PR #500).

  • Added totalVI component (PR #386).

  • workflows/full_pipeline: Add pca_overwrite argument (PR #511).

  • Add main_build_viash_hub action to build, tag, and push components and docker images for viash-hub.com (PR #480).

  • integration/bbknn_leiden: Update state management to fromState / toState (PR #538).
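The retry behaviour described in the resource management entry above can be sketched in a few lines. This is a plain Python illustration of the arithmetic only, not the Nextflow configuration used by the pipelines. The function names are hypothetical, and treating "between 137 and 140" as an inclusive range is an assumption.

```python
# Exit statuses that trigger a retry; inclusive bounds are an assumption.
RETRY_EXIT_CODES = range(137, 141)

def should_retry(exit_status):
    """Return True when the exit status falls in the retry range."""
    return exit_status in RETRY_EXIT_CODES

def memory_for_attempt(base_memory_gb, attempt):
    """Memory for a given attempt (counting from 1): base memory times attempt."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return base_memory_gb * attempt

# With the 2GB base assigned to unlabelled processes, three attempts
# would request 2, 4 and 6 GB respectively.
requested = [memory_for_attempt(2, attempt) for attempt in (1, 2, 3)]
```

The scaling is linear in the attempt number, so the requested memory grows by one base allotment per retry rather than doubling.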

DOCUMENTATION

  • images: Added images for various concepts, such as a sample, a cell, RNA, ADT, ATAC, VDJ (PR #515).

  • multiomics/rna_singlesample: Add image for workflow (PR #515).

  • multiomics/rna_multisample: Add image for workflow (PR #515).

  • multiomics/prot_singlesample: Add image for workflow (PR #515).

  • multiomics/prot_multisample: Add image for workflow (PR #515).

BUG FIXES

  • Fix an issue with workflows/multiomics/scanorama_leiden where the --output argument doesn't work as expected (PR #509).

  • Fix an issue with workflows/full_pipeline not correctly caching previous runs (PR #460).

  • Fix incorrect namespaces of the integration pipelines (PR #464).

  • Fix an issue in several workflows where the --output argument would not work (PR #476).

  • integration/harmony_leiden and integration/scanorama_leiden: Fix an issue where the prefix of the columns that store the leiden clusters was hardcoded to leiden, instead of adapting to the value for --obs_cluster (PR #482).

  • velocity/velocyto: Resolve symbolic link before checking whether the transcriptome is a gzip (PR #484).

  • workflows/integration/scanorama_leiden: fix an issue where --obsm_input, --obs_batch, --batch_size, --sigma, --approx, --alpha and --knn were not working because they were not passed through to the scanorama component (PR #487).

  • workflows/integration/scanorama_leiden: fix leiden being calculated on the wrong embedding because the --obsm_input argument was not correctly set to the output embedding of scanorama (PR #487).

  • mapping/cellranger_multi: Fix an issue where modalities did not have the proper name (PR #494).

  • metadata/add_uns_to_obs: Fix KeyError: 'ouput_compression' error (PR #501).

  • neighbors/bbknn: Fix --input not being a required argument (PR #518).

  • Create correction/cellbender_remove_background_v0.2 for legacy CellBender v0.2 format (PR #523).

  • integrate/scvi: Ensure output has the same dimensionality as the input (PR #524).

  • mapping/bd_rhapsody: Fix --dryrun argument not working (PR #534).

  • qc/multiqc: Fix component not working for multiple inputs (PR #537). Also converted Bash script to Python scripts.

  • neighbors/bbknn: Fix --uns_output, --obsp_distances and --obsp_connectivities not being processed correctly (PR #538).

OpenPipelines.bio v0.10.1

08 Sep 16:25

MINOR CHANGES

  • integration/scvi_leiden: Expose hvg selection argument --var_input (#543, PR #547).

BUG FIXES

  • integration/bbknn_leiden: Set leiden clustering parameter to multiple (#542, PR #545).

  • integration/scvi_leiden: Fix component name in Viash config (PR #547).

  • integration/harmony_leiden: Pass the --uns_neighbors argument to umap (PR #548).

  • Add workaround for bug where resources aren't available when using Nextflow fusion by including setup_logger, subset_vars and compress_h5mu in the script itself (PR #549).

0.9.0

20 Jun 06:24

Openpipelines 0.9.0

BREAKING CHANGES

Running the integration in the full_pipeline was deemed impractical because a plethora of integration methods exist, which in turn interact with other functionality (like clustering). This generates a large number of possible use cases that one pipeline cannot easily cover. Instead, each integration method will be split into its own pipeline, and the full_pipeline will prepare for integration by performing steps that are required by many integration methods. Therefore, the following changes were made:

  • workflows/full_pipeline: harmony integration and leiden clustering are removed from the pipeline.

  • Added initialize_integration to run calculations that output information commonly required by the integration methods. This pipeline runs PCA, nearest neighbours and UMAP. This pipeline is run as a subpipeline at the end of full_pipeline.

  • Added leiden_harmony integration pipeline: run harmony integration followed by neighbour calculations and leiden clustering. Also runs umap on the result.

  • Removed the integration pipeline.

The old behavior of the full_pipeline can be obtained by running full_pipeline followed by the leiden_harmony pipeline.

  • The crispr and hashing modalities have been renamed to gdo and hto respectively (PR #392).

  • Updated Viash to 0.7.4 (PR #390).

  • cluster/leiden: Output is now stored into .obsm instead of .obs (PR #431).

NEW FUNCTIONALITY

  • cluster/leiden and integration/harmony_leiden: allow running leiden multiple times with multiple resolutions (PR #431).

  • workflows/full_pipeline: PCA, nearest neighbours and UMAP are now calculated for the prot modality (PR #396).

  • transform/clr: added output_layer argument (PR #396).

  • workflows/integration/scvi: Run scvi integration followed by neighbour calculations and run umap on the result (PR #396).

  • mapping/cellranger_multi and workflows/ingestion/cellranger_multi: Added --vdj_inner_enrichment_primers argument (PR #417).

  • metadata/move_obsm_to_obs: Move a matrix from an .obsm slot into .obs (PR #431).

  • integrate/scvi: added validity checks for non-normalized input, obs and vars before proceeding to training (PR #429).

  • schemas: Added schema files for authors (PR #436).

  • schemas: Added schema file for Viash configs (PR #436).

  • schemas: Refactor author import paths (PR #436).

  • schemas: Added schema file for file format specification files (PR #437).

  • query/cellxgene_census: Query Cellxgene census component and save the results to a MuData file. (PR #433).

MAJOR CHANGES

  • report/mermaid: Now uses mermaid-cli to generate images instead of creating a request to mermaid.ink. New --output_format, --width, --height and --background_color arguments were added (PR #419).

  • All components that used python as base container: use slim version to reduce container image size (PR #427).

MINOR CHANGES

  • integrate/scvi: update scvi to 1.0.0 (PR #448)

  • mapping/multi_star: Added --min_success_rate, which causes the component to fail when the fraction of successfully processed samples falls below this value (PR #408).

  • correction/cellbender_remove_background and transform/clr: update muon to 0.1.5 (PR #428)

  • ingestion/cellranger_postprocessing: split integration tests into several workflows (PR #425).

  • schemas: Add schema file for author yamls (PR #436).

  • mapping/multi_star, mapping/star_build_reference and mapping/star_align: update STAR from 2.7.10a to 2.7.10b (PR #441).
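One plausible reading of the --min_success_rate argument of mapping/multi_star is a lower bound on the fraction of samples that were processed successfully. The sketch below shows that check in plain Python. The function name, its parameters and the choice to fail only when the rate is strictly below the threshold are assumptions, not the component's code.

```python
def check_success_rate(n_success, n_total, min_success_rate):
    """Return the success rate, or raise when it is below the minimum."""
    if n_total <= 0:
        raise ValueError("at least one sample is required")
    rate = n_success / n_total
    if rate < min_success_rate:
        raise RuntimeError(
            f"success rate {rate:.2f} is below the minimum of {min_success_rate:.2f}"
        )
    return rate

# Nine out of ten samples succeeded; a minimum of 0.8 lets the run pass.
rate = check_success_rate(9, 10, min_success_rate=0.8)
```

A threshold of this kind lets a large batch finish despite a few failed samples while still stopping runs in which most samples fail.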

BUG FIXES

  • annotate/popv: Fix concat issue when the input data has multiple layers (#395, PR #397).

  • annotate/popv: Fix indexing issue when the MuData object contains non-overlapping modalities (PR #405).

  • mapping/multi_star: Fix issue where temp dir could not be created when group_id contains slashes (PR #406).

  • mapping/multi_star_to_h5mu: Use glob to look for count files recursively (PR #408).

  • annotate/popv: Pin PopV, jax and jaxlib versions (PR #415).

  • integrate/scvi: the max_epochs is no longer required since it has a default value (PR #396).

  • workflows/full_pipeline: fix make_observation_keys_unique parameter not being correctly passed to the add_id component, causing ValueError: Observations are not unique across samples during execution of the concat component (PR #422).

  • annotate/popv: now sets approx to False to avoid using annoy in scanorama because it fails on processors that are missing the AVX-512 instruction sets, causing Illegal instruction (core dumped).

  • workflows/full_pipeline: Avoid adding sample names to observation ids twice (PR #457).
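The recursive file lookup mentioned for mapping/multi_star_to_h5mu above can be illustrated with the standard library glob module. The sketch below is not the component's code. The file name ReadsPerGene.out.tab (the gene count table that STAR writes) and the directory layout are assumptions chosen for the example.

```python
import glob
import os
import tempfile

def find_count_files(root, filename="ReadsPerGene.out.tab"):
    """Find files called `filename` at any depth below `root`."""
    # "**" combined with recursive=True matches zero or more directory levels.
    pattern = os.path.join(glob.escape(root), "**", filename)
    return sorted(glob.glob(pattern, recursive=True))

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: the count file sits two directories deep.
    nested = os.path.join(root, "group", "sample_1")
    os.makedirs(nested)
    open(os.path.join(nested, "ReadsPerGene.out.tab"), "w").close()

    found = find_count_files(root)
    relative = [os.path.relpath(path, root) for path in found]
```

A non-recursive pattern with a fixed number of directory levels would miss this file whenever the nesting depth differs from what the pattern expects.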

0.8.0

15 May 06:20

openpipelines 0.8.0

BREAKING CHANGES

  • workflows/full_pipeline: Fixed inconsistencies in argument naming (#372):

    • rna_min_vars_per_cell was renamed to rna_min_genes_per_cell
    • rna_max_vars_per_cell was renamed to rna_max_genes_per_cell
    • prot_min_vars_per_cell was renamed to prot_min_proteins_per_cell
    • prot_max_vars_per_cell was renamed to prot_max_proteins_per_cell
  • velocity/scvelo: bump anndata from <0.8 to 0.9.
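For parameter sets written against older versions, the four full_pipeline renames listed above could be applied with a small helper like the one below. This is a hypothetical migration aid, not part of OpenPipelines. It assumes the parameters are held in a plain dictionary keyed by argument name without the leading dashes.

```python
# Old argument name -> new argument name, taken from the list above.
RENAMED_ARGUMENTS = {
    "rna_min_vars_per_cell": "rna_min_genes_per_cell",
    "rna_max_vars_per_cell": "rna_max_genes_per_cell",
    "prot_min_vars_per_cell": "prot_min_proteins_per_cell",
    "prot_max_vars_per_cell": "prot_max_proteins_per_cell",
}

def migrate_params(params):
    """Return a copy of `params` with the old argument names replaced."""
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED_ARGUMENTS.get(key, key)
        # Refuse to guess when both the old and the new name are present.
        if new_key in migrated:
            raise ValueError(f"argument {new_key!r} is specified twice")
        migrated[new_key] = value
    return migrated

# Hypothetical parameter set using one of the old names.
old_params = {"id": "sample_1", "rna_min_vars_per_cell": 200}
new_params = migrate_params(old_params)
```

Arguments that were not renamed pass through unchanged, so the helper can be applied to a complete parameter set.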

NEW FUNCTIONALITY

  • Added an extra label veryhighmem mostly for cellranger_multi with a large number of samples.

  • Added multiomics/prot_multisample pipeline.

  • Added clr functionality to prot_multisample pipeline.

  • Added interpret/lianapy: Enables the use of any combination of ligand-receptor methods and resources, and their consensus.

  • filter/filter_with_scrublet: Add --allow_automatic_threshold_detection_fail: when scrublet fails to detect doublets, the component will now put NA in the output columns.

  • workflows/full_pipeline: Allow not setting the sample ID to the .obs column of the MuData object.

  • workflows/rna_multisample: Add the ID of the sample to the .obs column of the MuData object.

  • correction/cellbender_remove_background: add obsm_latent_gene_encoding parameter to store the latent gene representation.

BUG FIXES

  • transform/clr: fix an anndata object instead of a matrix being stored as a layer in the output MuData, resulting in a NoneTypeError after reading the .layers back in.

  • dataflow/concat and dataflow/merge: fixed a bug where boolean values were cast to their string representation.

  • workflows/full_pipeline: fix running pipeline with -stub.

  • Fixed an issue where passing a remote file URI (for example http:// or s3://) as param_list caused No such file errors.

  • workflows/full_pipeline: Fix incorrectly named filtering arguments (#372).

MINOR CHANGES

  • integrate/scarches, integrate/scvi and correction/cellbender_remove_background: Update base container to nvcr.io/nvidia/pytorch:22.12-py3

  • integrate/scvi: add gpu label for nextflow platform.

  • integrate/scvi: use cuda enabled jax install.

  • convert/from_cellranger_multi_to_h5mu, dataflow/concat and dataflow/merge: update pandas to 2.0.0

  • dataflow/concat and dataflow/merge: Boolean and integer columns are now represented by the BooleanArray and IntegerArray dtypes in order to allow storing NA values.

  • interpret/lianapy: use the latest development release (commit 11156ddd0139a49dfebdd08ac230f0ebf008b7f8) of lianapy in order to fix compatibility with numpy 1.24.x.

  • filter/filter_with_hvg: Add error when specified input layer cannot be found in input data.

  • workflows/multiomics/full_pipeline: publish the output from sample merging to allow running different integrations.

0.7.1

11 Mar 14:56

openpipelines 0.7.1

NEW FUNCTIONALITY

  • integrate/scvi: use nvcr.io/nvidia/pytorch:22.09-py3 as base container to enable GPU acceleration.

  • integrate/scvi: add --model_output to save model.

  • workflows/ingestion/cellranger_mapping: Added output_type to output the filtered Cell Ranger data as h5mu, not the converted raw 10xh5 output.

  • Several components: added --output_compression argument to set the compression of output .h5mu files.

  • workflows/full_pipeline and workflows/integration: Added leiden_resolution argument to control the coarseness of the clustering.

  • Added --rna_theta and --rna_harmony_theta to full and integration pipeline respectively in order to tune the diversity clustering penalty parameter for harmony integration.

BUG FIXES

  • mapping/cellranger_multi: Fix an issue where using a directory as value for --input would cause AttributeError.

  • workflows/integration: init_pos is no longer set to the integration layer (e.g. X_pca_integrated).

  • dimred/pca: fix variance slot containing a second copy of the variance ratio matrix and not the variances.
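The dimred/pca fix above concerns two related but different quantities: the variance explained by each component and the ratio of that variance to the total. The sketch below shows the relationship with made-up numbers. It assumes all components are kept, so the total is simply the sum of the listed variances, and it does not reproduce the component's storage layout.

```python
def explained_variance_ratio(variances):
    """Divide each component's variance by the total variance."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [variance / total for variance in variances]

# Hypothetical per-component variances (not taken from real data).
variances = [4.0, 3.0, 1.0]
ratios = explained_variance_ratio(variances)
```

The ratios always sum to 1 under this assumption, whereas the variances keep the scale of the data, which makes it easy to spot a slot that holds the wrong one.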

MINOR CHANGES

  • integration and full workflows: do not run harmony integration when obs_covariates is not provided.

  • Add highmem label to dimred/pca component.

  • Remove disabled convert/from_csv_to_h5mu component.

  • Update to Viash 0.7.1.

  • Several components: update to scanpy 1.9.2

  • process_10xh5/filter_10xh5: speed up build by using eddelbuettel/r2u:22.04 base container.

MAJOR CHANGES

  • dataflow/concat: Renamed --compression to --output_compression.