diff --git a/CHANGELOG.md b/CHANGELOG.md index bb12c68..6c817fb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,10 +6,18 @@ * Directory structure has been updated. +* Update to viash 0.9.0 (PR #13). + ## NEW FUNCTIONALITY * Add `CHANGELOG.md` (PR #7). +## MAJOR CHANGES + +* Revamp `scripts` directory (PR #13). + +* Relocated `process_datasets` to `data_processors/process_datasets` (PR #13). + ## MINOR CHANGES * Remove dtype parameter in `.Anndata()` (PR #6). @@ -20,6 +28,11 @@ * Update docker containers used in components (PR #12). +* Set `numpy<2` for some failing methods (PR #13). + +* Small changes to api file names (PR #13). + + ## transfer from openproblems-v2 repository ### NEW FUNCTIONALITY diff --git a/README.md b/README.md index cafd684..c5def4a 100644 --- a/README.md +++ b/README.md @@ -8,75 +8,8 @@ Do not edit this file directly. Removing noise in sparse single-cell RNA-sequencing count data -Path to source: -[`src`](https://github.com/openproblems-bio/task_denoising/src) - -## README - -## Installation - -You need to have Docker, Java, and Viash installed. Follow [these -instructions](https://openproblems.bio/documentation/fundamentals/requirements) -to install the required dependencies. - -## Add a method - -To add a method to the repository, follow the instructions in the -`scripts/add_a_method.sh` script. - -## Frequently used commands - -To get started, you can run the following commands: - -``` bash -git clone git@github.com:openproblems-bio/task_denoising.git - -cd task_denoising - -# initialise submodule -scripts/init_submodule.sh - -# download resources -scripts/download_resources.sh -``` - -To run the benchmark, you first need to build the components. -Afterwards, you can run the benchmark: - -``` bash -viash ns build --parallel --setup cachedbuild - -scripts/run_benchmark.sh -``` - -After adding a component, it is recommended to run the tests to ensure -that the component is working correctly: - -``` bash -viash ns test --parallel -``` - -Optionally, you can provide the `--query` argument to test only a subset -of components: - -``` bash -viash ns test --parallel --query 'component_name' -``` - -## Motivation - -Single-cell RNA-Seq protocols only detect a fraction of the mRNA -molecules present in each cell. As a result, the measurements (UMI -counts) observed for each gene and each cell are associated with -generally high levels of technical noise ([Grün et al., -2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes -the task of estimating the true expression level of each gene in each -cell. In the single-cell literature, this task is also referred to as -*imputation*, a term which is typically used for missing data problems -in statistics. Similar to the use of the terms “dropout”, “missing -data”, and “technical zeros”, this terminology can create confusion -about the underlying measurement process ([Sarkar and Stephens, -2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)). +Repository: +[openproblems-bio/task_denoising](https://github.com/openproblems-bio/task_denoising) ## Description @@ -114,24 +47,24 @@ dataset. ``` mermaid flowchart LR file_common_dataset("Common Dataset") - comp_process_dataset[/"Data processor"/] - file_train_h5ad("Training data") - file_test_h5ad("Test data") + comp_data_processor[/"Data processor"/] + file_test("Test data") + file_train("Training data") comp_control_method[/"Control Method"/] - comp_method[/"Method"/] comp_metric[/"Metric"/] + comp_method[/"Method"/] file_prediction("Denoised data") file_score("Score") - file_common_dataset---comp_process_dataset - comp_process_dataset-->file_train_h5ad - comp_process_dataset-->file_test_h5ad - file_train_h5ad---comp_control_method - file_train_h5ad---comp_method - file_test_h5ad---comp_control_method - file_test_h5ad---comp_metric + file_common_dataset---comp_data_processor + comp_data_processor-->file_test + comp_data_processor-->file_train + file_test---comp_control_method + file_test---comp_metric + file_train---comp_control_method + file_train---comp_method comp_control_method-->file_prediction - comp_method-->file_prediction comp_metric-->file_score + comp_method-->file_prediction file_prediction---comp_metric ``` @@ -151,7 +84,7 @@ Format: -Slot description: +Data structure:
@@ -170,9 +103,6 @@ Slot description: ## Component type: Data processor -Path: -[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset) - A denoising dataset processor. Arguments: @@ -187,11 +117,11 @@ Arguments:
-## File format: Training data +## File format: Test data -The subset of molecules used for the training dataset +The subset of molecules used for the test dataset -Example file: `resources_test/denoising/pancreas/train.h5ad` +Example file: `resources_test/denoising/pancreas/test.h5ad` Format: @@ -199,26 +129,33 @@ Format: AnnData object layers: 'counts' - uns: 'dataset_id' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum' -Slot description: +Data structure:
-| Slot | Type | Description | -|:--------------------|:----------|:-------------------------------------| -| `layers["counts"]` | `integer` | Raw counts. | -| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| Slot | Type | Description | +|:---|:---|:---| +| `layers["counts"]` | `integer` | Raw counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
-## File format: Test data +## File format: Training data -The subset of molecules used for the test dataset +The subset of molecules used for the training dataset -Example file: `resources_test/denoising/pancreas/test.h5ad` +Example file: `resources_test/denoising/pancreas/train.h5ad` Format: @@ -226,33 +163,23 @@ Format: AnnData object layers: 'counts' - uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum' + uns: 'dataset_id' -Slot description: +Data structure:
-| Slot | Type | Description | -|:---|:---|:---| -| `layers["counts"]` | `integer` | Raw counts. | -| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | -| `uns["dataset_name"]` | `string` | Nicely formatted name. | -| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | -| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | -| `uns["dataset_summary"]` | `string` | Short description of the dataset. | -| `uns["dataset_description"]` | `string` | Long description of the dataset. | -| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | -| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. | +| Slot | Type | Description | +|:--------------------|:----------|:-------------------------------------| +| `layers["counts"]` | `integer` | Raw counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
## Component type: Control Method -Path: -[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods) - A control method. Arguments: @@ -267,12 +194,9 @@ Arguments: -## Component type: Method - -Path: -[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods) +## Component type: Metric -A method. +A metric. Arguments: @@ -280,17 +204,15 @@ Arguments: | Name | Type | Description | |:---|:---|:---| -| `--input_train` | `file` | The subset of molecules used for the training dataset. | -| `--output` | `file` | (*Output*) A denoised dataset as output by a method. | +| `--input_test` | `file` | The subset of molecules used for the test dataset. | +| `--input_prediction` | `file` | A denoised dataset as output by a method. | +| `--output` | `file` | (*Output*) File indicating the score of a metric. | -## Component type: Metric - -Path: -[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics) +## Component type: Method -A metric. +A method. Arguments: @@ -298,9 +220,8 @@ Arguments: | Name | Type | Description | |:---|:---|:---| -| `--input_test` | `file` | The subset of molecules used for the test dataset. | -| `--input_prediction` | `file` | A denoised dataset as output by a method. | -| `--output` | `file` | (*Output*) File indicating the score of a metric. | +| `--input_train` | `file` | The subset of molecules used for the training dataset. | +| `--output` | `file` | (*Output*) A denoised dataset as output by a method. | @@ -320,7 +241,7 @@ Format: -Slot description: +Data structure:
@@ -347,7 +268,7 @@ Format:
-Slot description: +Data structure:
diff --git a/_viash.yaml b/_viash.yaml index ebbacf3..218e5a1 100644 --- a/_viash.yaml +++ b/_viash.yaml @@ -1,32 +1,12 @@ name: task_denoising -version: dev - organization: openproblems-bio -description: | - Removing noise in sparse single-cell RNA-sequencing count data. +version: dev license: MIT -keywords: [single-cell, openproblems, benchmark, denoising] -links: - issue_tracker: https://github.com/openproblems-bio/task_denoising/issues - repository: https://github.com/openproblems-bio/task_denoising - docker_registry: ghcr.io -info: - label: Denoising - summary: "Removing noise in sparse single-cell RNA-sequencing count data" - image: /src/api/thumbnail.svg - motivation: | - Single-cell RNA-Seq protocols only detect a fraction of the mRNA molecules present - in each cell. As a result, the measurements (UMI counts) observed for each gene and each - cell are associated with generally high levels of technical noise ([Grün et al., - 2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes the task of - estimating the true expression level of each gene in each cell. In the single-cell - literature, this task is also referred to as *imputation*, a term which is typically - used for missing data problems in statistics. Similar to the use of the terms "dropout", - "missing data", and "technical zeros", this terminology can create confusion about the - underlying measurement process ([Sarkar and Stephens, - 2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)). - description: | +label: Denoising +keywords: [single-cell, openproblems, benchmark, denoising] +summary: "Removing noise in sparse single-cell RNA-sequencing count data" +description: | A key challenge in evaluating denoising methods is the general lack of a ground truth. A recent benchmark study ([Hou et al., 2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x)) @@ -43,6 +23,25 @@ info: accuracy is measured by comparing the result to the test dataset. The authors show that both in theory and in practice, the measured denoising accuracy is representative of the accuracy that would be obtained on a ground truth dataset. +links: + issue_tracker: https://github.com/openproblems-bio/task_denoising/issues + repository: https://github.com/openproblems-bio/task_denoising + docker_registry: ghcr.io + +info: + image: thumbnail.svg + motivation: | + Single-cell RNA-Seq protocols only detect a fraction of the mRNA molecules present + in each cell. As a result, the measurements (UMI counts) observed for each gene and each + cell are associated with generally high levels of technical noise ([Grün et al., + 2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes the task of + estimating the true expression level of each gene in each cell. In the single-cell + literature, this task is also referred to as *imputation*, a term which is typically + used for missing data problems in statistics. Similar to the use of the terms "dropout", + "missing data", and "technical zeros", this terminology can create confusion about the + underlying measurement process ([Sarkar and Stephens, + 2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)). + test_resources: - type: s3 path: s3://openproblems-data/resources_test/denoising/ @@ -50,6 +49,7 @@ info: - type: s3 path: s3://openproblems-data/resources_test/common/ dest: resources_test/common + authors: - name: "Wesley Lewis" roles: [ author, maintainer ] diff --git a/common b/common index 25bfdf2..79006d5 160000 --- a/common +++ b/common @@ -1 +1 @@ -Subproject commit 25bfdf258bf7ff4417964eb5647b2dc999f12cfd +Subproject commit 79006d5f737a0697dafc98935b1256d3a4682853 diff --git a/scripts/.gitignore b/scripts/.gitignore deleted file mode 100644 index 2f7ffd3..0000000 --- a/scripts/.gitignore +++ /dev/null @@ -1,3 +0,0 @@ -add_a_method.sh -add_a_control_method.sh -add_a_metric.sh \ No newline at end of file diff --git a/scripts/create_component/.gitignore b/scripts/create_component/.gitignore new file mode 100644 index 0000000..09380f9 --- /dev/null +++ b/scripts/create_component/.gitignore @@ -0,0 +1,2 @@ +# if users change the scripts, the changes should not be committed. +/create_*_*.sh \ No newline at end of file diff --git a/scripts/create_readme.sh b/scripts/create_readme.sh index e5dec6f..b43731f 100755 --- a/scripts/create_readme.sh +++ b/scripts/create_readme.sh @@ -1,5 +1,5 @@ #!/bin/bash -common/create_task_readme/create_task_readme \ - --task_dir src \ - --output README.md +set -e + +common/scripts/create_task_readme --input src/api \ No newline at end of file diff --git a/scripts/process_datasets.sh b/scripts/create_resources/resources.sh similarity index 60% rename from scripts/process_datasets.sh rename to scripts/create_resources/resources.sh index 85c0559..a289f00 100755 --- a/scripts/process_datasets.sh +++ b/scripts/create_resources/resources.sh @@ -1,7 +1,12 @@ #!/bin/bash +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + cat > /tmp/params.yaml << 'HERE' -id: denoising_process_datasets input_states: s3://openproblems-data/resources/datasets/**/log_cp10k/state.yaml rename_keys: 'input:output_dataset' settings: '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad"}' @@ -9,20 +14,7 @@ output_state: "$id/state.yaml" publish_dir: s3://openproblems-data/resources/denoising/datasets HERE -cat > /tmp/nextflow.config << HERE -process { - executor = 'awsbatch' - withName:'.*publishStatesProc' { - memory = '16GB' - disk = '100GB' - } - withLabel:highmem { - memory = '350GB' - } -} -HERE - -tw launch https://github.com/openproblems-bio/task_denoising.git \ +tw launch https://github.com/openproblems-bio/task_template.git \ --revision build/main \ --pull-latest \ --main-script target/nextflow/workflows/process_datasets/main.nf \ @@ -30,5 +22,5 @@ tw launch https://github.com/openproblems-bio/task_denoising.git \ --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ --params-file /tmp/params.yaml \ --entry-name auto \ - --config /tmp/nextflow.config \ - --labels denoising,process_datasets \ No newline at end of file + --config common/nextflow_helpers/labels_tw.config \ + --labels denoising,process_datasets diff --git a/scripts/create_resources/test_resources.sh b/scripts/create_resources/test_resources.sh new file mode 100755 index 0000000..980d179 --- /dev/null +++ b/scripts/create_resources/test_resources.sh @@ -0,0 +1,52 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +# # remove this when you have implemented the script +# echo "TODO: replace the commands in this script with the sequence of components that you need to run to generate test_resources." +# echo " Inside this script, you will need to place commands to generate example files for each of the 'src/api/file_*.yaml' files." +# exit 1 + +set -e + +RAW_DATA=resources_test/common +DATASET_DIR=resources_test/denoising + +mkdir -p $DATASET_DIR + +# process dataset +viash run src/data_processors/process_dataset/config.vsh.yaml -- \ + --input $RAW_DATA/cxg_mouse_pancreas_atlas/dataset.h5ad \ + --output_train $DATASET_DIR/cxg_mouse_pancreas_atlas/train.h5ad \ + --output_test $DATASET_DIR/cxg_mouse_pancreas_atlas/test.h5ad \ + --output_solution $DATASET_DIR/cxg_mouse_pancreas_atlas/solution.h5ad + +# run one method +viash run src/methods/magic/config.vsh.yaml -- \ + --input_train $DATASET_DIR/pancreas/train.h5ad \ + --output $DATASET_DIR/pancreas/denoised.h5ad + +# run one metric +viash run src/metrics/poisson/config.vsh.yaml -- \ + --input_denoised $DATASET_DIR/pancreas/denoised.h5ad \ + --input_test $DATASET_DIR/pancreas/test.h5ad \ + --output $DATASET_DIR/pancreas/score.h5ad + +# write manual state.yaml. this is not actually necessary but you never know it might be useful +cat > $DATASET_DIR/cxg_mouse_pancreas_atlas/state.yaml << HERE +id: cxg_mouse_pancreas_atlas +train: !file train.h5ad +test: !file test.h5ad +solution: !file solution.h5ad +prediction: !file denoised.h5ad +score: !file score.h5ad +HERE + +# only run this if you have access to the openproblems-data bucket +# aws s3 sync --profile op \ +# "$DATASET_DIR" s3://openproblems-data/resources_test/denoising \ +# --delete --dryrun diff --git a/scripts/create_test_resources.sh b/scripts/create_test_resources.sh deleted file mode 100755 index deec9dc..0000000 --- a/scripts/create_test_resources.sh +++ /dev/null @@ -1,38 +0,0 @@ -#!/bin/bash - -# get the root of the directory -REPO_ROOT=$(git rev-parse --show-toplevel) - -# ensure that the command below is run from the root of the repository -cd "$REPO_ROOT" - -set -e - -RAW_DATA=resources_test/common -DATASET_DIR=resources_test/denoising - -mkdir -p $DATASET_DIR - -# process dataset -echo Running process_dataset -nextflow run . \ - -main-script target/nextflow/workflows/process_datasets/main.nf \ - -profile docker \ - -entry auto \ - --input_states "$RAW_DATA/**/state.yaml" \ - --rename_keys 'input:output_dataset' \ - --settings '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad"}' \ - --publish_dir "$DATASET_DIR" \ - --output_state '$id/state.yaml' - -# run one method -viash run src/methods/magic/config.vsh.yaml -- \ - --input_train $DATASET_DIR/pancreas/train.h5ad \ - --output $DATASET_DIR/pancreas/denoised.h5ad - -# run one metric -viash run src/metrics/poisson/config.vsh.yaml -- \ - --input_denoised $DATASET_DIR/pancreas/denoised.h5ad \ - --input_test $DATASET_DIR/pancreas/test.h5ad \ - --output $DATASET_DIR/pancreas/score.h5ad - diff --git a/scripts/download_resources.sh b/scripts/download_resources.sh deleted file mode 100755 index c621323..0000000 --- a/scripts/download_resources.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/bash - -set -e - -echo ">> Downloading resources" - -common/sync_resources/sync_resources \ - --input "s3://openproblems-data/resources_test/common/" \ - --output "resources_test/common" \ - --delete - -common/sync_resources/sync_resources \ - --input "s3://openproblems-data/resources_test/denoising/" \ - --output "resources_test/denoising" \ - --delete \ No newline at end of file diff --git a/scripts/project/build_all_components.sh b/scripts/project/build_all_components.sh new file mode 100755 index 0000000..4e90d91 --- /dev/null +++ b/scripts/project/build_all_components.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +set -e + +# Build all components in a namespace (refer https://viash.io/reference/cli/ns_build.html) +viash ns build --parallel diff --git a/scripts/project/build_all_docker_containers.sh b/scripts/project/build_all_docker_containers.sh new file mode 100755 index 0000000..5d43639 --- /dev/null +++ b/scripts/project/build_all_docker_containers.sh @@ -0,0 +1,7 @@ +#!/bin/bash + +set -e + +# Build all components in a namespace (refer https://viash.io/reference/cli/ns_build.html) +# and set up the container via a cached build +viash ns build --parallel --setup cachedbuild diff --git a/scripts/test_all_components.sh b/scripts/project/test_all_components.sh similarity index 75% rename from scripts/test_all_components.sh rename to scripts/project/test_all_components.sh index cd016e9..8a08afd 100755 --- a/scripts/test_all_components.sh +++ b/scripts/project/test_all_components.sh @@ -1,4 +1,6 @@ #!/bin/bash +set -e + # Test all components in a namespace (refer https://viash.io/reference/cli/ns_test.html) -viash ns test --parallel \ No newline at end of file +viash ns test --parallel diff --git a/scripts/run_benchmark/run_full_local.sh b/scripts/run_benchmark/run_full_local.sh new file mode 100755 index 0000000..da7f291 --- /dev/null +++ b/scripts/run_benchmark/run_full_local.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +# NOTE: depending on the the datasets and components, you may need to launch this workflow +# on a different compute platform (e.g. a HPC, AWS Cloud, Azure Cloud, Google Cloud). +# please refer to the nextflow information for more details: +# https://www.nextflow.io/docs/latest/ + + +set -e + +echo "Running benchmark on test data" +echo " Make sure to run 'scripts/project/build_all_docker_containers.sh'!" + +# generate a unique id +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="resources/results/${RUN_ID}" + +# write the parameters to file +cat > /tmp/params.yaml << HERE +input_states: resources/datasets/**/state.yaml +rename_keys: 'input_train:output_train;input_test:output_test' +output_state: "state.yaml" +publish_dir: "$publish_dir" +HERE + +# run the benchmark +nextflow run openproblems-bio/task_template \ + --revision build/main \ + -main-script target/nextflow/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -entry auto \ + -c common/nextflow_helpers/labels_ci.config \ + -params-file /tmp/params.yaml diff --git a/scripts/run_benchmark.sh b/scripts/run_benchmark/run_full_seqeracloud.sh similarity index 75% rename from scripts/run_benchmark.sh rename to scripts/run_benchmark/run_full_seqeracloud.sh index 73eb674..aa5a3c8 100755 --- a/scripts/run_benchmark.sh +++ b/scripts/run_benchmark/run_full_seqeracloud.sh @@ -1,11 +1,20 @@ #!/bin/bash +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# generate a unique id RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" publish_dir="s3://openproblems-data/resources/denoising/results/${RUN_ID}" -# make sure only log_cp10k is used +# write the parameters to file cat > /tmp/params.yaml << HERE -input_states: s3://openproblems-data/resources/denoising/datasets/**/log_cp10k/state.yaml +input_states: s3://openproblems-data/resources/denoising/datasets/**/state.yaml rename_keys: 'input_train:output_train;input_test:output_test' output_state: "state.yaml" publish_dir: "$publish_dir" diff --git a/scripts/run_benchmark/run_test_local.sh b/scripts/run_benchmark/run_test_local.sh new file mode 100755 index 0000000..a85bf75 --- /dev/null +++ b/scripts/run_benchmark/run_test_local.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +echo "Running benchmark on test data" +echo " Make sure to run 'scripts/project/build_all_docker_containers.sh'!" + +# generate a unique id +RUN_ID="testrun_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="temp/results/${RUN_ID}" + +nextflow run . \ + -main-script target/nextflow/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -c common/nextflow_helpers/labels_ci.config \ + --id cxg_mouse_pancreas_atlas \ + --input_train resources_test/denoising/cxg_mouse_pancreas_atlas/train.h5ad \ + --input_test resources_test/denoising/cxg_mouse_pancreas_atlas/test.h5ad \ + --output_state state.yaml \ + --publish_dir "$publish_dir" diff --git a/scripts/run_benchmark/run_test_seqeracloud.sh b/scripts/run_benchmark/run_test_seqeracloud.sh new file mode 100755 index 0000000..428eda3 --- /dev/null +++ b/scripts/run_benchmark/run_test_seqeracloud.sh @@ -0,0 +1,31 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +resources_test_s3=s3://openproblems-data/resources_test/denoising +publish_dir_s3="s3://openproblems-nextflow/temp/results/denoising/$(date +%Y-%m-%d_%H-%M-%S)" + +# write the parameters to file +cat > /tmp/params.yaml << HERE +id: cxg_mouse_pancreas_atlas +input_train: $resources_test_s3/cxg_mouse_pancreas_atlas/train.h5ad +input_test: $resources_test_s3/cxg_mouse_pancreas_atlas/test.h5ad +output_state: "state.yaml" +publish_dir: $publish_dir_s3 +HERE + +tw launch https://github.com/openproblems-bio/task_denoising.git \ + --revision build/main \ + --pull-latest \ + --main-script target/nextflow/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --config common/nextflow_helpers/labels_tw.config \ + --labels denoising,test diff --git a/scripts/run_benchmark_test.sh b/scripts/run_benchmark_test.sh deleted file mode 100755 index 9e4d01c..0000000 --- a/scripts/run_benchmark_test.sh +++ /dev/null @@ -1,19 +0,0 @@ -#!/bin/bash - -cat > /tmp/params.yaml << 'HERE' -input_states: s3://openproblems-data/resources_test/denoising/**/state.yaml -rename_keys: 'input_train:output_train;input_test:output_test' -output_state: "state.yaml" -publish_dir: s3://openproblems-nextflow/temp/denoising/ -HERE - -tw launch https://github.com/openproblems-bio/task_denoising.git \ - --revision build/main \ - --pull-latest \ - --main-script target/nextflow/workflows/run_benchmark/main.nf \ - --workspace 53907369739130 \ - --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ - --params-file /tmp/params.yaml \ - --entry-name auto \ - --config common/nextflow_helpers/labels_tw.config \ - --labels denoising,test \ No newline at end of file diff --git a/scripts/sync_resources.sh b/scripts/sync_resources.sh new file mode 100755 index 0000000..20b87e7 --- /dev/null +++ b/scripts/sync_resources.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +set -e + +common/scripts/sync_resources diff --git a/src/api/comp_control_method.yaml b/src/api/comp_control_method.yaml index 1cee82a..0378baa 100644 --- a/src/api/comp_control_method.yaml +++ b/src/api/comp_control_method.yaml @@ -12,11 +12,11 @@ info: in the task. arguments: - name: --input_train - __merge__: file_train_h5ad.yaml + __merge__: file_train.yaml required: true direction: input - name: --input_test - __merge__: file_test_h5ad.yaml + __merge__: file_test.yaml required: true direction: input - name: --output @@ -27,7 +27,7 @@ test_resources: - type: python_script path: /common/component_tests/run_and_check_output.py - type: python_script - path: /common/component_tests/check_method_config.py + path: /common/component_tests/check_config.py - path: /common/library.bib - path: /resources_test/denoising/pancreas dest: resources_test/denoising/pancreas \ No newline at end of file diff --git a/src/api/comp_process_dataset.yaml b/src/api/comp_data_processor.yaml similarity index 86% rename from src/api/comp_process_dataset.yaml rename to src/api/comp_data_processor.yaml index b5e7416..d3d24bb 100644 --- a/src/api/comp_process_dataset.yaml +++ b/src/api/comp_data_processor.yaml @@ -1,4 +1,4 @@ -namespace: "process_dataset" +namespace: "data_processors" info: type: process_dataset type_info: @@ -12,11 +12,11 @@ arguments: direction: input required: true - name: "--output_train" - __merge__: file_train_h5ad.yaml + __merge__: file_train.yaml direction: output required: true - name: "--output_test" - __merge__: file_test_h5ad.yaml + __merge__: file_test.yaml direction: output required: true test_resources: diff --git a/src/api/comp_method.yaml b/src/api/comp_method.yaml index 09fae19..ef04c12 100644 --- a/src/api/comp_method.yaml +++ b/src/api/comp_method.yaml @@ -8,7 +8,7 @@ info: A denoising method to remove noise (i.e. technical artifacts) from a dataset. arguments: - name: --input_train - __merge__: file_train_h5ad.yaml + __merge__: file_train.yaml required: true direction: input - name: --output @@ -19,7 +19,7 @@ test_resources: - type: python_script path: /common/component_tests/run_and_check_output.py - type: python_script - path: /common/component_tests/check_method_config.py + path: /common/component_tests/check_config.py - path: /common/library.bib - path: /resources_test/denoising/pancreas dest: resources_test/denoising/pancreas \ No newline at end of file diff --git a/src/api/comp_metric.yaml b/src/api/comp_metric.yaml index 83435ab..354d0f4 100644 --- a/src/api/comp_metric.yaml +++ b/src/api/comp_metric.yaml @@ -8,7 +8,7 @@ info: A metric for evaluating denoised datasets. arguments: - name: "--input_test" - __merge__: file_test_h5ad.yaml + __merge__: file_test.yaml direction: input required: true - name: "--input_prediction" @@ -21,7 +21,7 @@ arguments: required: true test_resources: - type: python_script - path: /common/component_tests/check_metric_config.py + path: /common/component_tests/check_config.py - type: python_script path: /common/component_tests/run_and_check_output.py - path: /common/library.bib diff --git a/src/api/file_common_dataset.yaml b/src/api/file_common_dataset.yaml index ff913ce..4e5db66 100644 --- a/src/api/file_common_dataset.yaml +++ b/src/api/file_common_dataset.yaml @@ -1,9 +1,10 @@ type: file example: "resources_test/common/pancreas/dataset.h5ad" +label: "Common Dataset" +summary: A subset of the common dataset. info: - label: "Common Dataset" - summary: A subset of the common dataset. - slots: + format: + type: h5ad layers: - type: integer name: counts diff --git a/src/api/file_prediction.yaml b/src/api/file_prediction.yaml index 788fa1a..e732d66 100644 --- a/src/api/file_prediction.yaml +++ b/src/api/file_prediction.yaml @@ -1,9 +1,10 @@ type: file example: "resources_test/denoising/pancreas/denoised.h5ad" +label: "Denoised data" +summary: A denoised dataset as output by a method. info: - label: "Denoised data" - summary: A denoised dataset as output by a method. - slots: + format: + type: h5ad layers: - type: integer name: denoised diff --git a/src/api/file_score.yaml b/src/api/file_score.yaml index 4a29744..3e80f6e 100644 --- a/src/api/file_score.yaml +++ b/src/api/file_score.yaml @@ -1,10 +1,10 @@ type: file example: resources_test/denoising/pancreas/score.h5ad +label: Score +summary: "File indicating the score of a metric." info: - label: Score - summary: "File indicating the score of a metric." - file_type: h5ad - slots: + format: + type: h5ad uns: - type: string name: dataset_id diff --git a/src/api/file_test_h5ad.yaml b/src/api/file_test.yaml similarity index 92% rename from src/api/file_test_h5ad.yaml rename to src/api/file_test.yaml index 371b305..10dab87 100644 --- a/src/api/file_test_h5ad.yaml +++ b/src/api/file_test.yaml @@ -1,9 +1,10 @@ type: file example: "resources_test/denoising/pancreas/test.h5ad" +label: "Test data" +summary: The subset of molecules used for the test dataset info: - label: "Test data" - summary: The subset of molecules used for the test dataset - slots: + format: + type: h5ad layers: - type: integer name: counts diff --git a/src/api/file_train_h5ad.yaml b/src/api/file_train.yaml similarity index 74% rename from src/api/file_train_h5ad.yaml rename to src/api/file_train.yaml index 302eae2..0d12edb 100644 --- a/src/api/file_train_h5ad.yaml +++ b/src/api/file_train.yaml @@ -1,9 +1,10 @@ type: file example: "resources_test/denoising/pancreas/train.h5ad" +label: "Training data" +summary: The subset of molecules used for the training dataset info: - label: "Training data" - summary: The subset of molecules used for the training dataset - slots: + format: + type: h5ad layers: - type: integer name: counts diff --git a/src/control_methods/no_denoising/config.vsh.yaml b/src/control_methods/no_denoising/config.vsh.yaml index c0364df..5f0272a 100644 --- a/src/control_methods/no_denoising/config.vsh.yaml +++ b/src/control_methods/no_denoising/config.vsh.yaml @@ -1,9 +1,9 @@ __merge__: ../../api/comp_control_method.yaml name: "no_denoising" +label: No Denoising +summary: "negative control by copying train counts" +description: "This method serves as a negative control, where the denoised data is a copy of the unaltered training data. This represents the scoring threshold if denoising was not performed on the data." info: - label: No Denoising - summary: "negative control by copying train counts" - description: "This method serves as a negative control, where the denoised data is a copy of the unaltered training data. This represents the scoring threshold if denoising was not performed on the data." v1: path: openproblems/tasks/denoising/methods/baseline.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 diff --git a/src/control_methods/perfect_denoising/config.vsh.yaml b/src/control_methods/perfect_denoising/config.vsh.yaml index e4f235d..47c3c5d 100644 --- a/src/control_methods/perfect_denoising/config.vsh.yaml +++ b/src/control_methods/perfect_denoising/config.vsh.yaml @@ -1,10 +1,10 @@ __merge__: ../../api/comp_control_method.yaml name: "perfect_denoising" +label: Perfect Denoising +summary: "Positive control by copying the test counts" +description: "This method serves as a positive control, where the test data is copied 1-to-1 to the denoised data. This makes it seem as if the data is perfectly denoised as it will be compared to the test data in the metrics." info: - label: Perfect Denoising - summary: "Positive control by copying the test counts" - description: "This method serves as a positive control, where the test data is copied 1-to-1 to the denoised data. This makes it seem as if the data is perfectly denoised as it will be compared to the test data in the metrics." v1: path: openproblems/tasks/denoising/methods/baseline.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 diff --git a/src/process_dataset/config.vsh.yaml b/src/data_processors/process_dataset/config.vsh.yaml similarity index 95% rename from src/process_dataset/config.vsh.yaml rename to src/data_processors/process_dataset/config.vsh.yaml index f167d39..3914251 100644 --- a/src/process_dataset/config.vsh.yaml +++ b/src/data_processors/process_dataset/config.vsh.yaml @@ -1,4 +1,4 @@ -__merge__: ../api/comp_process_dataset.yaml +__merge__: ../../api/comp_data_processor.yaml name: "process_dataset" description: | Split data using molecular cross-validation. diff --git a/src/process_dataset/helper.py b/src/data_processors/process_dataset/helper.py similarity index 100% rename from src/process_dataset/helper.py rename to src/data_processors/process_dataset/helper.py diff --git a/src/process_dataset/script.py b/src/data_processors/process_dataset/script.py similarity index 100% rename from src/process_dataset/script.py rename to src/data_processors/process_dataset/script.py diff --git a/src/methods/alra/config.vsh.yaml b/src/methods/alra/config.vsh.yaml index 4f956b4..7598429 100644 --- a/src/methods/alra/config.vsh.yaml +++ b/src/methods/alra/config.vsh.yaml @@ -1,21 +1,23 @@ __merge__: ../../api/comp_method.yaml name: "alra" +label: ALRA +summary: "ALRA imputes missing values in scRNA-seq data by computing rank-k approximation, thresholding by gene, and rescaling the matrix." +description: | + Adaptively-thresholded Low Rank Approximation (ALRA). + + ALRA is a method for imputation of missing values in single cell RNA-sequencing data, + described in the preprint, "Zero-preserving imputation of scRNA-seq data using low-rank approximation" + available [here](https://www.biorxiv.org/content/early/2018/08/22/397588). Given a + scRNA-seq expression matrix, ALRA first computes its rank-k approximation using randomized SVD. + Next, each row (gene) is thresholded by the magnitude of the most negative value of that gene. + Finally, the matrix is rescaled. +references: + doi: 10.1101/397588 +links: + documentation: https://github.com/KlugerLab/ALRA/blob/master/README.md + repository: https://github.com/KlugerLab/ALRA info: - label: ALRA - summary: "ALRA imputes missing values in scRNA-seq data by computing rank-k approximation, thresholding by gene, and rescaling the matrix." - description: | - Adaptively-thresholded Low Rank Approximation (ALRA). - - ALRA is a method for imputation of missing values in single cell RNA-sequencing data, - described in the preprint, "Zero-preserving imputation of scRNA-seq data using low-rank approximation" - available [here](https://www.biorxiv.org/content/early/2018/08/22/397588). Given a - scRNA-seq expression matrix, ALRA first computes its rank-k approximation using randomized SVD. - Next, each row (gene) is thresholded by the magnitude of the most negative value of that gene. - Finally, the matrix is rescaled. - reference: "linderman2018zero" - repository_url: "https://github.com/KlugerLab/ALRA" - documentation_url: https://github.com/KlugerLab/ALRA/blob/master/README.md v1: path: openproblems/tasks/denoising/methods/alra.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 diff --git a/src/methods/dca/config.vsh.yaml b/src/methods/dca/config.vsh.yaml index 3d62968..343a032 100644 --- a/src/methods/dca/config.vsh.yaml +++ b/src/methods/dca/config.vsh.yaml @@ -1,16 +1,18 @@ __merge__: ../../api/comp_method.yaml name: "dca" -info: - label: DCA - summary: "A deep autoencoder with ZINB loss function to address the dropout effect in count data" - description: | - "Deep Count Autoencoder +label: DCA +summary: "A deep autoencoder with ZINB loss function to address the dropout effect in count data" +description: | + "Deep Count Autoencoder - Removes the dropout effect by taking the count structure, overdispersed nature and sparsity of the data into account - using a deep autoencoder with zero-inflated negative binomial (ZINB) loss function." - reference: "eraslan2019single" - documentation_url: "https://github.com/theislab/dca#readme" - repository_url: "https://github.com/theislab/dca" + Removes the dropout effect by taking the count structure, overdispersed nature and sparsity of the data into account + using a deep autoencoder with zero-inflated negative binomial (ZINB) loss function." +references: + doi: 10.1038/s41467-018-07931-2 +links: + documentation: "https://github.com/theislab/dca#readme" + repository: "https://github.com/theislab/dca" +info: v1: path: openproblems/tasks/denoising/methods/dca.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 @@ -31,6 +33,9 @@ engines: setup: - type: apt packages: procps + - type: python + github: + - openproblems-bio/core#subdirectory=packages/python/openproblems - type: python packages: - anndata~=0.8.0 @@ -39,6 +44,7 @@ engines: - requests - jsonschema - "git+https://github.com/scottgigante-immunai/dca.git@patch-1" + - numpy<2 runners: - type: executable - type: nextflow diff --git a/src/methods/knn_smoothing/config.vsh.yaml b/src/methods/knn_smoothing/config.vsh.yaml index fd7aab1..d2a4e82 100644 --- a/src/methods/knn_smoothing/config.vsh.yaml +++ b/src/methods/knn_smoothing/config.vsh.yaml @@ -1,29 +1,32 @@ __merge__: ../../api/comp_method.yaml name: "knn_smoothing" +label: KNN Smoothing +summary: "Iterative kNN-smoothing denoises scRNA-seq data by iteratively increasing the size of neighbourhoods for smoothing until a maximum k value is reached." +description: "Iterative kNN-smoothing is a method to repair or denoise noisy scRNA-seq + expression matrices. Given a scRNA-seq expression matrix, KNN-smoothing first + applies initial normalisation and smoothing. Then, a chosen number of + principal components is used to calculate Euclidean distances between cells. + Minimally sized neighbourhoods are initially determined from these Euclidean + distances, and expression profiles are shared between neighbouring cells. + Then, the resultant smoothed matrix is used as input to the next step of + smoothing, where the size (k) of the considered neighbourhoods is increased, + leading to greater smoothing. This process continues until a chosen maximum k + value has been reached, at which point the iteratively smoothed object is + then optionally scaled to yield a final result." +references: + doi: 10.1101/217737 +links: + documentation: https://github.com/yanailab/knn-smoothing#readme + repository: https://github.com/yanailab/knn-smoothing info: - label: KNN Smoothing - summary: "Iterative kNN-smoothing denoises scRNA-seq data by iteratively increasing the size of neighbourhoods for smoothing until a maximum k value is reached." - description: "Iterative kNN-smoothing is a method to repair or denoise noisy scRNA-seq - expression matrices. Given a scRNA-seq expression matrix, KNN-smoothing first - applies initial normalisation and smoothing. Then, a chosen number of - principal components is used to calculate Euclidean distances between cells. - Minimally sized neighbourhoods are initially determined from these Euclidean - distances, and expression profiles are shared between neighbouring cells. - Then, the resultant smoothed matrix is used as input to the next step of - smoothing, where the size (k) of the considered neighbourhoods is increased, - leading to greater smoothing. This process continues until a chosen maximum k - value has been reached, at which point the iteratively smoothed object is - then optionally scaled to yield a final result." - reference: "wagner2018knearest" - documentation_url: "https://github.com/yanailab/knn-smoothing#readme" - repository_url: "https://github.com/yanailab/knn-smoothing" v1: path: openproblems/tasks/denoising/methods/knn_smoothing.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 variants: knn_smoothing: preferred_normalization: counts + resources: - type: python_script path: script.py diff --git a/src/methods/magic/config.vsh.yaml b/src/methods/magic/config.vsh.yaml index 62b9c87..1bb3a94 100644 --- a/src/methods/magic/config.vsh.yaml +++ b/src/methods/magic/config.vsh.yaml @@ -1,22 +1,24 @@ __merge__: ../../api/comp_method.yaml name: "magic" +label: MAGIC +summary: "MAGIC imputes and denoises scRNA-seq data that is noisy or dropout-prone." +description: "MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for + imputation and denoising of noisy or dropout-prone single cell RNA-sequencing + data. Given a normalised scRNA-seq expression matrix, it first calculates + Euclidean distances between each pair of cells in the dataset, which is then + augmented using a Gaussian kernel (function) and row-normalised to give a + normalised affinity matrix. A t-step markov process is then calculated, by + powering this affinity matrix t times. Finally, the powered affinity matrix + is right-multiplied by the normalised data, causing the final imputed values + to take the value of a per-gene average weighted by the affinities of cells. + The resultant imputed matrix is then rescaled, to more closely match the + magnitude of measurements in the normalised (input) matrix." +references: + doi: 10.1016/j.cell.2018.05.061 +links: + documentation: https://github.com/KrishnaswamyLab/MAGIC#readme + repository: https://github.com/KrishnaswamyLab/MAGIC info: - label: MAGIC - summary: "MAGIC imputes and denoises scRNA-seq data that is noisy or dropout-prone." - description: "MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for - imputation and denoising of noisy or dropout-prone single cell RNA-sequencing - data. Given a normalised scRNA-seq expression matrix, it first calculates - Euclidean distances between each pair of cells in the dataset, which is then - augmented using a Gaussian kernel (function) and row-normalised to give a - normalised affinity matrix. A t-step markov process is then calculated, by - powering this affinity matrix t times. Finally, the powered affinity matrix - is right-multiplied by the normalised data, causing the final imputed values - to take the value of a per-gene average weighted by the affinities of cells. - The resultant imputed matrix is then rescaled, to more closely match the - magnitude of measurements in the normalised (input) matrix." - reference: "van2018recovering" - documentation_url: "https://github.com/KrishnaswamyLab/MAGIC#readme" - repository_url: "https://github.com/KrishnaswamyLab/MAGIC" v1: path: openproblems/tasks/denoising/methods/magic.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 @@ -56,7 +58,7 @@ engines: image: openproblems/base_python:1.0.0 setup: - type: python - pip: [scprep, magic-impute, scipy, scikit-learn<1.2] + pip: [scprep, magic-impute, scipy, scikit-learn<1.2, numpy<2] runners: - type: executable - type: nextflow diff --git a/src/methods/saver/config.vsh.yaml b/src/methods/saver/config.vsh.yaml index 90717dc..3d07668 100644 --- a/src/methods/saver/config.vsh.yaml +++ b/src/methods/saver/config.vsh.yaml @@ -2,21 +2,23 @@ __merge__: ../../api/comp_method.yaml name: saver status: disabled +label: SAVER +summary: SAVER (Single-cell Analysis Via Expression Recovery) implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile. +description: | + SAVER takes advantage of gene-to-gene relationships to recover the true expression level of each gene in each cell, + removing technical variation while retaining biological variation across cells (https://github.com/mohuangx/SAVER). + SAVER uses a post-quality-control scRNA-seq dataset with UMI counts as input. SAVER assumes that the count of each + gene in each cell follows a Poisson-gamma mixture, also known as a negative binomial model. Instead of specifying + the gamma prior, we estimate the prior parameters in an empirical Bayes-like approach with a Poisson LASSO regression, + using the expression of other genes as predictors. Once the prior parameters are estimated, SAVER outputs the + posterior distribution of the true expression, which quantifies estimation uncertainty, and the posterior mean is + used as the SAVER recovered expression value. +references: + doi: 10.1038/s41592-018-0033-z +links: + documentation: https://mohuangx.github.io/SAVER/index.html + repository: https://github.com/mohuangx/SAVER info: - label: SAVER - summary: SAVER (Single-cell Analysis Via Expression Recovery) implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile. - description: | - SAVER takes advantage of gene-to-gene relationships to recover the true expression level of each gene in each cell, - removing technical variation while retaining biological variation across cells (https://github.com/mohuangx/SAVER). - SAVER uses a post-quality-control scRNA-seq dataset with UMI counts as input. SAVER assumes that the count of each - gene in each cell follows a Poisson-gamma mixture, also known as a negative binomial model. Instead of specifying - the gamma prior, we estimate the prior parameters in an empirical Bayes-like approach with a Poisson LASSO regression, - using the expression of other genes as predictors. Once the prior parameters are estimated, SAVER outputs the - posterior distribution of the true expression, which quantifies estimation uncertainty, and the posterior mean is - used as the SAVER recovered expression value. - reference: huang2018savergene - repository_url: https://github.com/mohuangx/SAVER - documentation_url: https://mohuangx.github.io/SAVER/index.html preferred_normalization: counts resources: - type: r_script diff --git a/src/metrics/mse/config.vsh.yaml b/src/metrics/mse/config.vsh.yaml index 9068716..94e800a 100644 --- a/src/metrics/mse/config.vsh.yaml +++ b/src/metrics/mse/config.vsh.yaml @@ -6,7 +6,8 @@ info: label: Mean-squared error summary: "The mean squared error between the denoised counts and the true counts." description: "The mean squared error between the denoised counts of the training dataset and the true counts of the test dataset after reweighing by the train/test ratio" - reference: batson2019molecular + references: + doi: 10.1101/786269 v1: path: openproblems/tasks/denoising/metrics/mse.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 @@ -24,6 +25,7 @@ engines: pypi: - scikit-learn - scprep + - numpy<2 runners: - type: executable - type: nextflow diff --git a/src/metrics/poisson/config.vsh.yaml b/src/metrics/poisson/config.vsh.yaml index 9f8aab8..47742a7 100644 --- a/src/metrics/poisson/config.vsh.yaml +++ b/src/metrics/poisson/config.vsh.yaml @@ -6,7 +6,8 @@ info: label: Poisson Loss summary: "The Poisson log likelihood of the true counts observed in the distribution of denoised counts" description: "The Poisson log likelihood of observing the true counts of the test dataset given the distribution given in the denoised dataset." - reference: batson2019molecular + references: + doi: 10.1101/786269 v1: path: openproblems/tasks/denoising/metrics/poisson.py commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 @@ -22,7 +23,8 @@ engines: setup: - type: python pypi: - - scprep + - scprep + - numpy<2 runners: - type: executable - type: nextflow diff --git a/src/workflows/process_datasets/config.vsh.yaml b/src/workflows/process_datasets/config.vsh.yaml index 22765f2..6041a5c 100644 --- a/src/workflows/process_datasets/config.vsh.yaml +++ b/src/workflows/process_datasets/config.vsh.yaml @@ -10,11 +10,11 @@ argument_groups: - name: Outputs arguments: - name: "--output_train" - __merge__: "/src/api/file_train_h5ad.yaml" + __merge__: "/src/api/file_train.yaml" direction: output required: true - name: "--output_test" - __merge__: "/src/api/file_test_h5ad.yaml" + __merge__: "/src/api/file_test.yaml" direction: output required: true resources: diff --git a/src/workflows/run_benchmark/config.vsh.yaml b/src/workflows/run_benchmark/config.vsh.yaml index 3d1b6bc..da35f2b 100644 --- a/src/workflows/run_benchmark/config.vsh.yaml +++ b/src/workflows/run_benchmark/config.vsh.yaml @@ -4,11 +4,11 @@ argument_groups: - name: Inputs arguments: - name: "--input_train" - __merge__: "/src/api/file_train_h5ad.yaml" + __merge__: "/src/api/file_train.yaml" required: true direction: input - name: "--input_test" - __merge__: "/src/api/file_test_h5ad.yaml" + __merge__: "/src/api/file_test.yaml" required: true direction: input - name: Outputs diff --git a/src/api/thumbnail.svg b/thumbnail.svg similarity index 100% rename from src/api/thumbnail.svg rename to thumbnail.svg