update viash version (#13)

* update submodule
* update viash version
* relocate thumbnail
* update methods metadata
* update control_methods
* add file_type
* remove obs layer
* Update common resources
* Update file* API
* update README
* fix component test fp
* update metrics references api
* add openproblems package to DCA
* update DCA method
* update submodule
* update dca
* update links
* set numpy<2
* update fapi file name
* update create_readme script
* update readme
* relocate process datasets
* Update scripts dir
* update changelog
* update readme
* update process_datasets merge path
* fix processor config

openproblems-bio · Sep 19, 2024 · 77fa24b
1 parent 1f54509
Showing 44 changed files with 420 additions and 365 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,10 +6,18 @@
 
 * Directory structure has been updated.
 
+* Update to viash 0.9.0 (PR #13).
+
 ## NEW FUNCTIONALITY
 
 * Add `CHANGELOG.md` (PR #7).
 
+## MAJOR CHANGES
+
+* Revamp `scripts` directory (PR #13).
+
+* Relocated `process_datasets` to `data_processors/process_datasets` (PR #13).
+
 ## MINOR CHANGES
 
 * Remove dtype parameter in `.Anndata()` (PR #6).
@@ -20,6 +28,11 @@
 
 * Update docker containers used in components (PR #12).
 
+* Set `numpy<2` for some failing methods (PR #13).
+
+* Small changes to api file names (PR #13).
+
+
 ## transfer from openproblems-v2 repository
 
 ### NEW FUNCTIONALITY

diff --git a/README.md b/README.md
@@ -8,75 +8,8 @@ Do not edit this file directly.
 
 Removing noise in sparse single-cell RNA-sequencing count data
 
-Path to source:
-[`src`](https://github.com/openproblems-bio/task_denoising/src)
-
-## README
-
-## Installation
-
-You need to have Docker, Java, and Viash installed. Follow [these
-instructions](https://openproblems.bio/documentation/fundamentals/requirements)
-to install the required dependencies.
-
-## Add a method
-
-To add a method to the repository, follow the instructions in the
-`scripts/add_a_method.sh` script.
-
-## Frequently used commands
-
-To get started, you can run the following commands:
-
-``` bash
-git clone git@github.com:openproblems-bio/task_denoising.git
-
-cd task_denoising
-
-# initialise submodule
-scripts/init_submodule.sh
-
-# download resources
-scripts/download_resources.sh
-```
-
-To run the benchmark, you first need to build the components.
-Afterwards, you can run the benchmark:
-
-``` bash
-viash ns build --parallel --setup cachedbuild
-
-scripts/run_benchmark.sh
-```
-
-After adding a component, it is recommended to run the tests to ensure
-that the component is working correctly:
-
-``` bash
-viash ns test --parallel
-```
-
-Optionally, you can provide the `--query` argument to test only a subset
-of components:
-
-``` bash
-viash ns test --parallel --query 'component_name'
-```
-
-## Motivation
-
-Single-cell RNA-Seq protocols only detect a fraction of the mRNA
-molecules present in each cell. As a result, the measurements (UMI
-counts) observed for each gene and each cell are associated with
-generally high levels of technical noise ([Grün et al.,
-2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes
-the task of estimating the true expression level of each gene in each
-cell. In the single-cell literature, this task is also referred to as
-*imputation*, a term which is typically used for missing data problems
-in statistics. Similar to the use of the terms “dropout”, “missing
-data”, and “technical zeros”, this terminology can create confusion
-about the underlying measurement process ([Sarkar and Stephens,
-2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).
+Repository:
+[openproblems-bio/task_denoising](https://github.com/openproblems-bio/task_denoising)
 
 ## Description
 
@@ -114,24 +47,24 @@ dataset.
 ``` mermaid
 flowchart LR
   file_common_dataset("Common Dataset")
-  comp_process_dataset[/"Data processor"/]
-  file_train_h5ad("Training data")
-  file_test_h5ad("Test data")
+  comp_data_processor[/"Data processor"/]
+  file_test("Test data")
+  file_train("Training data")
   comp_control_method[/"Control Method"/]
-  comp_method[/"Method"/]
   comp_metric[/"Metric"/]
+  comp_method[/"Method"/]
   file_prediction("Denoised data")
   file_score("Score")
-  file_common_dataset---comp_process_dataset
-  comp_process_dataset-->file_train_h5ad
-  comp_process_dataset-->file_test_h5ad
-  file_train_h5ad---comp_control_method
-  file_train_h5ad---comp_method
-  file_test_h5ad---comp_control_method
-  file_test_h5ad---comp_metric
+  file_common_dataset---comp_data_processor
+  comp_data_processor-->file_test
+  comp_data_processor-->file_train
+  file_test---comp_control_method
+  file_test---comp_metric
+  file_train---comp_control_method
+  file_train---comp_method
   comp_control_method-->file_prediction
-  comp_method-->file_prediction
   comp_metric-->file_score
+  comp_method-->file_prediction
   file_prediction---comp_metric
 ```
 
@@ -151,7 +84,7 @@ Format:
 
 </div>
 
-Slot description:
+Data structure:
 
 <div class="small">
 
@@ -170,9 +103,6 @@ Slot description:
 
 ## Component type: Data processor
 
-Path:
-[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)
-
 A denoising dataset processor.
 
 Arguments:
@@ -187,72 +117,69 @@ Arguments:
 
 </div>
 
-## File format: Training data
+## File format: Test data
 
-The subset of molecules used for the training dataset
+The subset of molecules used for the test dataset
 
-Example file: `resources_test/denoising/pancreas/train.h5ad`
+Example file: `resources_test/denoising/pancreas/test.h5ad`
 
 Format:
 
 <div class="small">
 
     AnnData object
      layers: 'counts'
-     uns: 'dataset_id'
+     uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'
 
 </div>
 
-Slot description:
+Data structure:
 
 <div class="small">
 
-| Slot                | Type      | Description                          |
-|:--------------------|:----------|:-------------------------------------|
-| `layers["counts"]`  | `integer` | Raw counts.                          |
-| `uns["dataset_id"]` | `string`  | A unique identifier for the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
 
 </div>
 
-## File format: Test data
+## File format: Training data
 
-The subset of molecules used for the test dataset
+The subset of molecules used for the training dataset
 
-Example file: `resources_test/denoising/pancreas/test.h5ad`
+Example file: `resources_test/denoising/pancreas/train.h5ad`
 
 Format:
 
 <div class="small">
 
     AnnData object
      layers: 'counts'
-     uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'
+     uns: 'dataset_id'
 
 </div>
 
-Slot description:
+Data structure:
 
 <div class="small">
 
-| Slot | Type | Description |
-|:---|:---|:---|
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["dataset_name"]` | `string` | Nicely formatted name. |
-| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
-| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
-| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
-| `uns["dataset_description"]` | `string` | Long description of the dataset. |
-| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
-| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
+| Slot                | Type      | Description                          |
+|:--------------------|:----------|:-------------------------------------|
+| `layers["counts"]`  | `integer` | Raw counts.                          |
+| `uns["dataset_id"]` | `string`  | A unique identifier for the dataset. |
 
 </div>
 
 ## Component type: Control Method
 
-Path:
-[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)
-
 A control method.
 
 Arguments:
@@ -267,40 +194,34 @@ Arguments:
 
 </div>
 
-## Component type: Method
-
-Path:
-[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
+## Component type: Metric
 
-A method.
+A metric.
 
 Arguments:
 
 <div class="small">
 
 | Name | Type | Description |
 |:---|:---|:---|
-| `--input_train` | `file` | The subset of molecules used for the training dataset. |
-| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |
+| `--input_test` | `file` | The subset of molecules used for the test dataset. |
+| `--input_prediction` | `file` | A denoised dataset as output by a method. |
+| `--output` | `file` | (*Output*) File indicating the score of a metric. |
 
 </div>
 
-## Component type: Metric
-
-Path:
-[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
+## Component type: Method
 
-A metric.
+A method.
 
 Arguments:
 
 <div class="small">
 
 | Name | Type | Description |
 |:---|:---|:---|
-| `--input_test` | `file` | The subset of molecules used for the test dataset. |
-| `--input_prediction` | `file` | A denoised dataset as output by a method. |
-| `--output` | `file` | (*Output*) File indicating the score of a metric. |
+| `--input_train` | `file` | The subset of molecules used for the training dataset. |
+| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |
 
 </div>
 
@@ -320,7 +241,7 @@ Format:
 
 </div>
 
-Slot description:
+Data structure:
 
 <div class="small">
 
@@ -347,7 +268,7 @@ Format:
 
 </div>
 
-Slot description:
+Data structure:
 
 <div class="small">