update viash version (#13)
* update submodule

* update viash version

* relocate thumbnail

* update methods metadata

* update control_methods

* add file_type

* remove obs layer

* Update common resources

* Update file* API

* update README

* fix component test fp

* update metrics references api

* add openproblems package to DCA

* update DCA method

* update submodule

* update dca

* update links

* set numpy<2

* update fapi file name

* update create_readme script

* update readme

* relocate process datasets

* Update scripts dir

* update changelog

* update readme

* update process_datasets merge path

* fix processor config
KaiWaldrant authored Sep 19, 2024
1 parent 1f54509 commit 77fa24b
Showing 44 changed files with 420 additions and 365 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -6,10 +6,18 @@

* Directory structure has been updated.

* Update to viash 0.9.0 (PR #13).

## NEW FUNCTIONALITY

* Add `CHANGELOG.md` (PR #7).

## MAJOR CHANGES

* Revamp `scripts` directory (PR #13).

* Relocated `process_datasets` to `data_processors/process_datasets` (PR #13).

## MINOR CHANGES

* Remove dtype parameter in `.Anndata()` (PR #6).
@@ -20,6 +28,11 @@

* Update docker containers used in components (PR #12).

* Set `numpy<2` for some failing methods (PR #13).

* Small changes to api file names (PR #13).


## transfer from openproblems-v2 repository

### NEW FUNCTIONALITY
181 changes: 51 additions & 130 deletions README.md
@@ -8,75 +8,8 @@ Do not edit this file directly.

Removing noise in sparse single-cell RNA-sequencing count data

Path to source:
[`src`](https://github.com/openproblems-bio/task_denoising/src)

## README

## Installation

You need to have Docker, Java, and Viash installed. Follow [these
instructions](https://openproblems.bio/documentation/fundamentals/requirements)
to install the required dependencies.

## Add a method

To add a method to the repository, follow the instructions in the
`scripts/add_a_method.sh` script.

## Frequently used commands

To get started, you can run the following commands:

``` bash
git clone git@github.com:openproblems-bio/task_denoising.git

cd task_denoising

# initialise submodule
scripts/init_submodule.sh

# download resources
scripts/download_resources.sh
```

To run the benchmark, you first need to build the components.
Afterwards, you can run the benchmark:

``` bash
viash ns build --parallel --setup cachedbuild

scripts/run_benchmark.sh
```

After adding a component, it is recommended to run the tests to ensure
that the component is working correctly:

``` bash
viash ns test --parallel
```

Optionally, you can provide the `--query` argument to test only a subset
of components:

``` bash
viash ns test --parallel --query 'component_name'
```

## Motivation

Single-cell RNA-Seq protocols only detect a fraction of the mRNA
molecules present in each cell. As a result, the measurements (UMI
counts) observed for each gene and each cell are associated with
generally high levels of technical noise ([Grün et al.,
2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes
the task of estimating the true expression level of each gene in each
cell. In the single-cell literature, this task is also referred to as
*imputation*, a term which is typically used for missing data problems
in statistics. Similar to the use of the terms “dropout”, “missing
data”, and “technical zeros”, this terminology can create confusion
about the underlying measurement process ([Sarkar and Stephens,
2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).
Repository:
[openproblems-bio/task_denoising](https://github.com/openproblems-bio/task_denoising)

## Description

@@ -114,24 +47,24 @@ dataset.
``` mermaid
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_train_h5ad("Training data")
file_test_h5ad("Test data")
comp_data_processor[/"Data processor"/]
file_test("Test data")
file_train("Training data")
comp_control_method[/"Control Method"/]
comp_method[/"Method"/]
comp_metric[/"Metric"/]
comp_method[/"Method"/]
file_prediction("Denoised data")
file_score("Score")
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_train_h5ad
comp_process_dataset-->file_test_h5ad
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
file_test_h5ad---comp_control_method
file_test_h5ad---comp_metric
file_common_dataset---comp_data_processor
comp_data_processor-->file_test
comp_data_processor-->file_train
file_test---comp_control_method
file_test---comp_metric
file_train---comp_control_method
file_train---comp_method
comp_control_method-->file_prediction
comp_method-->file_prediction
comp_metric-->file_score
comp_method-->file_prediction
file_prediction---comp_metric
```
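
The dependencies in the flowchart can also be written down as a small graph and sorted into an execution order. The sketch below is only an illustration: it uses the newer node names from the diagram (`comp_data_processor`, `file_test`, `file_train`), treats every edge as "must exist before", and is not part of the repository.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (edges read off the flowchart).
# file_prediction is produced by either a control method or a method;
# listing both here only affects ordering, not the actual workflow.
DEPENDS_ON = {
    "comp_data_processor": {"file_common_dataset"},
    "file_test": {"comp_data_processor"},
    "file_train": {"comp_data_processor"},
    "comp_control_method": {"file_test", "file_train"},
    "comp_method": {"file_train"},
    "file_prediction": {"comp_control_method", "comp_method"},
    "comp_metric": {"file_test", "file_prediction"},
    "file_score": {"comp_metric"},
}

# One valid order in which files and components become available.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order it prints places the data processor before the methods and the metric after the prediction, which matches the direction of the arrows in the diagram.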

@@ -151,7 +84,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -170,9 +103,6 @@ Slot description:

## Component type: Data processor

Path:
[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)

A denoising dataset processor.

Arguments:
@@ -187,72 +117,69 @@ Arguments:

</div>

## File format: Training data
## File format: Test data

The subset of molecules used for the training dataset
The subset of molecules used for the test dataset

Example file: `resources_test/denoising/pancreas/train.h5ad`
Example file: `resources_test/denoising/pancreas/test.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |

</div>
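
As a rough illustration of the "Test data" table above, the helper below reports which required slots are missing from an AnnData-like object. The function name `missing_test_slots` and the stand-in object are hypothetical. A real file would presumably be loaded with `anndata.read_h5ad`, which is not done here so that the sketch runs without extra dependencies.

```python
from types import SimpleNamespace

# Slot names copied from the "Test data" table; slots marked (*Optional*)
# there are listed separately and are not required by the check.
REQUIRED_LAYERS = ["counts"]
REQUIRED_UNS = [
    "dataset_id",
    "dataset_name",
    "dataset_summary",
    "dataset_description",
    "train_sum",
]
OPTIONAL_UNS = ["dataset_url", "dataset_reference", "dataset_organism"]


def missing_test_slots(adata):
    """Return the required slots that are absent from an AnnData-like object."""
    missing = [f'layers["{key}"]' for key in REQUIRED_LAYERS if key not in adata.layers]
    missing += [f'uns["{key}"]' for key in REQUIRED_UNS if key not in adata.uns]
    return missing


# Stand-in with dict-like .layers and .uns, used here instead of a real AnnData.
example = SimpleNamespace(
    layers={"counts": [[0, 1], [2, 0]]},
    uns={"dataset_id": "pancreas", "train_sum": 3},
)
print(missing_test_slots(example))
```

The same check should work on a real `AnnData` object, since its `layers` and `uns` attributes support `in` lookups by key.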

## File format: Test data
## File format: Training data

The subset of molecules used for the test dataset
The subset of molecules used for the training dataset

Example file: `resources_test/denoising/pancreas/test.h5ad`
Example file: `resources_test/denoising/pancreas/train.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'
uns: 'dataset_id'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |

</div>

## Component type: Control Method

Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)

A control method.

Arguments:
@@ -267,40 +194,34 @@ Arguments:

</div>

## Component type: Method

Path:
[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
## Component type: Metric

A method.
A metric.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |

</div>

## Component type: Metric

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
## Component type: Method

A metric.
A method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |

</div>
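
To make the Method arguments above concrete, here is a minimal stand-in for a method's command-line interface. It assumes plain `argparse`, whereas in the repository the interface is presumably generated by Viash from the component config. The parser description and the example file names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser exposing the two arguments from the Method table."""
    parser = argparse.ArgumentParser(description="Hypothetical denoising method")
    parser.add_argument(
        "--input_train",
        required=True,
        help="The subset of molecules used for the training dataset.",
    )
    parser.add_argument(
        "--output",
        required=True,
        help="A denoised dataset as output by a method.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--input_train", "train.h5ad", "--output", "denoised.h5ad"]
)
print(args.input_train, args.output)
```

A Metric would follow the same pattern with `--input_test`, `--input_prediction`, and `--output`.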

@@ -320,7 +241,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -347,7 +268,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">
