Neurips2021/joint embedding migration #61

Open
wants to merge 46 commits into
base: main
46 commits
0db12b5
add mask_dataset
KaiWaldrant Dec 14, 2022
bc99112
debug mask_dataset test
KaiWaldrant Dec 14, 2022
262a1ed
add masked anddata api
KaiWaldrant Dec 14, 2022
3f367e1
add random_embed negative control
KaiWaldrant Dec 14, 2022
861072e
update control_method api
KaiWaldrant Dec 14, 2022
8749e2a
add zeros_embed control
KaiWaldrant Dec 15, 2022
7c89329
add lmds method
KaiWaldrant Dec 15, 2022
0d29dd0
add mnn method
KaiWaldrant Dec 15, 2022
3c46c4d
add newwave method
KaiWaldrant Dec 15, 2022
4ddb315
add pca method
KaiWaldrant Dec 15, 2022
3ae1855
Add totalVI method
KaiWaldrant Dec 16, 2022
87f84cb
add umap method
KaiWaldrant Dec 16, 2022
7cc07bf
add metric ari
KaiWaldrant Dec 16, 2022
caff25d
update comp_metric
KaiWaldrant Dec 16, 2022
f7e0e0b
update ari metric
KaiWaldrant Dec 16, 2022
22c7f46
add asw_batch metric
KaiWaldrant Dec 16, 2022
d7e03de
add asw_label metric
KaiWaldrant Dec 16, 2022
1b47472
add cc_cons metric
KaiWaldrant Dec 16, 2022
ea82ca5
remove DI docker because of old anndata package
KaiWaldrant Jan 4, 2023
16ce776
add check_format metric
KaiWaldrant Jan 4, 2023
4bce62c
add graph connectivity metric
KaiWaldrant Jan 4, 2023
bdbdbfd
add latent mixing metric
KaiWaldrant Jan 4, 2023
5457a6c
add nmi metric
KaiWaldrant Jan 4, 2023
6d50fc4
add rfoob metric
KaiWaldrant Jan 4, 2023
82ae20e
add ti_cons metric
KaiWaldrant Jan 4, 2023
acfb631
add ti_cons_batch metric
KaiWaldrant Jan 4, 2023
71ae0e9
add metric unit test
KaiWaldrant Jan 5, 2023
ed38c11
add task_info.yaml
KaiWaldrant Jan 5, 2023
88419ae
Merge remote-tracking branch 'origin/main' into neurips2021/joint_emb…
KaiWaldrant Jan 6, 2023
b6d5bbd
create NF workflow
KaiWaldrant Jan 6, 2023
99b0524
update changelog
KaiWaldrant Jan 6, 2023
c8ae601
update changelog
KaiWaldrant Jan 6, 2023
10f75d4
fix typo in changelog
KaiWaldrant Jan 6, 2023
e0aef20
fix typo in changelog
KaiWaldrant Jan 6, 2023
8327637
convert sparse matrix to array
KaiWaldrant Jan 9, 2023
1b2dd90
use denormalized counts data
KaiWaldrant Jan 9, 2023
a8895dc
fix directive labels
KaiWaldrant Jan 13, 2023
a849f0b
update configs to align with v1 metadata
KaiWaldrant Jan 13, 2023
399a316
add readme
KaiWaldrant Jan 13, 2023
be3e175
update readme
KaiWaldrant Jan 13, 2023
4d749cd
Merge remote-tracking branch 'origin/main' into neurips2021/joint_emb…
KaiWaldrant Jan 16, 2023
78c0db7
Merge remote-tracking branch 'origin/main' into neurips2021/joint_emb…
KaiWaldrant Jan 24, 2023
0bce137
update task info and readme
KaiWaldrant Jan 24, 2023
e7abed3
update comp_metric
KaiWaldrant Jan 24, 2023
f781da5
resolve personal comments
KaiWaldrant Jan 25, 2023
9381228
Merge remote-tracking branch 'origin/main' into neurips2021/joint_emb…
KaiWaldrant Jan 25, 2023
64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -179,3 +179,67 @@
* `metrics/rmse` should be removed because RMSE metrics don't really make sense here.

* `metrics/trustworthiness` should be removed because it is already included in `metrics/coranking`.


## Multi modality - Joint Embedding

### New functionality

* `api/anndata_*`: Created file format specifications for the h5ad files used throughout the pipeline.

* `api/comp_*`: Created API definitions for the mask, method and metric components.

* `mask_dataset`: Added a component for masking raw datasets into task-ready dataset objects.

* `resources_test/joint_embedding/pancreas` with `src/joint_embedding/resources_test_scripts/pancreas.sh`.

### NeurIPS 2021 migration

* `control_methods/random_embed`: Migrated from neurips 2021. Extracted from baseline method `dummy_random`.

* `control_methods/zeros_embed`: Migrated from neurips 2021. Extracted from baseline method `dummy_zeros`.

* `methods/lmds`: Migrated from neurips 2021.

* `methods/mnn`: Migrated and adapted from neurips 2021.

* `methods/newwave`: Migrated and adapted from neurips 2021.

* `methods/pca`: Migrated from neurips 2021.

* `methods/totalvi`: Migrated from neurips 2021.

* `methods/umap`: Migrated from neurips 2021.

* `metrics/ari`: Migrated from neurips 2021.

* `metrics/asw_batch`: Migrated from neurips 2021.

* `metrics/asw_label`: Migrated from neurips 2021.

* `metrics/cc_cons`: Migrated from neurips 2021.

* `metrics/check_format`: Migrated from neurips 2021.

* `metrics/graph_connectivity`: Migrated from neurips 2021.

* `metrics/latent_mixing`: Migrated from neurips 2021.

* `metrics/nmi`: Migrated from neurips 2021.

* `metrics/rfoob`: Migrated from neurips 2021.

* `metrics/ti_cons`: Migrated from neurips 2021.

* `metrics/ti_cons_batch`: Migrated from neurips 2021.

### Changes from NeurIPS 2021

* Updated the Docker config for the R scripts; the previous one used an old `anndata` package that was giving warnings.

* Method output is now stored in `.obsm["X_emb"]` instead of `.X` of the AnnData object.

* `X_emb` data is stored as a sparse matrix.

* Updated configs to the latest `viash`.
23 changes: 23 additions & 0 deletions src/joint_embedding/README.md
@@ -0,0 +1,23 @@
# Joint embedding

Structure of this task:

    src/joint_embedding
    ├── api                    Interface specifications for components and datasets in this task
    ├── control_methods        Baseline (random/ground truth) methods to compare methods against
    ├── methods                Methods to be benchmarked
    ├── metrics                Metrics used to quantify performance of methods
    ├── README.md              This file
    ├── resources_scripts      Scripts to process the datasets
    ├── resources_test_scripts Scripts to process the test resources
    ├── mask_dataset           Component to prepare common datasets
    └── workflows              Pipelines to run the full benchmark

Relevant links:

* [Description and results at openproblems.bio](https://openproblems.bio/neurips_2021/)

* [Experimental results](https://openproblems-experimental.netlify.app/results/joint_embedding/)

<!-- update this to openproblems.bio/guide when possible -->
* [Contribution guide](https://github.com/openproblems-bio/openproblems-v2/blob/main/CONTRIBUTING.md)
75 changes: 75 additions & 0 deletions src/joint_embedding/api/anndata_dataset.yaml
@@ -0,0 +1,75 @@
type: file
description: "A raw dataset"
example: "dataset.h5ad"
info:
  label: "Dataset"
  slots:
    layers:
      - type: integer
        name: counts
        description: Raw counts
        required: true
    obs:
      - type: string
        name: batch
        description: Batch information
        required: true
      - type: double
        name: size_factors
        description: The size factors created by the normalisation method, if any.
        required: false
      - type: string
        name: cell_type
        description: Type of cells
        required: false
      - type: string
        name: pseudotime_order_GEX
        description:
        required: false
      - type: string
        name: pseudotime_order_ATAC
        description:
        required: false
      - type: string
        name: pseudotime_order_ADT
        description:
        required: false
      - type: double
        name: S_score
        description:
        required: false
      - type: double
        name: G2M_score
        description:
        required: false
      - type: boolean
        name: is_train
        description: if sample is train data
        required: true
    var:
      - type: string
        name: gene_ids
        description:
        required: false
      - type: string
        name: feature_types
        description:
        required: true
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: organism
        description: "data from which organism "
        required: false
      - type: string
        name: gene_activity_var_names
        description:
        required: false
      - type: string
        name: sample_pm_varnames
        description:
        required: false

37 changes: 37 additions & 0 deletions src/joint_embedding/api/anndata_masked_mod1.yaml
@@ -0,0 +1,37 @@
type: file
description: "The masked data"
example: "masked.h5ad"
info:
  short_description: "masked data"
  slots:
    layers:
      - type: integer
        name: counts
        description: Raw counts
    obs:
      - type: string
        name: batch
        description: Batch information
        required: true
      - type: double
        name: size_factors
        description:
        required: false
    var:
      - type: string
        name: feature_types
        description:
        required: true
      - type: string
        name: gene_ids
        description:
        required: false
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: organism
        description: which organism
        required: true
39 changes: 39 additions & 0 deletions src/joint_embedding/api/anndata_masked_mod2.yaml
@@ -0,0 +1,39 @@
type: file
description: "The masked data for mod2 file"
example: "masked.h5ad"
info:
  short_description: "Masked data"
  slots:
    layers:
      - type: integer
        name: counts
        description: Raw counts
        required: true
    obs:
      - type: string
        name: batch
        description: Batch information
        required: true
    var:
      - type: string
        name: feature_types
        description:
        required: true
      - type: string
        name: gene_ids
        description:
        required: false
    obsm:
      - type: double
        name: gene_activity
        description:
        required: false
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: organism
        description: which organism
        required: true
25 changes: 25 additions & 0 deletions src/joint_embedding/api/anndata_prediction.yaml
@@ -0,0 +1,25 @@
type: file
description: "The prediction file"
example: "prediction.h5ad"
info:
  short_description: "Prediction"
  slots:
    obs:
      - type: string
        name: batch
        description: Batch information
        required: true
    obsm:
      - type: double
        name: X_emb
        description:
        required: true
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: method_id
        description: "A unique identifier for the method"
        required: true
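
Reviewer note: the prediction spec above marks `obs/batch`, `obsm/X_emb`, `uns/dataset_id` and `uns/method_id` as required. The sketch below shows one way such a requirement could be checked. It is not the `check_format` component from this PR: the function name `missing_slots` is hypothetical, and plain dicts stand in for an AnnData object so the example runs with the standard library only.

```python
# Required fields copied from anndata_prediction.yaml above.
REQUIRED_PREDICTION_SLOTS = {
    "obs": ["batch"],
    "obsm": ["X_emb"],
    "uns": ["dataset_id", "method_id"],
}


def missing_slots(data, required=REQUIRED_PREDICTION_SLOTS):
    """Return 'slot/name' strings for every required field absent from data.

    `data` is a dict of dicts keyed by slot name (an assumption made for this
    sketch; a real check would inspect an AnnData object instead).
    """
    missing = []
    for slot, names in required.items():
        present = data.get(slot, {})
        for name in names:
            if name not in present:
                missing.append(f"{slot}/{name}")
    return missing


# Illustrative values only; method_id is deliberately left out.
prediction = {
    "obs": {"batch": ["s1d1", "s1d1", "s2d1"]},
    "obsm": {"X_emb": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]},
    "uns": {"dataset_id": "pancreas"},
}
print(missing_slots(prediction))
```

Returning the full list of missing fields, rather than failing on the first one, makes it easier to report every problem with a prediction file in a single pass.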
25 changes: 25 additions & 0 deletions src/joint_embedding/api/anndata_score.yaml
@@ -0,0 +1,25 @@
type: file
description: "Metric score file"
example: "output.h5ad"
info:
  short_description: "Score"
  slots:
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: method_id
        description: "A unique identifier for the method"
        required: true
      - type: string
        name: metric_ids
        description: "One or more unique metric identifiers"
        multiple: true
        required: true
      - type: double
        name: metric_values
        description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'."
        multiple: true
        required: true
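
Reviewer note: the score spec above requires `metric_values` to have the same length as `metric_ids`. A minimal sketch of that constraint follows, assuming the `uns` slot is available as a plain dict. The helper names `as_list` and `score_table` are hypothetical and not part of this PR.

```python
def as_list(value):
    """Wrap a scalar in a list; `multiple: true` fields may hold one or many values."""
    if isinstance(value, (str, int, float)):
        return [value]
    return list(value)


def score_table(uns):
    """Validate the score fields and return a {metric_id: value} mapping."""
    required = ("dataset_id", "method_id", "metric_ids", "metric_values")
    missing = [key for key in required if key not in uns]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    ids = as_list(uns["metric_ids"])
    values = [float(v) for v in as_list(uns["metric_values"])]
    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return dict(zip(ids, values))


# Illustrative values only.
scores = score_table({
    "dataset_id": "pancreas",
    "method_id": "pca",
    "metric_ids": ["ari", "nmi"],
    "metric_values": [0.5, 0.25],
})
print(scores)
```

Pairing the two lists into one mapping after the length check keeps each value tied to its metric identifier when scores from several components are later combined.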
57 changes: 57 additions & 0 deletions src/joint_embedding/api/anndata_solution.yaml
@@ -0,0 +1,57 @@
type: file
description: "The solution for the data"
example: "solution.h5ad"
info:
  short_description: "Solution"
  slots:
    layers:
      - type: integer
        name: counts
        description: Raw counts
    obs:
      - type: string
        name: batch
        description: Batch information
        required: false
      - type: string
        name: cell_type
        description: Type of cells
        required: false
      - type: string
        name: pseudotime_order_GEX
        description:
        required: false
      - type: string
        name: pseudotime_order_ATAC
        description:
        required: false
      - type: string
        name: pseudotime_order_ADT
        description:
        required: false
      - type: double
        name: S_score
        description:
        required: false
      - type: double
        name: G2M_score
        description:
        required: false
    var:
      - type: string
        name: feature_types
        description:
        required: true
      - type: string
        name: gene_ids
        description:
        required: false
    uns:
      - type: string
        name: dataset_id
        description: "A unique identifier for the dataset"
        required: true
      - type: string
        name: organism
        description: which organism
        required: true
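
Reviewer note: comparing the masked mod1 and solution specs above, the masked file keeps only `batch` and `size_factors` in `obs`, while the solution file keeps the labels used for scoring (`cell_type`, the pseudotime orders, and the cell cycle scores). The sketch below illustrates that split on a plain dict of columns. It is not the `mask_dataset` component itself; `split_obs` and the example values are hypothetical.

```python
# Column names copied from the masked mod1 and solution specs above.
MASKED_OBS = ["batch", "size_factors"]
SOLUTION_OBS = [
    "batch",
    "cell_type",
    "pseudotime_order_GEX",
    "pseudotime_order_ATAC",
    "pseudotime_order_ADT",
    "S_score",
    "G2M_score",
]


def split_obs(obs):
    """Split a dict of obs columns into (masked, solution) dicts.

    Columns named in neither spec (for example `is_train`) are dropped from both.
    """
    masked = {name: col for name, col in obs.items() if name in MASKED_OBS}
    solution = {name: col for name, col in obs.items() if name in SOLUTION_OBS}
    return masked, solution


# Illustrative values only.
obs = {
    "batch": ["s1d1", "s2d1"],
    "size_factors": [1.0, 0.8],
    "cell_type": ["alpha", "beta"],
    "is_train": [True, False],
}
masked, solution = split_obs(obs)
print(sorted(masked), sorted(solution))
```

Keeping the label columns out of the masked file means a method never sees the information the metrics later score it against.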
24 changes: 24 additions & 0 deletions src/joint_embedding/api/authors.yaml
@@ -0,0 +1,24 @@
functionality:
  authors:
    - name: Robrecht Cannoodt
      roles: [ author ]
      props: { github: rcannood, orcid: "0000-0003-3641-729X" }
    - name: Kai Waldrant
      roles: [ contributor ]
      props: { github: KaiWaldrant }
    - name: Alex Tong
      email: alexandertongdev@gmail.com
      roles: [ author, maintainer ]
      props: { github: atong01 }
    - name: Christopher Lance
      email: clance.connect@gmail.com
      roles: [ author, maintainer ]
      props: { github: xlancelottx }
    - name: Michaela Mueller
      email: mumichae@in.tum.de
      roles: [ author, maintainer ]
      props: { github: mumichae, orcid: "0000-0002-1401-1785" }
    - name: Ann Chen
      email: ann.chen@czbiohub.org
      roles: [ author, maintainer ]
      props: { github: atchen}