
Commit

Rename task-dge-perturbation-prediction to `task_perturbation_prediction` (openproblems-bio#66)

* update task info

* update project config

* rename task

* move files

* add namespace to api

* update readme

* update spec

* simplify component
rcannood authored Jun 4, 2024
1 parent ca5de78 commit 8a61381
Showing 103 changed files with 272 additions and 200 deletions.
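Scripts written against the old layout may need the same renames this commit applies. They follow three mechanical patterns: component sources move from `src/task/<namespace>/` to `src/<namespace>/`, the repository (and Docker organization) changes from `task-dge-perturbation-prediction` to `task_perturbation_prediction`, and benchmark results move from `resources/dge_perturbation_prediction/` to `resources/perturbation_prediction/` on S3. The sketch below applies these patterns to a string. `migrate_reference` is a hypothetical helper, not part of the repository. It does not cover every rename, for example the bare `dge_perturbation_prediction` task name or moved files whose paths this page does not show.

```python
import re

# Hypothetical migration helper (not part of the repository). It applies the
# path and name changes of this commit to a piece of text, e.g. a shell script.
OLD_REPO = "task-dge-perturbation-prediction"
NEW_REPO = "task_perturbation_prediction"
OLD_RESULTS = "resources/dge_perturbation_prediction/"
NEW_RESULTS = "resources/perturbation_prediction/"


def migrate_reference(text: str) -> str:
    # Component sources moved up one level: src/task/<namespace>/ -> src/<namespace>/
    text = re.sub(r"\bsrc/task/", "src/", text)
    # Repository and Docker organization were renamed.
    text = text.replace(OLD_REPO, NEW_REPO)
    # Benchmark results moved to a new S3 prefix.
    text = text.replace(OLD_RESULTS, NEW_RESULTS)
    return text


print(migrate_reference("viash test src/task/methods/foo/config.vsh.yaml"))
# viash test src/methods/foo/config.vsh.yaml
```

Plain substring replacement can also match unrelated text, so review the rewritten script before committing it.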
12 changes: 6 additions & 6 deletions CHANGELOG.md
@@ -1,10 +1,10 @@
# task-dge-perturbation-prediction 0.1.0
# task_perturbation_prediction 1.0.0

Initial release of the DGE Perturbation Prediction task. Initial components:
Initial release of the Perturbation Prediction task. Initial components:

* `src/task/process_dataset`: Compute the DGE data from the raw single-cell counts using Limma.
* `src/task/control_methods`: Baseline control methods: sample, ground_truth, zeros, mean_across_celltypes, mean_across_compounds, mean_outcome.
* `src/task/methods`: DGE perturbation prediction methods: random_forest.
* `src/task/metrics`: Evaluation metrics: mean_rowwise_error.
* `src/process_dataset`: Compute the DGE data from the raw single-cell counts using Limma.
* `src/control_methods`: Baseline control methods: sample, ground_truth, zeros, mean_across_celltypes, mean_across_compounds, mean_outcome.
* `src/methods`: Perturbation prediction methods: jn_ap_op2, lgc_ensemble, nn_retraining_with_pseudolabels, pyboost, scape, transformer_ensemble.
* `src/metrics`: Evaluation metrics: mean_rowwise_error, mean_rowwise_correlation.
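The changelog names the two metrics but does not define them. The following is a rough illustration only. It assumes `mean_rowwise_error` is the per-row root-mean-squared error averaged over rows, and `mean_rowwise_correlation` is the per-row Pearson correlation averaged over rows. The actual definitions, including any other correlation variants, live in `src/metrics/`. Each row stands for one cell type and compound combination, and each column for one gene. The function names are hypothetical.

```python
import math


def mean_rowwise_rmse(truth, pred):
    """Average over rows of the per-row root-mean-squared error (assumed definition)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must have the same, non-zero number of rows")
    per_row = []
    for t_row, p_row in zip(truth, pred):
        if len(t_row) != len(p_row):
            raise ValueError("each row must have the same number of genes")
        mse = sum((t - p) ** 2 for t, p in zip(t_row, p_row)) / len(t_row)
        per_row.append(math.sqrt(mse))
    return sum(per_row) / len(per_row)


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a row is constant


def mean_rowwise_pearson(truth, pred):
    """Average over rows of the per-row Pearson correlation (assumed definition)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must have the same, non-zero number of rows")
    return sum(_pearson(t, p) for t, p in zip(truth, pred)) / len(truth)
```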


287 changes: 163 additions & 124 deletions README.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions _viash.yaml
@@ -7,8 +7,8 @@ config_mods: |
.functionality.version := 'dev'
.functionality.arguments[.multiple == true].multiple_sep := ';'
.platforms[.type == 'docker'].target_registry := 'ghcr.io'
.platforms[.type == 'docker'].target_organization := 'openproblems-bio/task-dge-perturbation-prediction'
.platforms[.type == 'docker'].target_image_source := 'https://github.com/openproblems-bio/task-dge-perturbation-prediction'
.platforms[.type == 'docker'].target_organization := 'openproblems-bio/task_perturbation_prediction'
.platforms[.type == 'docker'].target_image_source := 'https://github.com/openproblems-bio/task_perturbation_prediction'
.platforms[.type == "nextflow"].directives.tag := "$id"
.platforms[.type == "nextflow"].auto.simplifyOutput := false
.platforms[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" }
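The `multiple_sep := ';'` config mod sets the separator for arguments declared with `multiple: true`. The sketch below assumes that viash accepts such an argument on the command line as a single string whose values are joined by this separator. `split_multiple` and `join_multiple` are hypothetical helpers.

```python
MULTIPLE_SEP = ";"  # value of multiple_sep set in _viash.yaml


def split_multiple(value: str, sep: str = MULTIPLE_SEP) -> list:
    """Turn a separator-joined CLI value into a list of values."""
    return value.split(sep)


def join_multiple(values, sep: str = MULTIPLE_SEP) -> str:
    """Build the single CLI string for a multiple-valued argument."""
    if any(sep in v for v in values):
        raise ValueError(f"values must not contain the separator {sep!r}")
    return sep.join(values)
```

Consider a list-valued parameter such as `submission_names`, which the `jn_ap_op2` script sets to `["dl40"]`. If it is declared with `multiple: true`, it would be passed as `--submission_names "dl40;dl200"` (`dl200` is a made-up value). Quote the whole string, because an unquoted `;` ends the command in bash.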
12 changes: 6 additions & 6 deletions scripts/add_a_method.sh
@@ -15,25 +15,25 @@ viash run src/common/create_component/config.vsh.yaml -- \
--language "$method_lang" \
--name "$method_id"

# TODO: fill in required fields in src/task/methods/foo/config.vsh.yaml
# TODO: edit src/task/methods/foo/script.py/R
# TODO: fill in required fields in src/methods/foo/config.vsh.yaml
# TODO: edit src/methods/foo/script.py/R

# test the component
viash test src/task/methods/$method_id/config.vsh.yaml
viash test src/methods/$method_id/config.vsh.yaml

# rebuild the container (only if you change something to the docker platform)
# You can reduce the memory and cpu allotted to jobs in _viash.yaml by modifying .platforms[.type == "nextflow"].config.labels
viash run src/task/methods/$method_id/config.vsh.yaml -- \
viash run src/methods/$method_id/config.vsh.yaml -- \
---setup cachedbuild ---verbose

# run the method (using h5ad as input)
viash run src/task/methods/$method_id/config.vsh.yaml -- \
viash run src/methods/$method_id/config.vsh.yaml -- \
--de_train_h5ad "resources/neurips-2023-kaggle/2023-09-12_de_by_cell_type_train.h5ad" \
--id_map "resources/neurips-2023-kaggle/id_map.csv" \
--output "output/prediction.h5ad"

# run evaluation metric
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--de_test_h5ad "resources/neurips-2023-kaggle/de_test.h5ad" \
--prediction "output/prediction.h5ad" \
--output "output/score.h5ad"
6 changes: 3 additions & 3 deletions scripts/generate_kaggle_resources.sh
@@ -18,7 +18,7 @@ if [[ ! -f "$OUT/2023-09-12_de_by_cell_type_test.h5ad" ]]; then
"import anndata as ad; ad.read_h5ad('$OUT/2023-09-12_de_by_cell_type_train.h5ad').write_h5ad('$OUT/2023-09-12_de_by_cell_type_train.h5ad', compression='gzip')"
fi

viash run src/task/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yaml -- \
viash run src/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yaml -- \
--input_train "$OUT/2023-09-12_de_by_cell_type_train.h5ad" \
--input_test "$OUT/2023-09-12_de_by_cell_type_test.h5ad" \
--input_single_cell_h5ad "resources/neurips-2023-raw/sc_counts.h5ad" \
@@ -34,14 +34,14 @@ viash run src/task/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yam
--dataset_organism homo_sapiens

echo ">> Run method"
viash run src/task/control_methods/mean_across_compounds/config.vsh.yaml -- \
viash run src/control_methods/mean_across_compounds/config.vsh.yaml -- \
--de_train_h5ad "$OUT/de_train.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--id_map "$OUT/id_map.csv" \
--output "$OUT/prediction.h5ad"

echo ">> Run metric"
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--prediction "$OUT/prediction.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--output "$OUT/score.h5ad"
4 changes: 2 additions & 2 deletions scripts/generate_resources.sh
@@ -32,14 +32,14 @@ nextflow run \
--publish_dir "$OUT"

echo ">> Run method"
viash run src/task/control_methods/mean_across_compounds/config.vsh.yaml -- \
viash run src/control_methods/mean_across_compounds/config.vsh.yaml -- \
--de_train_h5ad "$OUT/de_train.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--id_map "$OUT/id_map.csv" \
--output "$OUT/prediction.h5ad"

echo ">> Run metric"
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--prediction "$OUT/prediction.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--output "$OUT/score.h5ad"
6 changes: 3 additions & 3 deletions scripts/render_readme.sh
@@ -5,7 +5,7 @@ set -e
[[ ! -d ../openproblems-v2 ]] && echo "You need to clone the openproblems-v2 repository next to this repository" && exit 1

../openproblems-v2/target/docker/common/create_task_readme/create_task_readme \
--task "dge_perturbation_prediction" \
--task_dir "src/task" \
--github_url "https://github.com/openproblems-bio/task-dge-perturbation-prediction/tree/main/" \
--task "perturbation_prediction" \
--task_dir "src" \
--github_url "https://github.com/openproblems-bio/task_perturbation_prediction/tree/main/" \
--output "README.md"
4 changes: 2 additions & 2 deletions scripts/run_benchmark_tw.sh
@@ -2,7 +2,7 @@

RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
resources_dir="s3://openproblems-bio/public/neurips-2023-competition/workflow-resources"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
param_list:
@@ -20,7 +20,7 @@ output_state: "state.yaml"
publish_dir: "$publish_dir"
HERE

tw launch https://github.com/openproblems-bio/task-dge-perturbation-prediction.git \
tw launch https://github.com/openproblems-bio/task_perturbation_prediction.git \
--revision main_build \
--pull-latest \
--main-script target/nextflow/workflows/run_benchmark/main.nf \
6 changes: 3 additions & 3 deletions scripts/run_benchmark_tw_traens.sh
@@ -4,7 +4,7 @@

RUN_ID="traens_$(date +%Y-%m-%d_%H-%M-%S)"
resources_dir="s3://openproblems-bio/public/neurips-2023-competition/workflow-resources"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
param_list:
@@ -18,8 +18,8 @@ output_state: "state.yaml"
publish_dir: "$publish_dir"
HERE

tw launch https://github.com/openproblems-bio/task-dge-perturbation-prediction.git \
--revision fix_trafo_ens_build \
tw launch https://github.com/openproblems-bio/task_perturbation_prediction.git \
--revision suggestions_elior_build \
--pull-latest \
--main-script target/nextflow/workflows/run_benchmark/main.nf \
--workspace 53907369739130 \
4 changes: 2 additions & 2 deletions scripts/run_layert_tw.sh
@@ -1,7 +1,7 @@
#!/bin/bash

RUN_ID="layert_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
id: dge_perturbation_task
@@ -12,7 +12,7 @@ rename_keys: "de_train_h5ad:de_train_h5ad,de_test_h5ad:de_test_h5ad,id_map:id_ma
settings: '{"layer": "t"}'
HERE

tw launch https://github.com/openproblems-bio/task-dge-perturbation-prediction.git \
tw launch https://github.com/openproblems-bio/task_perturbation_prediction.git \
--revision main_build \
--pull-latest \
--main-script target/nextflow/workflows/run_benchmark/main.nf \
4 changes: 2 additions & 2 deletions scripts/run_stability_tw.sh
@@ -1,7 +1,7 @@
#!/bin/bash

RUN_ID="stability_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
id: neurips-2023-data
@@ -11,7 +11,7 @@ output_state: "state.yaml"
publish_dir: "$publish_dir"
HERE

tw launch https://github.com/openproblems-bio/task-dge-perturbation-prediction.git \
tw launch https://github.com/openproblems-bio/task_perturbation_prediction.git \
--revision main_build \
--pull-latest \
--main-script target/nextflow/workflows/run_stability_analysis/main.nf \
8 changes: 4 additions & 4 deletions scripts/sync_results.sh
@@ -1,18 +1,18 @@
#!/bin/bash

aws s3 sync \
s3://openproblems-data/resources/dge_perturbation_prediction/results/ \
s3://openproblems-data/resources/perturbation_prediction/results/ \
output/benchmark_results/ \
--delete --dryrun

# sync back modified results
aws s3 sync \
output/benchmark_results/ \
s3://openproblems-data/resources/dge_perturbation_prediction/results/ \
s3://openproblems-data/resources/perturbation_prediction/results/ \
--delete --dryrun

# sync one run
runid=run_2024-06-01_00-03-09; aws s3 sync \
output/benchmark_results/${runid}/ \
s3://openproblems-data/resources/dge_perturbation_prediction/results/${runid}/ \
--delete --dryrun
s3://openproblems-data/resources/perturbation_prediction/results/${runid}/ \
--delete --dryrun
File renamed without changes.
File renamed without changes.
@@ -26,8 +26,7 @@ functionality:
required: true
direction: output
- name: "--output_model"
type: "file"
description: "Optional model output. If no value is passed, the model will be removed at the end of the run."
__merge__: file_model.yaml
direction: output
required: false
must_exist: false
File renamed without changes.
@@ -1,4 +1,5 @@
functionality:
namespace: process_dataset
info:
type: process_dataset
type_info:
@@ -25,5 +26,4 @@ functionality:
__merge__: file_id_map.yaml
required: true
direction: output
default: id_map.csv
test_resources: []
default: id_map.csv
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions src/api/file_model.yaml
@@ -0,0 +1,6 @@
type: file
example: resources/neurips-2023-data/model/
info:
label: Model
summary: "Optional model output. If no value is passed, the model will be removed at the end of the run."
file_type: directory
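`file_model.yaml` describes `--output_model` as an optional directory that is removed at the end of the run when no value is passed. The sketch below shows one way a method could honour that contract. It assumes `par` and `meta` look like the `## VIASH START` blocks of the methods, for example `lgc_ensemble`, which sets `"output_model": None` and `"temp_dir": "/tmp"`. `run_with_model_dir` and `train_fn` are hypothetical names, and the existing methods may handle this differently.

```python
import os
import shutil
import tempfile


def run_with_model_dir(par, meta, train_fn):
    """Run train_fn(model_dir); keep the directory only if --output_model was given."""
    keep = par.get("output_model") is not None
    if keep:
        model_dir = par["output_model"]
        os.makedirs(model_dir, exist_ok=True)
    else:
        # No --output_model: write to a throwaway directory under meta["temp_dir"].
        model_dir = tempfile.mkdtemp(dir=meta.get("temp_dir"))
    try:
        return train_fn(model_dir)
    finally:
        if not keep:
            # Remove the model at the end of the run, even if training failed.
            shutil.rmtree(model_dir, ignore_errors=True)
```

The `try`/`finally` deletes the temporary directory even if training raises, while a user-supplied directory is always kept.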
File renamed without changes.
File renamed without changes.
File renamed without changes.
48 changes: 44 additions & 4 deletions src/task/api/task_info.yaml → src/api/task_info.yaml
@@ -1,5 +1,5 @@
name: dge_perturbation_prediction
label: DGE Perturbation Prediction
name: perturbation_prediction
label: Perturbation Prediction
summary: Predicting how small molecules change gene expression in different cell types.
readme: |
## Installation
@@ -17,9 +17,9 @@ readme: |
To get started, you can run the following commands:
```bash
git clone git@github.com:openproblems-bio/task-dge-perturbation-prediction.git
git clone git@github.com:openproblems-bio/task_perturbation_prediction.git
cd task-dge-perturbation-prediction
cd task_perturbation_prediction
# download resources
scripts/download_resources.sh
@@ -99,3 +99,43 @@ authors:
info:
github: rcannood
orcid: "0000-0003-3641-729X"
- name: Daniel Burkhardt
roles: [ author ]
info:
github: dburkhardt
orcid: 0000-0001-7744-1363
- name: Malte D. Luecken
roles: [ author ]
info:
github: LuckyMD
orcid: 0000-0001-7464-7921
- name: Tin M. Tunjic
roles: [ contributor ]
info:
github: ttunja
orcid: 0000-0001-8842-6548
- name: Mengbo Wang
roles: [ contributor ]
info:
github: wangmengbo
orcid: 0000-0002-0266-9993
- name: Andrew Benz
roles: [ author ]
info:
github: andrew-benz
orcid: 0009-0002-8118-1861
- name: Tianyu Liu
roles: [ contributor ]
info:
github: HelloWorldLTY
orcid: 0000-0002-9412-6573
- name: Jalil Nourisa
roles: [ contributor ]
info:
github: janursa
orcid: 0000-0002-7539-4396
- name: Rico Meinl
roles: [ contributor ]
info:
github: ricomnl
orcid: 0000-0003-4356-6058
4 changes: 2 additions & 2 deletions src/common/create_component/config.vsh.yaml
@@ -24,7 +24,7 @@ functionality:
direction: output
# required: true
description: Path to the component directory. Suggested location is `src/<TASK>/<TYPE>s/<NAME>`.
default: src/task/methods/${VIASH_PAR_NAME}
default: src/methods/${VIASH_PAR_NAME}
- type: file
name: --api_file
description: |
@@ -33,7 +33,7 @@ functionality:
to manually specify a different API file to inherit from.
must_exist: false
# required: true
default: src/task/api/comp_method.yaml
default: src/api/comp_method.yaml
- type: file
name: --viash_yaml
description: |
6 changes: 3 additions & 3 deletions src/common/create_component/script.py
@@ -6,12 +6,12 @@

## VIASH START
par = {
"task": "DGE Perturbation Prediction",
"task": "Perturbation Prediction",
"type": "method",
"language": "python",
"name": "new_comp",
"output": "src/task/method/new_comp",
"api_file": "src/task/api/comp_method.yaml",
"output": "src/method/new_comp",
"api_file": "src/api/comp_method.yaml",
"viash_yaml": "_viash.yaml"
}
## VIASH END
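The new default for `--output`, `src/methods/${VIASH_PAR_NAME}`, embeds the component name using shell-style `${...}` syntax. Assuming the placeholder is expanded from the value given to `--name`, Python's `string.Template` reproduces the resulting path. `default_output` is a hypothetical helper.

```python
from string import Template

# Default of --output in src/common/create_component/config.vsh.yaml after this commit.
DEFAULT_OUTPUT = Template("src/methods/${VIASH_PAR_NAME}")


def default_output(name: str) -> str:
    """Expand the placeholder with the value passed to --name (assumed behaviour)."""
    return DEFAULT_OUTPUT.substitute(VIASH_PAR_NAME=name)


print(default_output("foo"))  # src/methods/foo
```

With `--name foo`, this gives the same `src/methods/foo` directory that `scripts/add_a_method.sh` tests and runs. The hard-coded `"output": "src/method/new_comp"` in the `## VIASH START` block uses the singular `method`. That block only supplies values when `script.py` is run directly, outside viash.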
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -30,7 +30,6 @@ functionality:
- type: python_script
path: script.py
- path: helper.py
- path: ../../utils/anndata_to_dataframe.py
platforms:
- type: docker
image: ghcr.io/openproblems-bio/base_pytorch_nvidia:1.0.4
File renamed without changes.
@@ -20,18 +20,16 @@
"submission_names": ["dl40"]
}
meta = {
"resources_dir": "src/task/methods/jn_ap_op2",
"resources_dir": "src/methods/jn_ap_op2",
}
## VIASH END

sys.path.append(meta["resources_dir"])

from anndata_to_dataframe import anndata_to_dataframe
from helper import plant_seed, MultiOutputTargetEncoder, train

print('Reading input files', flush=True)
de_train_h5ad = ad.read_h5ad(par["de_train_h5ad"])
de_train = anndata_to_dataframe(de_train_h5ad, par["layer"])
id_map = pd.read_csv(par["id_map"])

gene_names = list(de_train_h5ad.var_names)
@@ -58,10 +56,10 @@

print('Data location', flush=True)
# Data location
cell_types = de_train['cell_type']
sm_names = de_train['sm_name']
cell_types = de_train_h5ad.obs['cell_type'].astype(str)
sm_names = de_train_h5ad.obs['sm_name'].astype(str)

data = de_train.drop(columns=["cell_type", "sm_name", "sm_lincs_id", "SMILES", "split", "control"]).to_numpy(dtype=float)
data = de_train_h5ad.layers[par["layer"]]

print('Train model', flush=True)
# ... train model ...
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -19,7 +19,7 @@
"output_model": None
}
meta = {
"resources_dir": "src/task/methods/lgc_ensemble",
"resources_dir": "src/methods/lgc_ensemble",
"temp_dir": "/tmp"
}
## VIASH END
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.