Rename task-dge-perturbation-prediction to task_perturbation_prediction #66

Merged
merged 8 commits into from
Jun 4, 2024
move files
rcannood committed Jun 4, 2024
commit aee68bf0cdea5315f79f97339567889360e2a7c9
8 changes: 4 additions & 4 deletions CHANGELOG.md
@@ -2,9 +2,9 @@

Initial release of the Perturbation Prediction task. Initial components:

* `src/task/process_dataset`: Compute the DGE data from the raw single-cell counts using Limma.
* `src/task/control_methods`: Baseline control methods: sample, ground_truth, zeros, mean_across_celltypes, mean_across_compounds, mean_outcome.
* `src/task/methods`: Perturbation prediction methods: jn_ap_op2, lgc_ensemble, nn_retraining_with_pseudolabels, pyboost, scape, transformer_ensemble.
* `src/task/metrics`: Evaluation metrics: mean_rowwise_error, mean_rowwise_correlation.
* `src/process_dataset`: Compute the DGE data from the raw single-cell counts using Limma.
* `src/control_methods`: Baseline control methods: sample, ground_truth, zeros, mean_across_celltypes, mean_across_compounds, mean_outcome.
* `src/methods`: Perturbation prediction methods: jn_ap_op2, lgc_ensemble, nn_retraining_with_pseudolabels, pyboost, scape, transformer_ensemble.
* `src/metrics`: Evaluation metrics: mean_rowwise_error, mean_rowwise_correlation.
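
Note: every path change in this commit follows one rule: the `task/` segment directly under `src/` is dropped. A minimal sketch of that mapping, useful for checking leftover references; the helper name `strip_task_segment` is hypothetical and not part of the repository:

```python
from pathlib import PurePosixPath


def strip_task_segment(path: str) -> str:
    """Map an old-layout path (src/task/...) to the new layout (src/...)."""
    parts = PurePosixPath(path).parts
    # Only rewrite paths that start with src/task; leave everything else untouched.
    if len(parts) >= 2 and parts[0] == "src" and parts[1] == "task":
        return str(PurePosixPath("src", *parts[2:]))
    return path


print(strip_task_segment("src/task/methods/pyboost"))
print(strip_task_segment("src/common/create_component/config.vsh.yaml"))
```

Paths outside `src/task`, such as `src/common/...`, pass through unchanged, which matches what the diff shows for `src/common/create_component`.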


12 changes: 6 additions & 6 deletions scripts/add_a_method.sh
@@ -15,25 +15,25 @@ viash run src/common/create_component/config.vsh.yaml -- \
--language "$method_lang" \
--name "$method_id"

# TODO: fill in required fields in src/task/methods/foo/config.vsh.yaml
# TODO: edit src/task/methods/foo/script.py/R
# TODO: fill in required fields in src/methods/foo/config.vsh.yaml
# TODO: edit src/methods/foo/script.py/R

# test the component
viash test src/task/methods/$method_id/config.vsh.yaml
viash test src/methods/$method_id/config.vsh.yaml

# rebuild the container (only if you change something to the docker platform)
# You can reduce the memory and cpu allotted to jobs in _viash.yaml by modifying .platforms[.type == "nextflow"].config.labels
viash run src/task/methods/$method_id/config.vsh.yaml -- \
viash run src/methods/$method_id/config.vsh.yaml -- \
---setup cachedbuild ---verbose

# run the method (using h5ad as input)
viash run src/task/methods/$method_id/config.vsh.yaml -- \
viash run src/methods/$method_id/config.vsh.yaml -- \
--de_train_h5ad "resources/neurips-2023-kaggle/2023-09-12_de_by_cell_type_train.h5ad" \
--id_map "resources/neurips-2023-kaggle/id_map.csv" \
--output "output/prediction.h5ad"

# run evaluation metric
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--de_test_h5ad "resources/neurips-2023-kaggle/de_test.h5ad" \
--prediction "output/prediction.h5ad" \
--output "output/score.h5ad"
6 changes: 3 additions & 3 deletions scripts/generate_kaggle_resources.sh
@@ -18,7 +18,7 @@ if [[ ! -f "$OUT/2023-09-12_de_by_cell_type_test.h5ad" ]]; then
"import anndata as ad; ad.read_h5ad('$OUT/2023-09-12_de_by_cell_type_train.h5ad').write_h5ad('$OUT/2023-09-12_de_by_cell_type_train.h5ad', compression='gzip')"
fi

viash run src/task/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yaml -- \
viash run src/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yaml -- \
--input_train "$OUT/2023-09-12_de_by_cell_type_train.h5ad" \
--input_test "$OUT/2023-09-12_de_by_cell_type_test.h5ad" \
--input_single_cell_h5ad "resources/neurips-2023-raw/sc_counts.h5ad" \
@@ -34,14 +34,14 @@ viash run src/task/process_dataset/convert_kaggle_h5ad_to_parquet/config.vsh.yam
--dataset_organism homo_sapiens

echo ">> Run method"
viash run src/task/control_methods/mean_across_compounds/config.vsh.yaml -- \
viash run src/control_methods/mean_across_compounds/config.vsh.yaml -- \
--de_train_h5ad "$OUT/de_train.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--id_map "$OUT/id_map.csv" \
--output "$OUT/prediction.h5ad"

echo ">> Run metric"
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--prediction "$OUT/prediction.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--output "$OUT/score.h5ad"
4 changes: 2 additions & 2 deletions scripts/generate_resources.sh
@@ -32,14 +32,14 @@ nextflow run \
--publish_dir "$OUT"

echo ">> Run method"
viash run src/task/control_methods/mean_across_compounds/config.vsh.yaml -- \
viash run src/control_methods/mean_across_compounds/config.vsh.yaml -- \
--de_train_h5ad "$OUT/de_train.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--id_map "$OUT/id_map.csv" \
--output "$OUT/prediction.h5ad"

echo ">> Run metric"
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
viash run src/metrics/mean_rowwise_error/config.vsh.yaml -- \
--prediction "$OUT/prediction.h5ad" \
--de_test_h5ad "$OUT/de_test.h5ad" \
--output "$OUT/score.h5ad"
4 changes: 2 additions & 2 deletions scripts/render_readme.sh
@@ -5,7 +5,7 @@ set -e
[[ ! -d ../openproblems-v2 ]] && echo "You need to clone the openproblems-v2 repository next to this repository" && exit 1

../openproblems-v2/target/docker/common/create_task_readme/create_task_readme \
--task "dge_perturbation_prediction" \
--task_dir "src/task" \
--task "perturbation_prediction" \
--task_dir "src" \
--github_url "https://github.com/openproblems-bio/task_perturbation_prediction/tree/main/" \
--output "README.md"
2 changes: 1 addition & 1 deletion scripts/run_benchmark_tw.sh
@@ -2,7 +2,7 @@

RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)"
resources_dir="s3://openproblems-bio/public/neurips-2023-competition/workflow-resources"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
param_list:
2 changes: 1 addition & 1 deletion scripts/run_benchmark_tw_traens.sh
@@ -4,7 +4,7 @@

RUN_ID="traens_$(date +%Y-%m-%d_%H-%M-%S)"
resources_dir="s3://openproblems-bio/public/neurips-2023-competition/workflow-resources"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
param_list:
2 changes: 1 addition & 1 deletion scripts/run_layert_tw.sh
@@ -1,7 +1,7 @@
#!/bin/bash

RUN_ID="layert_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
id: dge_perturbation_task
2 changes: 1 addition & 1 deletion scripts/run_stability_tw.sh
@@ -1,7 +1,7 @@
#!/bin/bash

RUN_ID="stability_$(date +%Y-%m-%d_%H-%M-%S)"
publish_dir="s3://openproblems-data/resources/dge_perturbation_prediction/results/${RUN_ID}"
publish_dir="s3://openproblems-data/resources/perturbation_prediction/results/${RUN_ID}"

cat > /tmp/params.yaml << HERE
id: neurips-2023-data
6 changes: 3 additions & 3 deletions scripts/sync_results.sh
@@ -1,18 +1,18 @@
#!/bin/bash

aws s3 sync \
s3://openproblems-data/resources/dge_perturbation_prediction/results/ \
s3://openproblems-data/resources/perturbation_prediction/results/ \
output/benchmark_results/ \
--delete --dryrun

# sync back modified results
aws s3 sync \
output/benchmark_results/ \
s3://openproblems-data/resources/dge_perturbation_prediction/results/ \
s3://openproblems-data/resources/perturbation_prediction/results/ \
--delete --dryrun

# sync one run
runid=run_2024-06-01_00-03-09; aws s3 sync \
output/benchmark_results/${runid}/ \
s3://openproblems-data/resources/dge_perturbation_prediction/results/${runid}/ \
s3://openproblems-data/resources/perturbation_prediction/results/${runid}/ \
--delete --dryrun
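
Note: the run scripts above build `publish_dir` from a timestamped `RUN_ID`, and the "sync one run" command addresses a single run by that same ID under the renamed prefix. A small sketch of how the two fit together, assuming the format string used by `date +%Y-%m-%d_%H-%M-%S`; the function name `make_publish_dir` is hypothetical:

```python
from datetime import datetime
from typing import Optional

# New results prefix introduced by this commit (dge_ dropped from the task name).
RESULTS_PREFIX = "s3://openproblems-data/resources/perturbation_prediction/results"


def make_publish_dir(prefix: str = "run", now: Optional[datetime] = None) -> str:
    """Build the S3 publish directory for one benchmark run."""
    now = now or datetime.now()
    # Same layout as RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" in the shell scripts.
    run_id = f"{prefix}_{now.strftime('%Y-%m-%d_%H-%M-%S')}"
    return f"{RESULTS_PREFIX}/{run_id}"


print(make_publish_dir("run", datetime(2024, 6, 1, 0, 3, 9)))
```

Runs published before the rename presumably still sit under the old `dge_perturbation_prediction` prefix unless they are moved separately; this commit only changes where the scripts point.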
11 files renamed without changes.
4 changes: 2 additions & 2 deletions src/common/create_component/config.vsh.yaml
@@ -24,7 +24,7 @@ functionality:
direction: output
# required: true
description: Path to the component directory. Suggested location is `src/<TASK>/<TYPE>s/<NAME>`.
default: src/task/methods/${VIASH_PAR_NAME}
default: src/methods/${VIASH_PAR_NAME}
- type: file
name: --api_file
description: |
@@ -33,7 +33,7 @@ functionality:
to manually specify a different API file to inherit from.
must_exist: false
# required: true
default: src/task/api/comp_method.yaml
default: src/api/comp_method.yaml
- type: file
name: --viash_yaml
description: |
4 changes: 2 additions & 2 deletions src/common/create_component/script.py
@@ -10,8 +10,8 @@
"type": "method",
"language": "python",
"name": "new_comp",
"output": "src/task/method/new_comp",
"api_file": "src/task/api/comp_method.yaml",
"output": "src/method/new_comp",
"api_file": "src/api/comp_method.yaml",
"viash_yaml": "_viash.yaml"
}
## VIASH END
File renamed without changes.
@@ -20,7 +20,7 @@
"submission_names": ["dl40"]
}
meta = {
"resources_dir": "src/task/methods/jn_ap_op2",
"resources_dir": "src/methods/jn_ap_op2",
}
## VIASH END

File renamed without changes.
File renamed without changes.
@@ -19,7 +19,7 @@
"output_model": None
}
meta = {
"resources_dir": "src/task/methods/lgc_ensemble",
"resources_dir": "src/methods/lgc_ensemble",
"temp_dir": "/tmp"
}
## VIASH END
@@ -22,7 +22,7 @@
"train_data_aug_dir": "output/train_data_aug_dir",
}
meta = {
"resources_dir": "src/task/methods/lgc_ensemble",
"resources_dir": "src/methods/lgc_ensemble",
"temp_dir": "/tmp"
}
## VIASH END
@@ -37,7 +37,7 @@


###################################################################
# interpreted from src/task/methods/lgc_ensemble/prepare_data.py
# interpreted from src/methods/lgc_ensemble/prepare_data.py
# prepare data
seed_everything()

@@ -91,7 +91,7 @@
_, _ = save_ChemBERTa_features(test_smiles, out_dir=par["train_data_aug_dir"], on_train_data=False)

###################################################################
# interpreted from src/task/methods/lgc_ensemble/train.py
# interpreted from src/methods/lgc_ensemble/train.py

## Prepare cross-validation
cell_types_sm_names = de_train[['cell_type', 'sm_name']]
@@ -18,7 +18,7 @@
"log_file": "output/log.json",
}
meta = {
"resources_dir": "src/task/methods/lgc_ensemble",
"resources_dir": "src/methods/lgc_ensemble",
"temp_dir": "/tmp"
}
## VIASH END
@@ -32,7 +32,7 @@
from helper_functions import train_function

###################################################################
# Interpretation from src/task/methods/lgc_ensemble/helper_functions.py
# Interpretation from src/methods/lgc_ensemble/helper_functions.py

print("Load data...", flush=True)
# read kf_cv_initial from json
@@ -26,7 +26,7 @@
"output": "output.h5ad",
"reps": 2,
}
meta = {"resources_dir": "src/task/methods/nn_retraining_with_pseudolabels"}
meta = {"resources_dir": "src/methods/nn_retraining_with_pseudolabels"}
## VIASH END

# load helper functions in notebooks
File renamed without changes.
@@ -20,7 +20,7 @@
output = "output.h5ad",
)
meta = dict(
resources_dir = "src/task/methods/pyboost"
resources_dir = "src/methods/pyboost"
)
## VIASH END

File renamed without changes.
@@ -19,7 +19,7 @@
"layer": "sign_log10_pval"
}
meta = {
"resources_dir": "src/task/methods/transformer_ensemble",
"resources_dir": "src/methods/transformer_ensemble",
}
## VIASH END
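
Note: the `resources_dir` values touched in these method scripts sit between `## VIASH START` and `## VIASH END`. As far as I understand Viash, that block is a development placeholder that gets replaced with injected `par` and `meta` values when the component is built or run, so the hard-coded paths only matter when a script is executed directly. A sketch of how such a value is typically consumed; the `sys.path` handling is an assumption about the pattern, not copied from these files:

```python
import sys

## VIASH START
# Placeholder values, used only when running the script directly during development.
meta = {
    "resources_dir": "src/methods/lgc_ensemble",
}
## VIASH END

# Make helper modules that ship alongside the script importable,
# e.g. so that `from helper_functions import train_function` can resolve.
if meta["resources_dir"] not in sys.path:
    sys.path.append(meta["resources_dir"])

print(meta["resources_dir"])
```

With the old `src/task/methods/...` placeholder, a direct run after the move would look in a directory that no longer exists, which is presumably why these defaults are updated together with the file moves.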
