[GH-1115] Update directory docs for wfl batching #292

Merged
merged 4 commits into from
Jan 7, 2021
Changes from 3 commits
39 changes: 28 additions & 11 deletions docs/md/directory-usage.md → docs/md/modules-directory-usage.md
@@ -7,19 +7,19 @@ for the specific pipeline for more information).
If you have a set of files uploaded to a GCS bucket and you'd like to start
a workflow for each one, you can do that via shell scripting.

!!! warning "Note"
You will likely run into performance issues with WFL if you try to start
hundreds or thousands of workflows in a single request to WFL. You'll need
to split up the workflows into multiple workloads, tips for that are
[here](#other-notes).
??? question "Wondering about performance?"
WFL has been updated to use Cromwell's `/api/workflows/{version}/batch`
endpoint to submit your samples in a batch, so you don't have to worry
that submitting too many samples in a single workload will break WFL.

Suppose we have a set of CRAMs in a folder in some bucket, and we'd like to
submit them all to WFL for ExternalExomeReprocessing (perhaps associated with
submit them all to WFL for `ExternalExomeReprocessing` (perhaps associated with
some project or ticket, maybe PO-1234). We'll write a short bash script that
will handle this for us.

> Make sure you're able to list the files yourself! You'll need permissions
> and you may need to run `gcloud auth login`
!!! tip
Make sure you're able to list the files yourself! You'll need permissions
and you may need to run `gcloud auth login`
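
The access check in the tip can be scripted before anything else runs. This is a minimal sketch, assuming `gsutil` is installed and on your `PATH`; the `can_list` helper and the bucket path are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the current credentials can list the path.
can_list() {
  gsutil ls "$1" > /dev/null 2>&1
}

# Usage sketch; the bucket path is a placeholder, not a real location:
# if ! can_list "gs://my-bucket/my-folder/"; then
#   echo "Cannot list files; check permissions or run: gcloud auth login" >&2
#   exit 1
# fi
```

Failing early here tends to give a clearer error than an empty file list further down the script.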

## Step 1: List Files

@@ -71,12 +71,12 @@ REQUEST=$(jq '{
}' <<< "$ITEMS")
```

!!! info
!!! info
Remember to change the `output` bucket! And the `project` isn't used by WFL
but we keep track of it to help you organize workloads based on tickets
or anything else.

!!! info
!!! info
You can make other customizations here too, like specifying some input or
option across all the workflows by adding a `common` block. See the docs
for your pipeline or the [workflow options page](../workflow-options/) for
@@ -91,6 +91,22 @@ curl -X POST 'https://dev-wfl.gotc-dev.broadinstitute.org/api/v1/exec' \
-d "$REQUEST"
```

!!! warning
Curl will complain if the `$REQUEST` here contains more than a thousand
lines of data. In that case, dump the payload to a file such as
`payload.json` and let Curl read from that file instead.
For example, the last step can be replaced by:

```bash

# Overwrite rather than append, so a rerun never leaves two JSON documents in the file
echo "$REQUEST" > "payload.json"

curl -X POST 'https://dev-wfl.gotc-dev.broadinstitute.org/api/v1/exec' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Content-Type: application/json' \
-d "@payload.json"
```
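
If the script may run more than once, or in a shared directory, the payload can go to a fresh temporary file instead of a fixed name. The following is only a sketch: `write_payload` is a hypothetical helper, not part of WFL, and it assumes `mktemp` is available on your system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a request body to a fresh temp file and print its path.
write_payload() {
  local payload_file
  payload_file=$(mktemp) || return 1
  # '>' truncates rather than appends; printf writes the body verbatim.
  printf '%s\n' "$1" > "$payload_file"
  echo "$payload_file"
}

# Usage sketch (same endpoint and headers as the example above):
# PAYLOAD=$(write_payload "$REQUEST")
# curl -X POST 'https://dev-wfl.gotc-dev.broadinstitute.org/api/v1/exec' \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H 'Content-Type: application/json' \
#   -d "@${PAYLOAD}"
# rm -f "$PAYLOAD"
```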

With this, the final result is something like the following:

```bash
@@ -118,11 +134,12 @@ Save that as `script.sh` and run with `bash script.sh` and you should be good
to go!

## Other Notes

Have a lot of workflows to submit? You can use array slicing to help split
things up:

```bash
FILES=$(jq -sR 'split("\n") | map(select(startswith("gs://")))[0:50]' <<< "$CRAMS")
FILES=$(jq -sR 'split("\n") | map(select(startswith("gs://")))[0:5000]' <<< "$CRAMS")
```
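
A single slice only covers the first chunk of files. To walk through every chunk, the slice bounds can be generated in a loop. This is a rough sketch, assuming `$CRAMS` holds newline-separated `gs://` paths as in the steps above; `slice_bounds` is a hypothetical helper and the batch size of 5000 is just the value from the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "start:end" jq slice bounds that cover `total` items
# in chunks of at most `size`.
slice_bounds() {
  local total=$1 size=$2 start=0 end
  [ "$size" -gt 0 ] || return 1
  while [ "$start" -lt "$total" ]; do
    end=$(( start + size ))
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    echo "${start}:${end}"
    start=$end
  done
}

# Usage sketch (requires jq and a populated $CRAMS):
# TOTAL=$(grep -c '^gs://' <<< "$CRAMS")
# for BOUNDS in $(slice_bounds "$TOTAL" 5000); do
#   FILES=$(jq -sR "split(\"\n\") | map(select(startswith(\"gs://\")))[${BOUNDS}]" <<< "$CRAMS")
#   # ...build and submit one workload from $FILES...
# done
```

Each iteration yields one `FILES` array, so every chunk becomes its own workload.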

Need to select files matching some other query too? You can chain the
2 changes: 1 addition & 1 deletion docs/mkdocs.yml
@@ -43,7 +43,7 @@ nav:
- Arrays: modules-arrays.md
- Whole Genome: modules-wgs.md
- External Exome: modules-external-exome-reprocessing.md
- Usage Across a Directory: directory-usage.md
- Usage Across a Directory: modules-directory-usage.md
- Readings:
- WorkFlow Launcher's Role in Terra: terra.md
