A general-purpose Nextflow workflow for evaluating submissions to challenges hosted on Synapse.

This repository is structured so that each challenge type has its own subworkflow, which is wrapped by a uniquely named workflow in `main.nf`. This allows users to invoke the workflow appropriate for their challenge by using the `-entry` parameter locally:

```
nextflow run main.nf -entry {subworkflow_name} -profile local
```

or on Nextflow Tower by using the Workflow entry name field under Advanced options.
This workflow expects a secret called `SYNAPSE_AUTH_TOKEN` (a Synapse Authentication Token). This secret should be configured in your local installation of Nextflow for local runs, or as a workspace secret in your Nextflow Tower workspace. Ensure that the token you use has access to any Synapse views and folders that you intend to use as inputs to the workflow.
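For local runs, one way to configure the secret is with Nextflow's built-in secrets manager (available in recent Nextflow releases); the token value below is a placeholder:

```
nextflow secrets set SYNAPSE_AUTH_TOKEN <your_synapse_auth_token>
```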
Note: All default parameter values for Synapse projects or objects (`view_id` and `input_id`) currently point to a Synapse project that only DPE team members have access to. Unless you have access to the `DPE-Testing` Synapse project, you will not be able to test this workflow with the default values using your `SYNAPSE_AUTH_TOKEN`.
In order to use this workflow, you must already have completed the following steps:

- Created a Synapse project shared with challenge participants.
- Created an evaluation queue within the Synapse project.
- Submitted one or more Docker images to your evaluation queue (see the example below).
- Created a submission view that includes the `id` and `status` columns.
- Added the input data for evaluating submissions to a folder within your Synapse project.
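Participants typically submit models as images pushed to the Synapse Docker registry. A sketch of that flow, using a hypothetical project ID `syn123456` and image name:

```
docker tag my-model:latest docker.synapse.org/syn123456/my-model:v1
docker push docker.synapse.org/syn123456/my-model:v1
```

The pushed repository can then be submitted to your evaluation queue, for example from the Synapse web interface.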
The workflow takes the following inputs:

- `project_name` (required & case-sensitive): The name of the Synapse project the challenge is running in. Please replace the placeholder value.
- `view_id` (required): The Synapse ID for your submission view. Please replace the placeholder value.
- `input_id` (required): The Synapse ID for the folder holding the testing data for submissions. Please replace the placeholder value.
- `email_with_score` (optional & case-sensitive): Whether the e-mail sent to participants will include the evaluation score. Can be either "yes" or "no". Defaults to "yes".
- `cpus` (optional): Number of CPUs to dedicate to the `RUN_DOCKER` process, i.e. the challenge executions. Defaults to `4`.
- `memory` (optional): Amount of memory to dedicate to the `RUN_DOCKER` process, i.e. the challenge executions. Defaults to `16.GB`.
- `scoring_script` (optional): The name of the scoring script to use for the `SCORE` step of the workflow run. Defaults to `model_to_data_score.py`.
- `validation_script` (optional): The name of the validation script to use for the `VALIDATE` step of the workflow run. Defaults to `validate.py`.
Run the workflow locally with default inputs:

```
nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local
```
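To run against your own challenge, override the placeholder defaults on the command line; the Synapse IDs below are hypothetical:

```
nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local \
    --project_name "My Challenge Project" \
    --view_id syn00000001 \
    --input_id syn00000002
```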
```mermaid
flowchart LR;
    A[SYNAPSE STAGE]-->F;
    B[GET SUBMISSIONS]-->C([NEW SUBMISSIONS?]);
    C-->|YES|D[CREATE FOLDERS];
    C-->|YES|E[UPDATE STATUS];
    C-->|NO|END;
    D-->F[RUN DOCKER];
    E-->F[RUN DOCKER];
    F-->G[UPDATE FOLDERS];
    F-->H[UPDATE STATUS];
    G-->I[VALIDATE];
    H-->I[VALIDATE];
    I-->J[ANNOTATE];
    I-->K[UPDATE STATUS];
    J-->L[SCORE];
    K-->L[SCORE];
    L-->M[ANNOTATE];
    L-->N[UPDATE STATUS];
    M-->O[SEND EMAIL];
    N-->O[SEND EMAIL];
    O-->END;
```
In order to use this workflow, you must already have completed the following steps:

- Created a Synapse project shared with challenge participants.
- Created an evaluation queue within the Synapse project.
- Submitted one or more data files to your evaluation queue (see the example below).
- Created a submission view that includes the `id` and `status` columns.
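A participant-side sketch of submitting a file with the `synapse` command-line client (the evaluation and entity IDs are hypothetical, and exact flags may vary between synapseclient versions):

```
synapse submit --evaluation 9615000 --entity syn00000003
```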
The workflow requires the following inputs:

- `view_id` (required): The Synapse ID for your submission view.
- `scoring_script` (required): The name of the scoring script to use for the `SCORE` step of the workflow run. Defaults to `data_to_model_score.py`.
- `validation_script` (required): The name of the validation script to use for the `VALIDATE` step of the workflow run. Defaults to `validate.py`.
- `testing_data` (required): The Synapse ID for the folder holding the testing data for submissions.
Run the workflow locally with default inputs:

```
nextflow run main.nf -entry DATA_TO_MODEL_CHALLENGE -profile local
```
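As above, override the defaults to point at your own submission view and testing data (hypothetical IDs shown):

```
nextflow run main.nf -entry DATA_TO_MODEL_CHALLENGE -profile local \
    --view_id syn00000004 \
    --testing_data syn00000005
```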
```mermaid
flowchart LR;
    A[SYNAPSE STAGE]-->I[SCORE];
    B[GET SUBMISSIONS]-->C([NEW SUBMISSIONS?]);
    C-->|YES|D[UPDATE STATUS];
    C-->|NO|END;
    D-->E[DOWNLOAD SUBMISSIONS];
    E-->F[VALIDATE];
    F-->G[UPDATE STATUS];
    F-->H[ANNOTATE];
    G-->I[SCORE];
    H-->I;
    I-->J[UPDATE STATUS];
    I-->K[ANNOTATE];
    J-->END;
```
The workflow includes two preconfigured profiles for memory and CPU allocation for the `RUN_DOCKER` step of Model-to-Data challenges. The `local` profile includes `cpus = 4` and `memory = 16.GB`, and is intended for runs on local machines with adequate resources. The `tower` profile dedicates double the resources (`cpus = 8`; `memory = 32.GB`) and can be used when running the workflow on Nextflow Tower for improved performance.
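A minimal sketch of how such profiles could be declared in `nextflow.config`; consult the actual configuration file in this repository for the authoritative definitions:

```
// Illustrative sketch only -- see nextflow.config in this repository
// for the real profile definitions.
profiles {
    local {
        params.cpus   = 4      // CPUs for the RUN_DOCKER process
        params.memory = 16.GB  // memory for the RUN_DOCKER process
    }
    tower {
        params.cpus   = 8      // double the local resources
        params.memory = 32.GB  // for runs on Nextflow Tower
    }
}
```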
If you would like to add support for a new challenge type, you can do so by creating a new subworkflow in the `subworkflows` directory. Name your subworkflow clearly after the new challenge type, and try to build it from the existing library of modules. It is important not to change the logic of existing modules, to avoid breaking other subworkflows; instead, add new process definitions to the `modules` folder and give them clear names that indicate their purpose. Once you have created your subworkflow, you can add it to the `main.nf` file and test it using:

```
nextflow run main.nf -entry {your_new_subworkflow_name}
```
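As a rough sketch of the shape such a subworkflow might take (the module and parameter names below are placeholders, not modules guaranteed to exist in this repository):

```
// subworkflows/my_new_challenge.nf -- illustrative skeleton only
include { GET_SUBMISSIONS } from '../modules/get_submissions'
include { VALIDATE }        from '../modules/validate'
include { SCORE }           from '../modules/score'

workflow MY_NEW_CHALLENGE {
    // Fetch new submissions from the submission view,
    // then validate and score each one.
    GET_SUBMISSIONS(params.view_id)
    VALIDATE(GET_SUBMISSIONS.out)
    SCORE(VALIDATE.out)
}
```

A matching `include { MY_NEW_CHALLENGE } from './subworkflows/my_new_challenge'` and a named wrapper workflow in `main.nf` make it reachable via `-entry`.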
If you would like to contribute new scoring or validation scripts, you can do so by giving your scripts a descriptive name such as `my_challenge_scoring.py` and adding them to the `bin` directory. You can then pass the names of your new scripts to the workflow using the `scoring_script` and `validation_script` parameters. The workflow will then use them in the appropriate steps.
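For example (script names hypothetical):

```
nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local \
    --scoring_script my_challenge_scoring.py \
    --validation_script my_challenge_validation.py
```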