nf-synapse-challenge

A general purpose Nextflow workflow for evaluating submissions to challenges hosted on Synapse.

Overview

This repository is structured so that each challenge type has its own subworkflow, which is wrapped by a uniquely named workflow in main.nf. This allows users to invoke the workflow appropriate for their challenge by using the -entry parameter locally:

nextflow run main.nf -entry {subworkflow_name} -profile local

or on Nextflow Tower by using the Workflow entry name field under Advanced options.

Setup

This workflow expects a secret called SYNAPSE_AUTH_TOKEN (a Synapse Authentication Token). This secret should be configured in your local installation of Nextflow for local runs, or as a workspace secret in your Nextflow Tower workspace. Ensure that the token you use has access to any Synapse views and folders that you intend to use as inputs to the workflow.
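
For local runs, one way to register the secret (assuming a recent Nextflow version with the secrets feature available) is through the Nextflow secrets CLI; the token value below is a placeholder:

nextflow secrets set SYNAPSE_AUTH_TOKEN "<your-synapse-personal-access-token>"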

Note: All default parameter values for Synapse objects (view_id and input_id) currently point to a Synapse project that only DPE team members have access to. Unless you have access to the DPE-Testing Synapse project, you will not be able to test this workflow with the default values using your SYNAPSE_AUTH_TOKEN.

Supported Challenge Types

Model-to-Data Challenges

Prerequisites

In order to use this workflow, you must already have completed the following steps:

  1. Created a Synapse project shared with challenge participants.
  2. Created an evaluation queue within the Synapse project.
  3. Had one or more Docker images submitted to your evaluation queue.
  4. Created a submission view that includes the id and status columns.
  5. Added the input data for evaluating submissions to a folder within your Synapse project.

Running the workflow

The workflow takes the following inputs:

  1. project_name (required & case-sensitive): The name of the Synapse project the challenge is running in. Replace the placeholder value with your own.
  2. view_id (required): The Synapse ID for your submission view. Replace the placeholder value with your own.
  3. input_id (required): The Synapse ID for the folder holding the testing data for submissions. Replace the placeholder value with your own.
  4. email_with_score (optional & case-sensitive): Whether or not the e-mail sent to participants will include the evaluation score. Must be either "yes" or "no". Defaults to "yes".
  5. cpus (optional): Number of CPUs to dedicate to the RUN_DOCKER process, i.e. the challenge executions. Defaults to 4.
  6. memory (optional): Amount of memory to dedicate to the RUN_DOCKER process, i.e. the challenge executions. Defaults to 16.GB.
  7. scoring_script (optional): The name of the scoring script to use for the SCORE step of the workflow run. Defaults to model_to_data_score.py.
  8. validation_script (optional): The name of the validation script to use for the VALIDATE step of the workflow run. Defaults to validate.py.

Run the workflow locally with default inputs:

nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local
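
To override the defaults, any of the parameters listed above can be passed on the command line with a leading double dash. For example (the project name and Synapse IDs below are placeholders, not real values):

nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local \
    --project_name "My Challenge Project" \
    --view_id syn00000001 \
    --input_id syn00000002 \
    --email_with_score "no"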

Workflow DAG

flowchart LR;
    A[SYNAPSE STAGE]-->F;
    B[GET SUBMISSIONS]-->C([NEW SUBMISSIONS?]);
    C-->|YES|D[CREATE FOLDERS];
    C-->|YES|E[UPDATE STATUS];
    C-->|NO|END;
    D-->F[RUN DOCKER];
    E-->F[RUN DOCKER];
    F-->G[UPDATE FOLDERS];
    F-->H[UPDATE STATUS];
    G-->I[VALIDATE];
    H-->I[VALIDATE];
    I-->J[ANNOTATE];
    I-->K[UPDATE STATUS];
    J-->L[SCORE];
    K-->L[SCORE];
    L-->M[ANNOTATE];
    L-->N[UPDATE STATUS];
    M-->O[SEND EMAIL];
    N-->O[SEND EMAIL];
    O-->END;

Data-to-Model Challenges

Prerequisites

In order to use this workflow, you must already have completed the following steps:

  1. Created a Synapse project shared with challenge participants.
  2. Created an evaluation queue within the Synapse project.
  3. Had one or more data files submitted to your evaluation queue.
  4. Created a submission view that includes the id and status columns.

Running the workflow

The workflow takes the following inputs:

  1. view_id (required): The Synapse ID for your submission view.
  2. scoring_script (optional): The name of the scoring script to use for the SCORE step of the workflow run. Defaults to data_to_model_score.py.
  3. validation_script (optional): The name of the validation script to use for the VALIDATE step of the workflow run. Defaults to validate.py.
  4. testing_data (required): The Synapse ID for the folder holding the testing data for submissions.

Run the workflow locally with default inputs:

nextflow run main.nf -entry DATA_TO_MODEL_CHALLENGE -profile local
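
As with the Model-to-Data workflow, the defaults can be overridden on the command line; the Synapse IDs below are placeholders:

nextflow run main.nf -entry DATA_TO_MODEL_CHALLENGE -profile local \
    --view_id syn00000001 \
    --testing_data syn00000002 \
    --scoring_script data_to_model_score.py \
    --validation_script validate.py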

Workflow DAG

flowchart LR;
    A[SYNAPSE STAGE]-->I[SCORE];
    B[GET SUBMISSIONS]-->C([NEW SUBMISSIONS?]);
    C-->|YES|D[UPDATE STATUS];
    C-->|NO|END;
    D-->E[DOWNLOAD SUBMISSIONS];
    E-->F[VALIDATE];
    F-->G[UPDATE STATUS];
    F-->H[ANNOTATE];
    G-->I[SCORE];
    H-->I;
    I-->J[UPDATE STATUS];
    I-->K[ANNOTATE];
    J-->END;

Profiles

The workflow includes two preconfigured profiles for memory and CPU allocation for the RUN_DOCKER step of Model-to-Data challenges. The local profile includes cpus = 4 and memory = 16.GB. This is intended to be used for runs on local machines with adequate resources. The tower profile dedicates double the resources (cpus = 8; memory = 32.GB) and can be used when running the workflow on Nextflow Tower for improved performance.
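
For reference, profiles like these are typically declared in a Nextflow configuration file. The sketch below is illustrative only and may not match this repository's actual nextflow.config:

    // Illustrative sketch only; the repository's nextflow.config may differ.
    profiles {
        local {
            process {
                withName: RUN_DOCKER {
                    cpus   = 4
                    memory = 16.GB
                }
            }
        }
        tower {
            process {
                withName: RUN_DOCKER {
                    cpus   = 8
                    memory = 32.GB
                }
            }
        }
    }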

Adding Support for New Challenge Types

Adding a New Subworkflow

If you would like to add support for a new challenge type, you can do so by creating a new subworkflow in the subworkflows directory. Name your subworkflow clearly after the new challenge type. You should try to build your subworkflow from the existing library of modules. It is important not to change the logic of existing modules, as that could break other subworkflows; instead, add new process definitions to the modules folder and give them clear names that indicate their purpose. Once you have created your subworkflow, you can add it to the main.nf file and test it using:

nextflow run main.nf -entry {your_new_subworkflow_name}
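
As a rough sketch (module names, paths, and parameters here are hypothetical and not taken from this repository), a new subworkflow and its main.nf wrapper might look like:

    // subworkflows/my_new_challenge.nf (hypothetical file)
    include { GET_SUBMISSIONS } from '../modules/get_submissions'   // hypothetical module
    include { UPDATE_STATUS   } from '../modules/update_status'     // hypothetical module

    workflow MY_NEW_CHALLENGE_SUBWORKFLOW {
        GET_SUBMISSIONS(params.view_id)
        UPDATE_STATUS(GET_SUBMISSIONS.out)
    }

    // main.nf (wrap the subworkflow in a uniquely named entry workflow)
    include { MY_NEW_CHALLENGE_SUBWORKFLOW } from './subworkflows/my_new_challenge'

    workflow MY_NEW_CHALLENGE {
        MY_NEW_CHALLENGE_SUBWORKFLOW()
    }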

Adding New Scoring and Validation Scripts

If you would like to contribute new scoring or validation scripts, you can do so by giving each script a descriptive name, such as my_challenge_scoring.py, and adding it to the bin directory. You can then pass the names of your new scripts to the workflow using the scoring_script and validation_script parameters, and the workflow will use them in the appropriate steps.
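
For example, a run that uses custom scripts (both script names below are hypothetical) might look like:

nextflow run main.nf -entry MODEL_TO_DATA_CHALLENGE -profile local \
    --scoring_script my_challenge_scoring.py \
    --validation_script my_challenge_validation.py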
