WIP: Update dataflow episode #24
base: gh-pages
Deploy preview for agitated-wright-f1741d ready! Built with commit a38b8f9 https://deploy-preview-24--agitated-wright-f1741d.netlify.app
update tool/wf definition in dataflow episode
CWL Workflows are about chaining different steps.
I would suggest defaulting instead to the definition given in our CWL documentation/user guide: "CWL is a way to describe command line tools and connect them together to create workflows".
A workflow is the orchestration of the individual steps.
Are we talking about workflows in general, or CWL workflows? If we are talking about workflows in general, I would put this statement first. If we are talking about CWL workflows, I think this is a good opportunity to explain what a CWL step is and what it can contain, and to talk about connecting tools together. This might be too formal: "A workflow is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently." But we want something that mentions that steps are subprocesses and that their outputs are connected to the inputs of downstream steps.
If we are talking about workflows in general, I think the description here is quite good and maybe we could model ours after it: https://direct.mit.edu/dint/article/2/1-2/108/10003/FAIR-Computational-Workflows
"Generally speaking, a workflow is a precise description of a procedure – a multi-step process to coordinate multiple tasks and their data dependencies. In computational workflows each task represents the execution of a computational process, such as: running a code, the invocation of a service, the calling of a command line tool, access to a database, submission of a job to a compute cloud, or the execution of data processing script or workflow."
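A minimal sketch of what "step outputs connected to the inputs of downstream steps" looks like in a CWL workflow. The tool files `clean.cwl` and `summarise.cwl` and all input/output names are hypothetical, chosen only to show the wiring: the second step reads `clean/cleaned`, so the runner infers that it must wait for the first step, while steps with no such connection could run concurrently.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  raw_reads: File                     # workflow-level input

outputs:
  report:
    type: File
    outputSource: summarise/summary   # final output comes from the last step

steps:
  clean:
    run: clean.cwl                    # hypothetical tool description
    in:
      reads: raw_reads                # fed from the workflow input
    out: [cleaned]
  summarise:
    run: summarise.cwl                # hypothetical tool description
    in:
      reads: clean/cleaned            # fed from the upstream step's output
    out: [summary]
```

The order of the steps in the file does not determine execution order; the `step/output` references form the directed acyclic graph described above.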
inputs:
  InputFile:
    type: File
    format: edam:format_1929 # FASTA
For someone seeing a workflow for the first time, I am not sure we need the format statements, etc.; personally, I think simpler is better here. Since this episode is focused on workflow thinking, we want them to understand the concept of a workflow and how the pieces and data connect. I know this is best practice, but it might be better covered elsewhere.
prefix: "-p"
valueFrom: $(self + ".bwt")
# Optional arguments
Again, I think including optional arguments is useful when people already know what a workflow is, but a simpler flow might be easier to grok.
inputBinding:
  position: 200
IndexName:
Not sure why IndexName needs to be created for a simple example; couldn't it just be the string itself?
type: File
format: edam:format_1929 # FASTA
inputBinding:
  position: 200
It might be simpler for people to understand if we use position 0 and position 1 instead of position 200.
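A minimal sketch of a tool description with the review suggestions applied: no `format` fields, no optional arguments, `IndexName` as a plain string, and positions 0 and 1. It assumes the tool wraps `bwa index`, whose `-p` option sets the index prefix; the output name `IndexFiles` and the glob pattern are hypothetical choices for illustration, not the episode's actual definition.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [bwa, index]             # assumption: this tool wraps `bwa index`

inputs:
  IndexName:
    type: string                      # plain string instead of a File
    inputBinding:
      position: 0                     # placed first on the command line
      prefix: "-p"                    # rendered as: -p <IndexName>
  InputFile:
    type: File                        # FASTA reference; format omitted for simplicity
    inputBinding:
      position: 1                     # placed after the prefix option

outputs:
  IndexFiles:                         # hypothetical output name
    type: File[]
    outputBinding:
      glob: "$(inputs.IndexName).*"   # collects every file written with the index prefix
```

With these bindings the runner would assemble roughly `bwa index -p <IndexName> <InputFile>`; positions only order arguments relative to each other, so small consecutive values make that order easier to read.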
PR can be closed: this episode doesn't exist anymore; it is now part of the current episodes 2 and 3.
I'll leave this open for now, as we might want to reuse some of the exercise material from this branch elsewhere.
This is a WIP pull request, based on the work done during the ELIXIR CWL BioHackathon.
Initial work by @hmenager, @fpsom and @longr