WIP: Update dataflow episode #24

Draft: wants to merge 6 commits into base: gh-pages

@fpsom (Collaborator) commented Mar 22, 2021

This is a WIP pull request, based on the work done during the ELIXIR CWL BioHackathon.

Initial work by @hmenager, @fpsom and @longr

@mr-c mr-c marked this pull request as draft March 22, 2021 15:38
netlify bot commented Mar 23, 2021

Deploy preview for agitated-wright-f1741d ready!

Built with commit a38b8f9

https://deploy-preview-24--agitated-wright-f1741d.netlify.app




CWL Workflows are about chaining different steps.

I would suggest defaulting instead to the definition given in our CWL documentation/user guide: "CWL is a way to describe command line tools and connect them together to create workflows"


CWL Workflows are about chaining different steps.

A workflow is the orchestration of the individual steps.

Are we talking about workflows in general, or CWL workflows? If we are talking about workflows in general, I would put this statement first. If we are talking about CWL workflows, I think this is a good opportunity to explain what a CWL step is and what it can contain, and to talk about connecting tools together. This might be too formal: "A workflow is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently." But I would like something that mentions that steps are subprocesses and that their outputs are connected to the inputs of downstream steps.
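
To make the "step outputs are connected to the inputs of downstream steps" idea concrete, here is a minimal sketch of a two-step CWL workflow. The tool files `index.cwl` and `align.cwl`, and every input, output, and step name, are hypothetical and are not taken from this PR; they only illustrate how one step's output feeds the next.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  reference: File
  reads: File

outputs:
  alignment:
    type: File
    outputSource: align/aligned   # workflow output taken from the last step

steps:
  index:
    run: index.cwl                # hypothetical CommandLineTool description
    in:
      sequences: reference        # workflow input feeds this step
    out: [indexed]

  align:
    run: align.cwl                # hypothetical CommandLineTool description
    in:
      index: index/indexed        # output of the upstream step feeds this input
      reads: reads
    out: [aligned]
```

The `index/indexed` reference is what makes `align` depend on `index`; steps with no such dependency between them could run concurrently.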

If we are talking about workflows in general --- I think the description here is quite good and maybe we could model our description after it: https://direct.mit.edu/dint/article/2/1-2/108/10003/FAIR-Computational-Workflows

"Generally speaking, a workflow is a precise description of a procedure – a multi-step process to coordinate multiple tasks and their data dependencies. In computational workflows each task represents the execution of a computational process, such as: running a code, the invocation of a service, the calling of a command line tool, access to a database, submission of a job to a compute cloud, or the execution of data processing script or workflow."

inputs:
  InputFile:
    type: File
    format: edam:format_1929 # FASTA

For someone seeing a workflow for the first time, I am not sure we need the format statements, etc. I personally think simpler is better when people are seeing a workflow for the first time. Since this is focused on workflow thinking, we want them to understand the concept of the workflow and how the pieces/data connect. I know this is best practice, but it might be better served elsewhere.

prefix: "-p"
valueFrom: $(self + ".bwt")

#Optional arguments

Again, I think including optional arguments is useful when people already know what a workflow is, but a simpler flow might be easier to grok.

  inputBinding:
    position: 200

IndexName:

Not sure why the IndexName needs to be created for a simple example; couldn't it just be the string itself?

type: File
format: edam:format_1929 # FASTA
inputBinding:
  position: 200

It might be simpler for people to understand if we use position 0 and position 1 instead of position 200.
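
As a rough sketch of that suggestion, the two inputs could be ordered with positions 0 and 1. The input names `IndexName` and `InputFile` come from the diff; the `string` type for `IndexName` and the pairing of the `-p` prefix with it are assumptions made only for illustration.

```yaml
inputs:
  IndexName:
    type: string          # assumed type, not confirmed by the diff
    inputBinding:
      position: 0         # placed first on the command line
      prefix: "-p"        # assumed to belong to this input
  InputFile:
    type: File
    inputBinding:
      position: 1         # placed after the index name
```

Only the relative order of the `position` values matters, so 0 and 1 give the same command line as any larger pair in the same order.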

@ALuesink (Contributor) commented Dec 3, 2021

PR can be closed: the episode doesn't exist anymore; its content is part of current episodes 2 and 3.

@douglowe douglowe closed this Mar 14, 2022
@douglowe douglowe reopened this Mar 14, 2022
@douglowe (Collaborator) commented

I'll leave this open for now, as we might want to take some of the exercise material from this branch for use elsewhere.

6 participants