-
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Review]: Workflows with Snakemake #17
Comments
Thank you for submitting this lesson for review, @gperu. My capacity for managing lesson reviews is quite limited at the moment and I will not be able to handle reviews of all of your submitted lessons simultaneously. If you have a preference for which lesson(s) you would like us to prioritise for review, please let me know and I will do my best to focus on that/those first. |
Thanks for submitting this lesson, @tbooth. It appears very well put together and the topic is a good fit for The Carpentries Lab. I'll be acting as Editor on this submission, and I have completed the editorial checks that form the first step of the review process. I have requested only two changes before I will assign reviewers. You can see my comments below. To ensure that the review process runs as smoothly as possible, please make sure you are subscribed to receive notifications from this thread. On the right sidebar of this page you should see a section headed Notifications, with a Customize link. You can click on that and make sure that you have the Subscribed option selected, to receive all notifications from the thread. You can add a badge to display the status of this review in the README of your lesson repository with the following Markdown: [![The Carpentries Lab Review Status](http://badges.carpentries-lab.org/17_status.svg)](https://github.com/carpentries-lab/reviews/issues/17) Editor Checklist - Snakemake for BioinformaticsAccessibility
Alternative text descriptions are present for almost all figures in the lesson. Please replace the alternative text for the flowchart representation in Constructing a whole new workflow with a meaningful description.
A note about headings: by my reading, the two Content
I could download the data via the link on the Setup page, but I did not find a link to the FigShare record for that download, so I cannot check the license terms. Please share a link to the FigShare record for the dataset here, and add it to the lesson somewhere too e.g. on the Setup page and/or where you describe the data in the first episode. Design
RepositoryThe lesson repository includes:
I recommend updating the README to remove the initial setup checklist from the early days of lesson development. Structure
Most episodes are estimated to take more than an hour, which is definitely at the longer end of what I would consider appropriate. However, I think this is a reflection of the long time slots that are allocated to the completion of exercises, implying that cognitive load is being managed by providing learners with plenty of time to apply the skills and concepts they are learning. This facilitates transfer from working to long-term memory, so I am satisfied with the composition of the lesson. Nevertheless, reviewers may find places in the lesson where they can suggest episodes be broken into smaller chunks. Supporting informationThe lesson includes:
|
Hi @tobyhodges thanks very much for this. I'd made a start on the JOSE sumbission stuff which I know is not a prerequisite for acceptance to the lab but is still important to do. I'll get working on the tasks you listed ASAP. I also have some thoughts on people who could be approached as reviewers, if that would help. |
I have a few potential reviewers in mind but would be delighted to get additional suggestions. If you know their GitHub handle(s), please provide them here, but without tagging them i.e. leave the '@' off the start of the handle. |
The issues noted above, and some others that I noticed, have been addressed. Let me know if I missed anything or Regarding the alternative text for all the figures I went through and tried to make all of the descriptions The reviewers I have in mind are: descostesn - Nicolas Descostes, Head of Bioinformatics, EMBL Rome cokelaer or ddesvillechabrol - Authors of the Sequanix GUI for Snakemake, at Institut Pasteur |
@tkphd & @jdblischak thank you for volunteering to review this Snakemake for Bioinformatics lesson for The Carpentries Lab. When you are ready, please post your reviews as replies in this thread. If you have any questions for me during the review, please ask. You can read more about the lesson review process in our Reviewer Guide, where you will also find the checklist for Reviewers. |
Checking in here with @tkphd and @jdblischak. Please reach out if you have any questions or need any assistance with your reviews of this Snakemake lesson, or if your capacity for this has changed and you need to step away from your role as a reviewer. |
@tobyhodges sorry for the delay. I started reviewing the lesson last week but then got sidetracked before I could complete it. I'll return to it this week |
Likewise!
…On Wed, Aug 2, 2023, 4:19 PM John Blischak ***@***.***> wrote:
@tobyhodges <https://github.com/tobyhodges> sorry for the delay. I
started reviewing the lesson last week but then got sidetracked before I
could complete it. I'll return to it this week
—
Reply to this email directly, view it on GitHub
<#17 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQUCE7H63VXSMERTURLFC3XTKY3VANCNFSM6AAAAAATBGMAFI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I've gone through the lesson material. Congrats to @gperu, @tbooth, and colleagues for creating this thorough introduction to Snakemake! To get started, below is the reviewer checklist: Reviewer ChecklistAccessibility
I confirmed that the main figures had alt text. They could probably use more though if the goal is to convey the same information to someone using a screen reader. For example, the information contained in the image at the beginning of Episode 7 summarizes the input files (eg 18 in total due to paired reads, 3 conditions, and 3 reps) and the various steps. The steps are well described in the text above, but not the input files. Content
Design
Supporting information
I think the Setup would benefit from a few improvements, especially if self-learners are going to follow the instructions alone. For installing the data, the wget --content-disposition https://ndownloader.figshare.com/files/35058796
tar xJf data-for-snakemake-novice-bioinformatics.tar.xz
ls -R data/ In lesson 10 on conda integration, it states:
But you provide a link to Miniconda in the setup instructions. I recommend replacing the above text with a link back to the Setup instructions. Also, I wasn't able to run And one last note on the Glossary question. The number of terms that could potentially be added is enormous, but given the prerequisite knowledge, I think it is adequate. One term I would suggest to add is "wildcard". While this is also used in the shell, wildcards are so central to Snakemake that it seems worth defining it. General
Very readable
All the basics are covered
I'm aware of another snakemake-based lesson called HPC Workflow Management with Snakemake . But I haven't gone through it, so I don't know how much overlap there is
I think it would be tougher for a self-directed learner, especially the open-ended episode 11 to write a pipeline from scratch. But I think there is sufficient material in the other episodes that a self-learner would still benefit from going through it I'll provide more snakemake-specific comments in a follow-up post |
Thank you very much for your review and detailed feedback @jdblischak 🙌
Your co-reviewer, Trevor @tkphd, is one of the main authors of that hpc-workflows lesson, which is still in the early stages of design and development. I am sure they will be able to comment more about the similarities and differences between the lessons, their target audiences and main objectives. |
Some minor comments related specifically to Snakemake: 03 - Chaining rulesNote that
|
Now for some feedback on higher-level ideas. I don't expect you to overhaul your lesson based on my comments. But I think that it would be worth adding a few boxes to alert learners when they will likely see something different in other Snakemake tutorials. Putting
|
@jdblischak thanks so much for all the detailed comments and the PRs. I'll start to go through these points in detail, hopefully next week. Anything that needs more specific discussion I'll break out into a separate issue. |
Hi @tbooth, apologies for the delay, but I am working through the lesson. Overall, it looks great! Here's what I have for Setup. HomepageThe lesson homepage lists the following prerequisites:
I suggest rephrasing to state the actual prerequisites, and separately Prerequisites:
Optional background:
SetupSuggest breaking this up into (at least) 3 sections: Software, Data, and The instructor may need to show this page at the beginning of the lesson for SoftwareConda is a common and useful tool, but it is simply invoked, not
Data
Editor
|
Thanks for providing feedback on the setup instructions, @tkphd. Are you able to give an estimate of when you will be able to review the remainder of the lesson? |
Thanks to both reviewers for the thorough comments so far, which have clearly taken a lot of time to put together. I've already made a draft response to most of the points, fixed many things, and triaged the more complex ones into individual issues. However I'll wait for @tkphd to submit the rest of the review before making a full response. |
Pinging @tkphd, to see if they have had any time to complete the review? Trevor, I'm grateful to you for volunteering to review and I know from personal experience how one's capacity to take such things on can change at short notice. We would still love to get your perspective on the lesson, but If you need to step back from the review please let me know so that I can take this off your plate and look for somebody else. |
Hi @tobyhodges and @tbooth, I'm still working through the lesson material. It looks good, but my curiosity/interest in Snakemake means I'm going very slowly. While I'm learning, it would be helpful to resolve issues highlighted by the
|
Over the last couple of weeks I have moved the lesson from the old format to the new RMarkdown format. I have also reviewed the episodes in light of recent changes to Snakemake and tested/updated everything for Snakemake 8.5 (the latest version) I believe I've also resolved many if not most of the things mentioned by the two reviewers. Possibly I have introduced an error or two, but I'm teaching the whole course in just a couple of weeks so that should flush out any obvious errata. I have further time I can spend on this during March, after which a funding deadline expires, so is there any chance we could get this review wrapped up in the next month? @tkphd if you have further comments to add then could you please add them as soon as you have them, and I'll make changes accordingly. If you want to see a stable version of the lessons without my recent changes then the legacy/gh_pages branch in GitHub holds the old version. |
It seems the auto-build of the lesson pages from my latest push to GitHub has not worked. @tobyhodges can you see why this might be? The local |
I had to fiddle with the |
Great to see the lesson transitioned to use the new infrastructure, @tbooth.
This was a bug exposed by the release of R v4.3 at the end of last month, which was fixed in a recent release of pegboard. BTW I have now enabled the automated pull requests to update workflows on your repository. |
Hi folks, in private discussion @tkphd and I agreed that they will step back from this lesson review. I have begun looking for another reviewer, and I hope we can get some momentum going here again shortly. @tbooth thank you for your patience, please let me know what questions and/or concerns you have. |
Thanks, Toby. The good news is I still have funding to work on this and maintain and improve the lesson, and I'm still teaching the material myself at least a couple of times a year. It would be great to be able to say "I wrote a Carpentries Lesson" and I've had a few people ask why it's still in limbo. The comments from both reviewers were really useful though, so I don't mind having a new reviewer (or indeed feedback and issue reports from anyone), but I really hope they can get on to it quickly. |
@cmeesters thank you for volunteering to review a lesson for The Carpentries Lab. Please can you confirm that you are happy to review this Snakemake for Bioinformatics lesson? You can read more about the lesson review process in our Reviewer Guide. |
Yes. |
Thanks for stepping in @cmeesters. I don't know if you plan to send all the review comments together or in stages, but if you flag up anything here that relates to technical issues or specific fixes I'll open an issue on the issue tracker. |
Hi, I am sorry, it took me so long - I was pretty busy and now on holidays. I can only provide feedback for two more days, then I will be offline for two weeks. Anyway: ReviewGeneral NotesThank you for your contribution of this new teaching material for reproducible data analysis with Snakemake! I sincerely hope not to discourage you by requiring a little more work to make this material more impactful and reliable. Chapter NotesSetupThe Setup is mostly well described. As Snakemake recommends mamba, mamba should be mentioned, too. Minor issues:
Chapter 1 - Running commands with SnakemakeI really like the data set selection and description in the chapter! However, a description about Snakemake, reproducible computing and the whole fuzz is all too brief as a motivation. Some hints: Snakemake is part of NumFocus, has over a million downloads, and an awful lot of citations. Some minor issues:
Chapter 2 - Placeholder and WildcardsMajor Issue:
Minor issues:
Chapter 3 - Chaining RulesThe illustration of the processing steps is well done. I like the box about error creation and handling very much, this is well explained. Major Issues:
Minor Issues:
Chapter 4 - The DAGThe DAG is nicely introduced. Also, the Major Issue:
# Kallisto quantification of one sample
rule kallisto_quant:
output:
h5 = "kallisto.{sample}/abundance.h5",
tsv = "kallisto.{sample}/abundance.tsv",
json = "kallisto.{sample}/run_info.json",
input:
index = "transcriptome/Saccharomyces_cerevisiae.R64-1-1.kallisto_index",
fq1 = "trimmed/{sample}_1.fq",
fq2 = "trimmed/{sample}_2.fq",
shell:
"kallisto quant -i {input.index} -o kallisto.{wildcards.sample} {input.fq1} {input.fq2}"
rule kallisto_index:
output:
idx = "transcriptome/{strain}.kallisto_index",
log:
log = "log/{strain}.kallisto_log",
input:
fasta = "transcriptome/{strain}.cdna.all.fa"
shell:
"kallisto index -i {output.idx} {input.fasta} >& {log}" Idea:
Chapter 5 - Processing lists of InputsIt's a very nice chapter. Personally, as a teacher, I would very much prefer to split this into tiny tasks and have a solution for every sub-task. But that is a matter of taste and perhaps not so easy to implement Chapter 6 - Awkward programsIn principle, very nice - it's an introduction to newbies and you cannot cover everything. However, we recommend using Snakemake wrapper to handle awkward programs to increase the stability of workflows wherever possible. There is not even an outlook to wrappers. This is - IMO - a major flaw. Chapter 7 - FinishingAgain, a nice description. The solution, however, does not work, and stating in the file Chapter 8 - ConfigurationA pretty good chapter. Yet, now there cannot be a one-file solution. You ask participants to download the solution and split the content into different files manually. This is pretty error-prone. Do not do that: You will realize upon teaching, that participants will stumble and comprehension comes from seeing. You need to require, that the configuration file gets into a separate folder, and you need to provide individual solutions for this. Chapter 9 - Optimizing the WorkflowGood, start. There are, however, a few major issues:
Chapter 10 - Conda integrationRather well described. However:
Chapter 11 - Designing a new workflowThis Chapter needs a major revision:
Chapter 12 -- Cleaning upVery nice! Issues:
Chapter 13 - QuotingThis is rather a Shell/Bash issue and should be addressed in other courses - or as a mini intro to this one. Review Guideline ChecklistAccessibility
Content
Design
Supporting information
Missing tick marks, might seem a bit harsh. As outlined, a few things ought to be added to be up-to-date. Considering a few things, will place all tick marks automatically. Particularly:
I will be happy to give more feedback after the 2nd week of August and understand that with change from v7 to v8 of Snakemake there have been major changes, no breaks of workflows, but certainly a steeper learning curve with more things to digest. So, I am happy that people refrain from the MOOC idea and endeavour teaching in person! |
Please don't mention the In other words, I think the current section on Channel configuration and conda-forge is correct and does not need to be updated. |
yes, you are right - never tried fiddling with the rc file to check this. Something learned - thank you. |
Thanks @cmeesters for all the comments. I'll split these down into issues and deal with the easy ones first, as soon as I can. There's a common theme with this review and the partial review from @tkphd that you want the material to be more geared to "newbies", removing the pre-requisite for familiarity with shell scripting and introducing/explaining concepts like using an editor, task backgrounding ("&") and redirection ("<") within the course. This is a fundamental change. I think anyone trying to make use of Snakemake has to have some experience in scripting or programming, beyond just the basics of the interactive shell. I'd be happy to add some callouts to remind people about these points of syntax, but I really think that trying to make the course newbie-ready is a bad idea. Anyone trying to learn Snakemake without the foundational knowledge of Bash syntax is going to be taking on too much at once, and trying to pretend otherwise is doing them a disservice. |
@tobyhodges, I'm hoping you can clear this up for us. Are you happy with the lesson to go into the Carpentries Lab as something geared towards people already familiar with shell scripting, or do I need to make it suitable for Linux newbies? If it needs to be the latter then I really have a problem. From inception, this thing was written for experienced Bash users who are new to Snakemake (and not necessarily familiar with Python). The name snakemake-novice-bioinformatics is maybe unfortunate in this regard but that name was never my choice. Many of the comments from @cmeesters make perfect sense if this were to be a course for Linux newbies, but I've never thought that was the remit and it's not what I set out to write. For now I'll look to make explicit references back to where some concepts are introduced in https://swcarpentry.github.io/shell-novice, since this will make it explicit that I am building on the foundational material. |
I really do not want to interfere with you conceptually. Please focus on the Snakemake-oriented remarks, then. |
Thanks all for your comments so far, particularly to @cmeesters for the detailed review and @tbooth for seeking clarification so swiftly. It seems that you are reaching a resolution already, but I will pitch in with my point of view as an editor for the Lab. Lessons in The Carpentries Lab do not need to be aimed at any particular audience, either in terms of domain expertise or level of prior knowledge. What is important is that the target audience of a lesson is defined, that this audience description is realistic, and that the design and content of the lesson is appropriate to the stated audience. So my answer to @tbooth's question
is that you absolutely do not need to change the audience of the lesson, but you may need to tweak content to better fit your stated audience and/or tweak your stated audience to fit the content (meeting somewhere in the middle may be the most practical solution). For example, in the list of prerequisites for the lesson states that learners are expected to arrive with
Linking to the SWC Shell lesson is great, but will give people the impression that they are ready for your lesson if they know about the stuff in that one. So please watch out for anything used in the Snakemake lesson that is not covered in SWC Shell, e.g. you mentioned background execution with |
Two considerations:
I think we basically can consider two types of participants. Those who are familiar with the shell. And those who do not. (There is more to consider, but we shall put everything else aside for this little thought experiment). People who followed, for example a bioinformatics master course, certainly need only little reminders for certain techniques, if at all. Yet there are more people who want to analyse data. I, for example, frequently am confronted with biologists, who never saw a command line before. Now, we can point them to other courses. When I teach a certain practical course, I put working with the command line and logging in to remote systems before the actual data analysis part. However, usually people are working under pressure. They just started a thesis and need data to be analysed. In this scenario there is little time to digest new content, like a predecessor bash-course, thoroughly. This is why I, regardless of specific contents like background processes, recommend keeping the newbie in mind. I am sorry, if that came out wrong. All in all, a little(!) redundancy does not hurt.
|
Lesson Title
Snakemake for Bioinformatics
Lesson Repository URL
https://github.com/carpentries-incubator/snakemake-novice-bioinformatics
Lesson Website URL
https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/
Lesson Description
Researchers needing to implement data analysis workflows face a number of common challenges, including the need to organise tasks, make effective use of compute resources, handle any errors in processing, and document and share their methods. The Snakemake workflow system provides effective solutions to these problems. By the end of the course, you will be confident in using Snakemake to run real workflows in your day-to-day research.
Snakemake workflows are described by special scripts that define steps in the workflow as rules, and these are then used by Snakemake to construct and execute a sequence of shell commands to yield the desired output. Re-calculation of existing results is avoided where possible, so you can add or update input data, then efficiently generate an updated result. Workflows can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition.
This course is primarily intended for researchers who need to automate data analysis tasks for biological research involving next-generation sequence data, for example RNA-seq analysis, variant calling, CHIP-Seq, bacterial genome assembly, etc. However, Snakemake has many uses beyond this and the course does not assume any specialist biological knowledge. The language used to write Snakemake workflows is Python-based, but no prior knowledge of Python is required or assumed either. We do require that attendees must have familiarity with using the Linux command line (pipes, redirects, variables, …).
Author Usernames
@
Zenodo DOI
No response
Differences From Existing Lessons
@tbooth
Confirmation of Lesson Requirements
JOSE Submission Requirements
paper.md
andpaper.bib
files as described in the JOSE submission guide for learning modulesPotential Reviewers
No response
The text was updated successfully, but these errors were encountered: