Dev => Master 2.3.0 #185
base: master
The only thing using more than one thread is samtools
Not sure how that snuck back in there
Co-authored-by: maxulysse <maxulysse@users.noreply.github.com>
Add STAR aligner
Bump version for 2.3.0 release
Those local module & sub-wfs will need nf-test next
When running the pipeline with groHMM as a transcript identification method, the pipeline will automatically perform a parameter tuning process. This process is unique to the groHMM transcript identification method and is designed to select the optimal hold-out parameters for the groHMM algorithm. See [this issue](https://github.com/dankoc/groHMM/issues/4) for more information.

In the groHMM vignette, the code is ran using a single mclapply call, which is a scatter gather approach. This is not ideal for large datasets, because it ends up being bottlenecked by the memory available on your local machine. To improve this, we have written a Nextflow script that runs the pipeline with a scatter gather approach. This is done by running the pipeline with a single hold-out parameter, and then the next parameter, and so on. This is more memory efficient and scales better to larger datasets. The results are then combined then combined in the end as intended and used in the transcript identification process.
In the groHMM vignette, the code is ran using a single mclapply call, which is a scatter gather approach. This is not ideal for large datasets, because it ends up being bottle-necked by the memory available on your local machine. To improve this, we have written a Nextflow script that runs the pipeline with a scatter gather approach. This is done by running the pipeline with a single hold-out parameter, and then the next parameter, and so on. This is more memory efficient and scales better to larger datasets. The results are then combined in the end as intended and used in the transcript identification process.
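The scatter-gather idea described in the paragraph above can be sketched roughly as follows. This is a minimal Python illustration rather than the pipeline's Nextflow code: `evaluate_holdout`, its toy scoring, and the grid values are all hypothetical stand-ins for one independent tuning run per parameter pair.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate_holdout(params):
    """Hypothetical stand-in for one tuning run on a single parameter pair.

    In a real workflow each call would be its own independent task; here it
    only returns a toy error score so that the example runs.
    """
    lt_prob_b, uts = params
    # Toy score with its minimum at lt_prob_b = -200, uts = 15 (made-up values).
    error = abs(lt_prob_b + 200) + abs(uts - 15)
    return {"lt_prob_b": lt_prob_b, "uts": uts, "error": error}


def scatter_gather(grid, max_workers=2):
    """Scatter one task per parameter pair, then gather and pick the best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input grid.
        results = list(pool.map(evaluate_holdout, grid))
    best = min(results, key=lambda r: r["error"])
    return results, best


# Hypothetical 3 x 3 grid of hold-out parameters.
grid = list(product([-100, -200, -300], [5, 15, 25]))
results, best = scatter_gather(grid)
print(best)
```

Because each parameter pair is evaluated on its own, peak memory is bounded by one run (times the number of concurrent workers) instead of by the whole grid at once.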
- Mouse: mm10
- Fly: dm6

**This setting is off by default**
:::info
**This setting is off by default**
:::
params.fasta = getGenomeAttribute('fasta')
params.gtf = getGenomeAttribute('gtf')
params.gff = getGenomeAttribute('gff')
params.gene_bed = getGenomeAttribute('bed12')
params.bwa_index = getGenomeAttribute('bwa')
params.bwamem2_index = getGenomeAttribute('bwamem2')
params.dragmap = getGenomeAttribute('dragmap')
params.bowtie2_index = getGenomeAttribute('bowtie2')
params.hisat2_index = getGenomeAttribute('hisat2')
params.star_index = null
params.homer_uniqmap = getGenomeAttribute('uniqmap')
params.fasta = getGenomeAttribute('fasta')
params.gtf = getGenomeAttribute('gtf')
params.gff = getGenomeAttribute('gff')
params.gene_bed = getGenomeAttribute('bed12')
params.bwa_index = getGenomeAttribute('bwa')
params.bwamem2_index = getGenomeAttribute('bwamem2')
params.dragmap = getGenomeAttribute('dragmap')
params.bowtie2_index = getGenomeAttribute('bowtie2')
params.hisat2_index = getGenomeAttribute('hisat2')
params.star_index = null
params.homer_uniqmap = getGenomeAttribute('uniqmap')
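The parameter block above resolves each reference path through `getGenomeAttribute`. As a rough illustration only, and assuming the helper simply looks up an attribute for the selected genome in a `genomes` map and yields null when the entry is absent, the lookup could be sketched in Python as below. The genome key and file paths are hypothetical.

```python
def get_genome_attribute(params, attribute):
    """Look up `attribute` for the selected genome, or return None.

    Assumed behaviour, modelled as a plain nested-dict lookup; this is
    not the pipeline's Groovy implementation.
    """
    genomes = params.get("genomes") or {}
    genome = params.get("genome")
    if genome in genomes:
        return genomes[genome].get(attribute)
    return None


# Hypothetical configuration: one genome with two known attributes.
params = {
    "genome": "example_genome",
    "genomes": {
        "example_genome": {
            "fasta": "/refs/example/genome.fa",
            "gtf": "/refs/example/genes.gtf",
        }
    },
}

fasta = get_genome_attribute(params, "fasta")    # path from the map
hisat2 = get_genome_attribute(params, "hisat2")  # no such entry, so None
print(fasta, hisat2)
```

Under this assumption, a missing index simply resolves to a null parameter, which is the same state that `params.star_index = null` sets explicitly.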
tuning_file = null
grohmm_min_uts = 5
grohmm_max_uts = 45
// Depends on how you look at this one... But I figured most will ignore the negative
// Depends on how you look at this one... But I figured most will ignore the negative
Could you clarify what you mean here?
Nice! Regular test runs smoothly under 10' and seems easy to follow. I left a couple of comments after a first pass. Will come back for more :D
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
// TODO nf-core: Check the defaults for all processes
All local modules seem to have labels, and test seems to run fine. You can probably remove this TODO comment
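For readers unfamiliar with the `check_max( ... * task.attempt, ... )` pattern in the config lines above: each retry requests more resources, capped at a configured ceiling. The Python sketch below illustrates that arithmetic under the assumption that `check_max` behaves like a simple minimum. The ceiling values (`max_cpus`, `max_memory_gb`, `max_time_h`) are hypothetical and not taken from this PR.

```python
def check_max(requested, limit):
    """Return the requested amount, capped at the configured limit."""
    return min(requested, limit)


def resources_for_attempt(attempt, max_cpus=4, max_memory_gb=16, max_time_h=8):
    """Scale the base request (1 CPU, 6 GB, 4 h) by the attempt number."""
    return {
        "cpus": check_max(1 * attempt, max_cpus),
        "memory_gb": check_max(6 * attempt, max_memory_gb),
        "time_h": check_max(4 * attempt, max_time_h),
    }


for attempt in (1, 2, 3):
    print(attempt, resources_for_attempt(attempt))
```

With these example ceilings, the third attempt would ask for 18 GB and 12 h but is capped at 16 GB and 8 h.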
@@ -51,7 +53,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow.Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
I guess this is left for last. Just a reminder for minimal description of pipeline steps + metro map
channels:
- conda-forge
- bioconda
- defaults
I think - defaults should now be removed from every module
@@ -0,0 +1,170 @@
#!/usr/bin/env Rscript
I found this: https://nf-co.re/docs/checklists/reviews/pipeline_release_pr#do-local-code-and-modules
which says that bin scripts should have author and MIT license embedded. I don't have them in my pipeline either and not sure how new and mandatory this rule is; just leaving the link and comment here to figure it out
@@ -225,7 +244,7 @@ The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at p
<details markdown="1">
<summary>Output files</summary>

- `bbmap/`
- `quality_control/bbmap/`
- `*.coverage.hist.txt`: Histogram of read coverage over each chromosome
- `*.coverage.stats.txt`: Coverage stats broken down by chromosome including %GC, pos/neg read coverage, total coverage, etc.
- `<samplename>.coverage.stats.txt`: Coverage stats broken down by chromosome including %GC, pos/neg read coverage, total coverage, etc.
Similarly, I would change the asterisk to `<samplename>` everywhere else in output.md (when viable).
@@ -209,9 +228,9 @@ The majority of RSeQC scripts generate output files which can be plotted and sum
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/preseq/`
- `quality_control/preseq/`
- `*.lc_extrap.txt`: Preseq expected future yield file.
I also see a <samplename>.c_curve.txt
@@ -240,7 +259,7 @@ The [Preseq](http://smithlabresearch.org/software/preseq/) package is aimed at p
<details markdown="1">
<summary>Output files</summary>

- `bedtools/`
- `coverage_graphs/`
- `*.minus.bedGraph`: Sample coverage file (negative strand only) in bedGraph format
- `*.plus.bedGraph`: Sample coverage file (positive strand only) in bedGraph format
I also see a <samplename>.dreg.bedGraph file
### PINTS

<details markdown="1">
<summary>Output files</summary>

- `pints/`
- `transcript_identification/pints/`
- `*_bidirectional_peaks.bed`: Bidirectional TREs (divergent + convergent)
- `*_divergent_peaks.bed`: Divergent TREs
Maybe add (optional) to the two files above, since you don't always receive those (at least not with the test.config).
@@ -346,7 +371,7 @@ They've also created some bed files that might be useful for analysis.
<details markdown="1">
<summary>Output files</summary>

- `<ALIGNER>/featurecounts/`
- `quantification/featurecounts/`
Under quantification, I got two folders instead: gene and nascent
Waiting for #184

PR checklist

- Code lints (`nf-core lint`).
- Test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- No unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).