Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for generating taxprofiler/funcscan input samplesheets for preprocessed FASTQs/FASTAs #688

Draft
wants to merge 19 commits into
base: dev
Choose a base branch
from

Conversation

vagkaratzas
Copy link

@vagkaratzas vagkaratzas commented Oct 10, 2024

Closes #687 and #686

This adds the local subworkflow (and other relevant code and docs) for generating samplesheets for the downstream pipelines funcscan and taxprofiler.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@vagkaratzas vagkaratzas marked this pull request as draft October 10, 2024 09:10
@vagkaratzas vagkaratzas marked this pull request as ready for review October 10, 2024 12:25
@vagkaratzas vagkaratzas linked an issue Oct 10, 2024 that may be closed by this pull request
@jfy133 jfy133 changed the title first draft push with TODOs Add support for generating taxprofiler input samplesheet for preprocessed FASTQs Oct 10, 2024
@jasmezz jasmezz changed the title Add support for generating taxprofiler input samplesheet for preprocessed FASTQs Add support for generating taxprofiler/funcscan input samplesheets for preprocessed FASTQs/FASTAs Oct 10, 2024
@vagkaratzas vagkaratzas marked this pull request as draft October 10, 2024 13:06
@jasmezz jasmezz marked this pull request as ready for review October 10, 2024 15:09
Copy link
Contributor

@CarsonJM CarsonJM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, I think getting this blueprint in place will be very helpful in the future!

nextflow.config Outdated
@@ -194,6 +194,9 @@ params {
validationShowHiddenParams = false
validate_params = true

// Generate downstream samplesheets
generate_downstream_samplesheets = false
generate_pipeline_samplesheets = "funcscan,taxprofiler"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this default to null so that users have to opt-in to samplesheet generation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good idea! I will set that in createtaxdb too

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference? In both cases null/false as default it would be an opt-in by the user, no?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah maybe I misunderstood Carson... not sure :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like @jfy133 used only one workflow, which will selectively generate samplesheets based on params.generate_pipeline_samplesheets. Do you think it would be best to keep that consistent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since FastQ files are being pulled from the publishDir, it might be a good idea to include options that override user inputs for params.publish_dir_mode (so that it is always 'copy' if a samplesheet is generated) and params.save_clipped_reads, params.save_phixremoved_reads ...etc so that the preprocessed FastQ files are published to the params.outdir if a downstream samplesheet is generated

docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
nextflow.config Outdated
@@ -194,6 +194,9 @@ params {
validationShowHiddenParams = false
validate_params = true

// Generate downstream samplesheets
generate_downstream_samplesheets = false
generate_pipeline_samplesheets = "funcscan,taxprofiler"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good idea! I will set that in createtaxdb too

ch_assemblies

main:
def downstreampipeline_names = params.generate_pipeline_samplesheets.split(",")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also implemented the same system in createtaxdb now, but with an additional input validation thing that you should also adopt here (i.e., to check that someone doesn't add an unsupported pipeline, or makes a typo).

Check the utils_nfcore_createtaxdb_pipeline file there

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ch_assemblies

main:
ch_list_for_samplesheet = ch_assemblies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next thing which I don't think will be so complicated is to add another input channel for bins, and here make an if/else statement if they want to send just the raw assemblies (all contigs) or binned contigs to the samplesheet.

It will need another pipeline level parameter too though --generate_samplesheet_funcscan_seqtype or something

@jfy133 jfy133 marked this pull request as draft October 14, 2024 13:22

// Validate samplesheet generation parameters
if (params.generate_downstream_samplesheets && !params.generate_pipeline_samplesheets) {
error('[nf-core/createtaxdb] If supplying `--generate_downstream_samplesheets`, you must also specify which pipeline to generate for with `--generate_pipeline_samplesheets! Check input.')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nf-core/mag ?

workflows/mag.nf Outdated
@@ -25,6 +25,7 @@ include { ANCIENT_DNA_ASSEMBLY_VALIDATION } from '../subworkflows/local/ancient_
include { DOMAIN_CLASSIFICATION } from '../subworkflows/local/domain_classification'
include { DEPTHS } from '../subworkflows/local/depths'
include { LONGREAD_PREPROCESSING } from '../subworkflows/local/longread_preprocessing'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing :D

@jfy133
Copy link
Member

jfy133 commented Oct 16, 2024

Something somewhere is not liking my new channel of gzipped assemblies...

groovy.lang.MissingMethodException: No signature of method: Script_87acd35bc62883ad$_runScript_closure1$_closure2$_closure25.call() is applicable for argument types: (sun.nio.fs.UnixPath) values: [/workspace/mag/testing/work/d0/4921e9a545764c0c17c5d22a8a639c/MEGAHIT/MEGAHIT-test_minigut_sample2.contigs.fa.gz]
Possible solutions: any(), any(), any(groovy.lang.Closure), each(groovy.lang.Closure), tap(groovy.lang.Closure), any(groovy.lang.Closure)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:271)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:790)
        at groovy.lang.GroovyObject.invokeMethod(GroovyObject.java:39)
        at org.codehaus.groovy.vmplugin.v8.IndyGuardsFiltersAndSignatures.invokeGroovyObjectInvoker(IndyGuardsFiltersAndSignatures.java:151)
        at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
        at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:56)
        at jdk.internal.reflect.GeneratedMethodAccessor179.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
        at groovy.lang.Closure.call(Closure.java:433)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
        at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
        at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
        at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
        at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
        at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
        ```

@jfy133
Copy link
Member

jfy133 commented Oct 24, 2024

Almost there,

nextflow run ../main.nf -profile test,docker --outdir ./results --generate_downstream_samplesheets --generate_pipeline_samplesheets taxprofiler,funcscan --save_phixremoved_reads

though includes a file /workspace/mag/testing/results/QC_shortreads/remove_phix/test_minigut/test_minigut_1.merged.fastq.gz in the csv that doesn't exceute in the file - I suspect due to the .merged. indicating it's runmerged (which is not in the remove_phix folder...)

Copy link

github-actions bot commented Oct 24, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 535747c

+| ✅ 318 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗   6 tests had warnings |!

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 3.2.0
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-27 05:47:42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate taxprofiler samplesheet from mag
4 participants