-
Notifications
You must be signed in to change notification settings - Fork 964
New module: lsa/cosine (calculate cosine similarity) #10002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
pinin4fjords
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
There's a few things to do to make this fit in with repository standards, and I think there's some complexity to be stripped.
In particular, since there's no actual software called 'cosimflow', I think this needs to be named like lsa/cosine in my view, since it's principally a wrapper around that function. Or possibly it could be a script under custom/.
I've simplified the script a bit to apply some of my suggestions for you:
modules/nf-core/cosimflow/main.nf
Outdated
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' : | ||
| 'biocontainers/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | |
| 'https://depot.galaxyproject.org/singularity/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' : | |
| 'biocontainers/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' }" | |
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | |
| 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/4d/4d94f159b95315adf8bf54fdc9db88db10a5aef72dca6245dd163b91e9e0437e/data' : | |
| 'community.wave.seqera.io/library/r-base_r-lsa_r-pheatmap_r-optparse_pruned:901156bc11e60b28' }" |
I made you some containers via Wave/ Seqera Containers. These are easier to maintain for multi-dependency modules.
modules/nf-core/cosimflow/main.nf
Outdated
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use template 'cosimflow.R' with the script in templates/cosimflow.R instead of inline R, keeps things readable.
I'm going to attach a suggested version of your script content to use in that way.
| name: cosimflow | ||
| channels: | ||
| - conda-forge | ||
| - bioconda | ||
| - defaults | ||
| dependencies: | ||
| - r-base | ||
| - r-lsa | ||
| - r-pheatmap | ||
| - r-optparse | ||
| - r-readr | ||
| - r-readxl | ||
| - r-dplyr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| name: cosimflow | |
| channels: | |
| - conda-forge | |
| - bioconda | |
| - defaults | |
| dependencies: | |
| - r-base | |
| - r-lsa | |
| - r-pheatmap | |
| - r-optparse | |
| - r-readr | |
| - r-readxl | |
| - r-dplyr | |
| --- | |
| channels: | |
| - conda-forge | |
| - bioconda | |
| dependencies: | |
| - conda-forge::r-base=4.5.2 | |
| - conda-forge::r-lsa=0.73.4 | |
| - conda-forge::r-pheatmap=1.0.13 | |
| - conda-forge::r-optparse=1.7.5 | |
| - conda-forge::r-readr=2.1.6 |
- Pin versions with channel prefixes:
conda-forge::r-lsa=0.73.4not justr-lsa - Remove
defaultschannel - nf-core only usesconda-forgeandbioconda
modules/nf-core/cosimflow/main.nf
Outdated
| ) | ||
|
|
||
| # AQUÍ PASAMOS LOS ARGUMENTOS DE NEXTFLOW DIRECTAMENTE | ||
| args_list <- c("--input", "$expression_matrix", "--out_prefix", "$prefix") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script constructs c("--input", "$expression_matrix", "--out_prefix", "$prefix") then parses them back into opt$input and opt$out_prefix - getting the same values it started with. Only task.ext.args actually needs parsing. Feed $args directly to optparse for configurable options (--method, --min_gene_mean), and use Nextflow template variables for input/prefix. See attached template.
modules/nf-core/cosimflow/main.nf
Outdated
|
|
||
| # ---- Load table ---- | ||
| if (grepl("\\\\.xlsx?\$", opt\$input, ignore.case=TRUE)) { | ||
| df <- as.data.frame(read_excel(opt\$input)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it really need Excel input? Let's stick with CSV if poss please
| // 3. Verificamos que termine con éxito | ||
| assert process.success | ||
|
|
||
| // 4. Verificamos que se generen los archivos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, with apologies, English would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so sorry! I will keep everything in English in the future. Thanks for correcting it all!
| assert process.success | ||
|
|
||
| // 4. Verificamos que se generen los archivos | ||
| assert process.out.matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the outputs are deterministic, they should be snapshotted, if not you should snapshot file names at least, see e.g. how deseq2/differential module does it:
{ assert snapshot(
process.out.results,
...
file(process.out.dispersion_plot_png[0][1]).name,
...
).match() },
| assert process.out.versions | ||
| } | ||
|
|
||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need a stub test (once you've added a stub to the main.nf)
modules/nf-core/cosimflow/main.nf
Outdated
| paste0(" r-pheatmap: ", packageVersion("pheatmap")) | ||
| ), "versions.yml") | ||
| """ | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also need a 'stub' - take a look at some other modules if you're not sure what I mean
modules/nf-core/cosimflow/main.nf
Outdated
| @@ -0,0 +1,114 @@ | |||
| process COSIMFLOW { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is principally a wrapper around lsa::cosine, there isn't any tool called 'cosimflow' installed here, so by convention this module should really be LSA_COSINE, stored in a file at modules/nf-core/lsa/cosine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That fits with conventions here for e.g. deseq2/differential.
…odules into nf-core-cosimflow
|
@pinin4fjords Thank you for all of corrections, comments and suggestions, I really appreciate it, for your patience and the detailed help!! I hope I have applied everything you requested: -Renamed it to lsa/cosine. -I switched the input to CSV-only (removed readxl dependency) as suggested. -I updated meta.yml (license/maintainers) and fixed the variable scope in main.nf so the R template works correctly.
If there's anything that I have missed or that needs a second look at, I'm all ears! Again, thank you! |
Hi @ewels and @pinin4fjords!
Following up on our chat in the proposals channel, here’s the PR for the cosimflow module. It’s a tool for calculating pairwise cosine similarity from expression matrices (outputting both the matrix and a heatmap).
I think it could be a handy addition for the rnaseq pipeline, maybe as a downstream QC step to see how samples cluster together beyond the usual PCA.
PR checklist:
Closes: xxx
Tests performed locally:
nf-core modules test cosimflow --profile condaQuick note: I ended up putting the R script directly inside the
main.nfscript block. It was the cleanest way I found to handle the dollar signs and variable escaping for R/Nextflow without everything blowing up.