Skip to content

Conversation

@miguelrosell
Copy link

@miguelrosell miguelrosell commented Feb 12, 2026

Hi @ewels and @pinin4fjords!

Following up on our chat in the proposals channel, here’s the PR for the cosimflow module. It’s a tool for calculating pairwise cosine similarity from expression matrices (outputting both the matrix and a heatmap).

I think it could be a handy addition for the rnaseq pipeline, maybe as a downstream QC step to see how samples cluster together beyond the usual PCA.

PR checklist:

Closes: xxx

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions.
  • If necessary, include test data in your PR (Created a small dummy CSV on-the-fly in the test script).
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label (process_medium).
  • Use BioConda and BioContainers.

Tests performed locally:

  • nf-core modules test cosimflow --profile conda

Quick note: I ended up putting the R script directly inside the main.nf script block. It was the cleanest way I found to handle the dollar signs and variable escaping for R/Nextflow without everything blowing up.

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

There's a few things to do to make this fit in with repository standards, and I think there's some complexity to be stripped.

In particular, since there's no actual software called 'cosimflow', I think this needs to be named like lsa/cosine in my view, since it's principally a wrapper around that function. Or possibly it could be a script under custom/.

I've simplified the script a bit to apply some of my suggestions for you:

pr10002_cosimflow_template.R.zip

Comment on lines 6 to 8
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' :
'biocontainers/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' }"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' :
'biocontainers/mulled-v2-ad9dd5f398966bf899ae05f8e7c54d0d94018089:00bd7e543b7135f299839496732948679f2e3073-0' }"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/4d/4d94f159b95315adf8bf54fdc9db88db10a5aef72dca6245dd163b91e9e0437e/data' :
'community.wave.seqera.io/library/r-base_r-lsa_r-pheatmap_r-optparse_pruned:901156bc11e60b28' }"

I made you some containers via Wave/ Seqera Containers. These are easier to maintain for multi-dependency modules.

when:
task.ext.when == null || task.ext.when

script:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use template 'cosimflow.R' with the script in templates/cosimflow.R instead of inline R, keeps things readable.

I'm going to attach a suggested version of your script content to use in that way.

Comment on lines 1 to 13
name: cosimflow
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- r-base
- r-lsa
- r-pheatmap
- r-optparse
- r-readr
- r-readxl
- r-dplyr
Copy link
Member

@pinin4fjords pinin4fjords Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: cosimflow
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- r-base
- r-lsa
- r-pheatmap
- r-optparse
- r-readr
- r-readxl
- r-dplyr
---
channels:
- conda-forge
- bioconda
dependencies:
- conda-forge::r-base=4.5.2
- conda-forge::r-lsa=0.73.4
- conda-forge::r-pheatmap=1.0.13
- conda-forge::r-optparse=1.7.5
- conda-forge::r-readr=2.1.6
  • Pin versions with channel prefixes: conda-forge::r-lsa=0.73.4 not just r-lsa
  • Remove defaults channel - nf-core only uses conda-forge and bioconda

)

# AQUÍ PASAMOS LOS ARGUMENTOS DE NEXTFLOW DIRECTAMENTE
args_list <- c("--input", "$expression_matrix", "--out_prefix", "$prefix")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script constructs c("--input", "$expression_matrix", "--out_prefix", "$prefix") then parses them back into opt$input and opt$out_prefix - getting the same values it started with. Only task.ext.args actually needs parsing. Feed $args directly to optparse for configurable options (--method, --min_gene_mean), and use Nextflow template variables for input/prefix. See attached template.


# ---- Load table ----
if (grepl("\\\\.xlsx?\$", opt\$input, ignore.case=TRUE)) {
df <- as.data.frame(read_excel(opt\$input))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really need Excel input? Let's stick with CSV if poss please

// 3. Verificamos que termine con éxito
assert process.success

// 4. Verificamos que se generen los archivos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, with apologies, English would be better

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so sorry! I will keep everything in English in the future. Thanks for correcting it all!

assert process.success

// 4. Verificamos que se generen los archivos
assert process.out.matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the outputs are deterministic, they should be snapshotted, if not you should snapshot file names at least, see e.g. how deseq2/differential module does it:

                { assert snapshot(
                        process.out.results,
                        ...
                        file(process.out.dispersion_plot_png[0][1]).name,
                        ...
                    ).match() },

assert process.out.versions
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need a stub test (once you've added a stub to the main.nf)

paste0(" r-pheatmap: ", packageVersion("pheatmap"))
), "versions.yml")
"""
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need a 'stub' - take a look at some other modules if you're not sure what I mean

@@ -0,0 +1,114 @@
process COSIMFLOW {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is principally a wrapper around lsa::cosine, there isn't any tool called 'cosimflow' installed here, so by convention this module should really be LSA_COSINE, stored in a file at modules/nf-core/lsa/cosine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That fits with conventions here for e.g. deseq2/differential.

@miguelrosell miguelrosell changed the title New module: cosimflow for cosine similarity New module: lsa/cosine (calculate cosine similarity) Feb 13, 2026
@miguelrosell
Copy link
Author

@pinin4fjords Thank you for all of corrections, comments and suggestions, I really appreciate it, for your patience and the detailed help!!

I hope I have applied everything you requested:

-Renamed it to lsa/cosine.

-I switched the input to CSV-only (removed readxl dependency) as suggested.

-I updated meta.yml (license/maintainers) and fixed the variable scope in main.nf so the R template works correctly.

  • And the tests and stubs are now locally passing, with snapshots included.

  • And I have fixed the non-English issue :)

If there's anything that I have missed or that needs a second look at, I'm all ears!

Again, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants