Let's collect ideas here and then create concrete issues for potential benchmarks.
- Check trial with user benchmarks from here
ideas
- Lines of code / Number of files
- Duplication of module code when reused, e.g., unsupervised analysis is used X times in the project and the same code is used each time
- Compare the Snakemake file (.smk) with the underlying module code, e.g., the difference between the "module loading" statement and the actual underlying module code, i.e., how many lines of code you save by using the module (see the LOC-counting sketch after this list)
- Compare a from-scratch real code/analysis with module usage -> really hard and potentially unfair, because the module code might be more comprehensive and offer more features (not nearly all results/features are shown)
- Complexity(?) (see above)
- (re)Run time (see the timing sketch after this list)
- Time to first results
- Error rate
- Reproducibility
- Trial with users
  - Small internal pilot (declare bias) (implementation ideas above)
  - External volunteer cohort (lab network)
  - LLM/agent setup test with equal context (to avoid bias against us, because MrBiomics is probably not in the training data)
  - Amazon's Mechanical Turk (MTurk) if sufficiently simplified/easy
  - Pre-register constraints and report limitations (sample size, prior familiarity)
- Competitors/comparators
  - NF-core/Nextflow/Galaxy/ENCODE where applicable
  - Module swap experiments within the same recipe with a fitting alternative
  - From-scratch copy-paste-edit scripts
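
For the lines-of-code / number-of-files and module-comparison ideas, a minimal counting sketch. The paths (`workflow/rules/unsupervised_analysis.smk` for the module invocation, `modules/unsupervised_analysis/` for a checkout of the underlying module) and the blank/comment-stripping heuristic are assumptions to be adapted per project:

```python
# Sketch: compare the size of the module-loading .smk snippet with the
# underlying module code it pulls in. Paths below are hypothetical examples.
from pathlib import Path


def count_loc(path: Path) -> int:
    """Non-empty, non-comment lines in a single file (crude heuristic)."""
    return sum(
        1
        for line in path.read_text().splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def summarize(root: Path, patterns=("*.smk", "*.py", "*.R")) -> tuple[int, int]:
    """Total LOC and number of files for all workflow-relevant files below root."""
    files = [p for pattern in patterns for p in root.rglob(pattern)]
    return sum(count_loc(p) for p in files), len(files)


invocation_loc = count_loc(Path("workflow/rules/unsupervised_analysis.smk"))
module_loc, module_files = summarize(Path("modules/unsupervised_analysis"))

print(f"module invocation: {invocation_loc} LOC in 1 file")
print(f"underlying module: {module_loc} LOC in {module_files} files")
print(f"LOC saved by reusing the module: {module_loc - invocation_loc}")
```

The same counting could be repeated per reuse to quantify the duplication idea (module used X times vs. module code counted once).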
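
For the (re)run time and time-to-first-results ideas, a rough wall-clock sketch; the `snakemake` command line and the `results/` location are placeholders for whatever the benchmarked recipe actually uses. Snakemake's built-in `benchmark:` directive could complement this with per-rule runtimes:

```python
# Sketch: time a full (re)run and record when the first result file appears.
# Command line and results location are assumptions, not MrBiomics defaults.
import subprocess
import time
from pathlib import Path

CMD = ["snakemake", "--cores", "4", "--use-conda"]  # adapt to the recipe under test
RESULTS_DIR = Path("results")                       # hypothetical output location

start = time.monotonic()
proc = subprocess.Popen(CMD)

time_to_first_result = None
while proc.poll() is None:
    # coarse 5 s polling; good enough for runs that take minutes to hours
    if time_to_first_result is None and any(
        p.is_file() for p in RESULTS_DIR.rglob("*")
    ):
        time_to_first_result = time.monotonic() - start
    time.sleep(5)

total_runtime = time.monotonic() - start
print(f"exit code: {proc.returncode}")
print(f"time to first result [s]: {time_to_first_result}")
print(f"total (re)run time [s]: {total_runtime:.1f}")
```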