Let's collect ideas here and then create concrete issues for potential benchmarks.
- Check trial with user benchmarks from here
ideas
- Lines of code / Number of files
- Duplication of module code when reused, e.g., unsupervised analysis is used X times in the project and the same code is used each time
- Compare the Snakemake file (.smk) with the underlying module code, e.g., the difference between the "module loading" statement and the actual underlying module code, i.e., how many lines of code you save by using the module (see the LOC-counting sketch after this list)
- Compare a from-scratch real code/analysis with module usage -> really hard and potentially unfair, because the module code might be more comprehensive and offer more features (not nearly all results/features are shown)
- Complexity(?) (see above)
- (re)Run time (see the timing sketch after this list)
- Time to first results
- Error rate
- Reproducibility
- Trial with users
  - Small internal pilot (declare bias) (implementation ideas above)
  - External volunteer cohort (lab network)
  - LLM/agent setup test with equal context (to avoid bias against us, because MrBiomics is probably not in the training data)
  - Amazon's Mechanical Turk (MTurk) if sufficiently simplified/easy
  - Pre-register constraints and report limitations (sample size, prior familiarity)
- Competitors/comparators
  - NF-core/Nextflow/Galaxy/ENCODE where applicable
  - Module swap experiments within the same recipe with a fitting alternative
  - From-scratch copy-paste-edit scripts
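
For the lines-of-code / number-of-files and module-comparison ideas, a minimal counting sketch. The paths (`workflow/rules/unsupervised_analysis.smk` for the module invocation, `modules/unsupervised_analysis/` for a checkout of the underlying module) and the blank/comment-stripping heuristic are assumptions to be adapted per project:

```python
# Sketch: compare the size of the module-loading .smk snippet with the
# underlying module code it pulls in. Paths below are hypothetical examples.
from pathlib import Path


def count_loc(path: Path) -> int:
    """Non-empty, non-comment lines in a single file (crude heuristic)."""
    return sum(
        1
        for line in path.read_text().splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def summarize(root: Path, patterns=("*.smk", "*.py", "*.R")) -> tuple[int, int]:
    """Total LOC and number of files for all workflow-relevant files below root."""
    files = [p for pattern in patterns for p in root.rglob(pattern)]
    return sum(count_loc(p) for p in files), len(files)


invocation_loc = count_loc(Path("workflow/rules/unsupervised_analysis.smk"))
module_loc, module_files = summarize(Path("modules/unsupervised_analysis"))

print(f"module invocation: {invocation_loc} LOC in 1 file")
print(f"underlying module: {module_loc} LOC in {module_files} files")
print(f"LOC saved by reusing the module: {module_loc - invocation_loc}")
```

The same counting could be repeated per reuse to quantify the duplication idea (module used X times vs. module code counted once).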
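
For the (re)run time and time-to-first-results ideas, a rough wall-clock sketch; the `snakemake` command line and the `results/` location are placeholders for whatever the benchmarked recipe actually uses. Snakemake's built-in `benchmark:` directive could complement this with per-rule runtimes:

```python
# Sketch: time a full (re)run and record when the first result file appears.
# Command line and results location are assumptions, not MrBiomics defaults.
import subprocess
import time
from pathlib import Path

CMD = ["snakemake", "--cores", "4", "--use-conda"]  # adapt to the recipe under test
RESULTS_DIR = Path("results")                       # hypothetical output location

start = time.monotonic()
proc = subprocess.Popen(CMD)

time_to_first_result = None
while proc.poll() is None:
    # coarse 5 s polling; good enough for runs that take minutes to hours
    if time_to_first_result is None and any(
        p.is_file() for p in RESULTS_DIR.rglob("*")
    ):
        time_to_first_result = time.monotonic() - start
    time.sleep(5)

total_runtime = time.monotonic() - start
print(f"exit code: {proc.returncode}")
print(f"time to first result [s]: {time_to_first_result}")
print(f"total (re)run time [s]: {total_runtime:.1f}")
```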