A template for modular data workflows using snakemake, part of the clio toolset.
To familiarise yourself with clio data modules:
- Check the auto-generated minimal example. You can find it in tests/integration/Snakefile.
- Read about the clio approach in our documentation.
- Read about snakemake modularisation in their documentation.
We recommend using pixi as your package manager. Once installed, do the following:
- Install the templater tool copier.

      pixi global install copier
- Use copier to build a project with this template. A new module will be created in the directory you chose. We recommend you use the module name as the directory name.

      copier copy https://github.com/calliope-project/data-module-template.git ./path/to/<module_name>

  If your terminal does not have access to copier, you may need to update your PATH variable to include ~/.pixi/bin.
- Answer some questions so we can pre-fill licensing, citation files, etc.
- Initialise the pixi project environment of your new module.

      cd ./path/to/<module_name>  # navigate to the new project
      pixi install --all          # install the project environment
- Extra: run the auto-generated example module!

      cd tests/integration            # go to the integration test...
      pixi run snakemake --use-conda  # run it!
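If copier is still not found after installing it, the PATH update mentioned above can be sketched as follows. This assumes a POSIX-compatible shell such as bash or zsh, and it only affects the current session; to make it permanent you would add the same lines to your shell's startup file.

```shell
# Assumption: a POSIX-compatible shell (bash, zsh, ...).
# ~/.pixi/bin is the directory mentioned in the step above.
PIXI_BIN="$HOME/.pixi/bin"

# Prepend the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$PIXI_BIN:"*) ;;
  *) export PATH="$PIXI_BIN:$PATH" ;;
esac

# Report whether copier can now be found.
if command -v copier >/dev/null 2>&1; then
  echo "copier found at: $(command -v copier)"
else
  echo "copier not found; check that 'pixi global install copier' succeeded"
fi
```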
- Standardised layout compliant with the snakemake workflow catalogue's listing requirements, so that published modules can be included in its listings automatically. Read more about those requirements here.
- Standardised input/output structure across modules:
  - resources/: files needed for the module's processes.
    - user/: files that should be provided by users. Document them well!
    - automatic/: files that the module downloads or prepares in intermediate steps.
  - results/: files generated by the module's algorithms that are relevant to the user.
- Pre-made integration setup for your module.
- Continuous Integration (CI) settings, ready for pre-commit.ci.
- GitHub actions to automate chores during pull requests and releases.
- Pre-made pytest setup.
Important
A few things to be aware of.
- Modules do not work like regular snakemake workflows.
  - The primary way to test them should be external (calling module:, passing resources, and requesting results). Check the pre-made example in tests/integration for more info.
  - Internal access (e.g., calling the all: rule) may not work, as the module may not have the necessary resources/ to execute properly.
- Please be sure to maintain the following files to ensure clio compatibility. These are:
  - INTERFACE.yaml: a simple description of the module's input/output structure.
  - config/config.yaml: a basic functioning example of how to configure this module.
  - workflow/internal/config.schema.yaml: the module's configuration schema, used by snakemake for validation.
  - AUTHORS / CITATION.cff / LICENSE: licensing and attribution of this module's code and methods.
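
A quick way to confirm that the files listed above are still in place is a small shell check. This is only a sketch: check_module_files is a hypothetical helper, not part of clio or the template, and it only tests that the files exist, not that their contents are valid.

```shell
# Hypothetical helper (not part of clio): checks that the files listed above
# exist under the given module directory (defaults to the current directory).
check_module_files() {
  dir="${1:-.}"
  missing=0
  for f in INTERFACE.yaml \
           config/config.yaml \
           workflow/internal/config.schema.yaml \
           AUTHORS CITATION.cff LICENSE; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every file was found.
  [ "$missing" -eq 0 ]
}
```

Run it from the module root as check_module_files . and it prints one line per missing file, returning a non-zero status if any are absent.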