Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiles: rename defaults.yaml to profiles_config.yaml #24

Closed
wants to merge 1 commit into from

Conversation

joverlee521
Copy link
Contributor

Description of proposed changes

Reduce confusion of the use of the profiles' config files. They are to be seen as additional profiles' config on top of the default configs. Stems from discussion with @j23414 for review of
nextstrain/dengue#15.

Note we not using config.yaml because this is used by the Snakemake profiles configuration.¹

¹ https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles

Reduce confusion of the use of the profiles' config files. They are to
be seen as additional profiles' config on top of the default configs.
Stems from discussion with @j23414 for review of
nextstrain/dengue#15.

Note we not using `config.yaml` because this is used by the Snakemake
profiles configuration.¹

¹ https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
@joverlee521 joverlee521 requested a review from a team January 5, 2024 00:01
Copy link

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for wading into this area without more prior involvement, but I find our general use of the term "profiles" throughout workflows to be confusing. I would only use that term to refer to Snakemake's profile functionality. Profiles determine how a workflow will be executed, while workflow configuration files (--configfile) determine what a workflow will do. Adding the "profiles" prefix to a build configuration file here inside of a profiles directory appears to reinforce the idea that the config is related to Snakemake profiles.

For this specific PR, what if we used a name like builds_defaults.yaml that parallels our builds.yaml files in seasonal-flu and ncov workflows?

More generally, I wish we decoupled our profile configuration files (e.g., profiles/slurm-rhino/config.yaml that defines how we run Snakemake from the Hutch's SLURM cluster) from our build configuration files (e.g., builds/nextstrain-gisaid.yaml that defines the configuration of a ncov build based on GISAID data), since they are such different concepts. We could continue to use subdirectories for build configurations that include multiple config files like custom rules (e.g., builds/nextstrain-gisaid/custom.smk). This decoupling allows one to specify what to run with --configfile and how to run it with --profile.

@joverlee521
Copy link
Contributor Author

Thanks for the feedback @huddlej! I fully agree the term "profiles" can be confusing. I'm hesitant to use the term "builds" because it's so overloaded, especially in the Nextstrain ecosystem.

Maybe just "customizations" is descriptive enough?
Using the phylogenetic workflow as an example:

phylogenetic/
├── README.md
├── Snakefile
├── config
│   └── defaults.yaml
├── customizations
│   └── ci
│       ├── config.yaml
│       └── copy_example_data.smk
├── example_data
│   ├── metadata.tsv
│   └── sequences.fasta
└── rules
    ├── annotate_phylogeny.smk
    ├── construct_phylogeny.smk
    ├── export.smk
    └── prepare_sequences.smk

@tsibley
Copy link
Member

tsibley commented Jan 10, 2024

@joverlee521 Why separate config/ and customizations/ in that suggestion? (or in the PR as-is, separate config/ and profiles/)

I'd think this (or something similar) would be just fine?

├── config
│   └── defaults.yaml
│   └── ci
│       ├── config.yaml
│       └── copy_example_data.smk

@joverlee521
Copy link
Contributor Author

Why separate config/ and customizations/ in that suggestion?

I proposed to keep them separate for two reasons:

  1. I'd like to keep config/ simple with just the default configs and not bloat it with customizations.
  2. Customizations can be a mix of configs and custom rules, so I think it may be confusing to keep them under config/.

(1) is based on experience with the seasonal flu repo, which has a pretty full config dir with sub-directories for the subtypes. It might be confusing to see the current profiles mixed in:

config
├── allflu
├── ci
├── distance_maps
├── europe
├── example
├── gisaid
├── h1n1pdm
├── h3n2
├── hutch
├── nextflu-private
├── nextflu-private-forecasts
├── nextflu-public
├── nextstrain
├── nextstrain-public
├── private.nextflu.org
├── scicore
├── upload
├── vic
└── yam

We could keep them under config/customizations/ for clearer separation. My concern for (2) still stands in this case, but maybe I'm overthinking it...

config/
├── customizations
│   ├── allflu
│   ├── ci
│   ├── europe
│   ├── example
│   ├── gisaid
│   ├── hutch
│   ├── nextflu-private
│   ├── nextflu-private-forecasts
│   ├── nextflu-public
│   ├── nextstrain
│   ├── nextstrain-public
│   ├── private.nextflu.org
│   ├── scicore
│   └── upload
├── distance_maps
├── h1n1pdm
├── h3n2
├── vic
└── yam

@tsibley
Copy link
Member

tsibley commented Jan 10, 2024

Oh I see. Yeah, that makes sense.

j23414 added a commit to nextstrain/zika that referenced this pull request Jan 10, 2024
Use the stopgap from mpox (nextstrain/mpox#214) until the pathogen-repo-ci is updated.
However, we may be moving to using config/customization/ci in the future, so revisit
this commit and edit accordingly.

nextstrain/pathogen-repo-guide#24
j23414 added a commit to nextstrain/zika that referenced this pull request Jan 10, 2024
Use the stopgap from mpox (nextstrain/mpox#214) until the pathogen-repo-ci is updated.
However, we may be moving to using config/customization/ci in the future, so revisit
this commit and edit accordingly.

nextstrain/pathogen-repo-guide#24
@jameshadfield
Copy link
Member

jameshadfield commented Jan 11, 2024

Do we have a working definition of "config" vs "customisation" vs (snakemake?) "profiles" to help me interpret this? (And maybe add "defaults" to this too.)

If we didn't use snakemake profiles for the seasonal flu would the files in profiles/ be additional directories within config/?

The linked seasonal flu repo is certainly complex in the sense of lots of directories, and perhaps necessarily so, but it'd be great to avoid introducing this complexity into repos which don't need it.

More generally, I wish we decoupled our profile configuration files ... from our build configuration files ... since they are such different concepts.

+1

@joverlee521
Copy link
Contributor Author

Do we have a working definition of "config" vs "customisation" vs (snakemake?) "profiles" to help me interpret this? (And maybe add "defaults" to this too.)

Definitely missing explicit definitions, my interpretations are:

In my definition, I am basically merging config and defaults, where anything under config/ should be considered the default configs used for a workflow. This makes it very easy for users to see what the default configs are for the workflow at first glance.

If we didn't use snakemake profiles for the seasonal flu would the files in profiles/ be additional directories within config/?

I think I would still keep it separate for the sake of keeping the default configs very simple and obvious.

@joverlee521
Copy link
Contributor Author

Closing in favor of #27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants