[phylo github actions] add summary message #75

jameshadfield · 2024-07-16T21:38:58Z

The `pathogen-repo-build` reusable action adds an extremely helpful summary describing the AWS run. This PR adds similarly helpful info, which should make it much simpler to check the results of a phylo run.
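For readers unfamiliar with the mechanism: GitHub Actions renders any Markdown appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable as a job summary. The sketch below shows one way a step could write such a summary. It is not the code in this PR; the function names, the summary wording, and the idea of listing dataset URLs are assumptions for illustration.

```python
import os


def build_summary(build_name: str, urls: list[str]) -> str:
    """Return a small Markdown summary for a phylo run (hypothetical layout)."""
    lines = [f"## Phylo run: {build_name}", ""]
    if urls:
        lines.append("Datasets to check:")
        lines.extend(f"- <{url}>" for url in urls)
    else:
        lines.append("No dataset URLs were recorded for this run.")
    return "\n".join(lines) + "\n"


def append_to_step_summary(markdown: str) -> bool:
    """Append Markdown to the job summary file, if one is available.

    Returns False when GITHUB_STEP_SUMMARY is unset (e.g. a local run),
    so that the caller can decide whether to print the summary instead.
    """
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    # Append rather than overwrite: other commands in the same step may
    # already have written to the summary.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True
```

A workflow step could call `append_to_step_summary(build_summary(name, urls))` once the run has been submitted; outside of GitHub Actions the call simply returns `False`.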

Trial run(s):

NCBI. Note that the 2nd URL is missing; this has now been fixed.

This abstracts out the configuration into two separate YAML files. As a result the snakemake complexity is reduced and (hopefully) the main interaction point can now be the YAMLs themselves.

There are a few changes to the behaviour of the h5n1-cattle-outbreak builds:

* For individual segment builds we now use the h5n1-cattle-outbreak dropped list (previously we used the H5N1 drop list)
* For individual segment builds we now use the H5NX input data (previously we used the H5N1 input data)
* We no longer remove sequences via a clock filter

There are no changes to the behaviour of the GISAID builds.

The config is generally straightforward except where parameters differ for a genome build vs the corresponding segment builds. To avoid having to list the same parameters out 8 times, I implemented (e.g.) `config.traits.genome_columns` and `config.traits.columns`. This is only observed in the h5n1-cattle-outbreak config.
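The genome-vs-segment lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the Snakefile's actual helper: `resolve_param` is a hypothetical name, the column values are made up, and the fallback to the plain key when no `genome_` variant exists is a guess at the intended behaviour.

```python
def resolve_param(config: dict, rule: str, key: str, segment: str):
    """Look up config[rule][key], preferring a genome-specific override.

    For the genome build, a "genome_<key>" entry wins if it is present;
    every segment build (and a genome build without an override) uses
    the plain key.
    """
    rule_config = config[rule]
    genome_key = f"genome_{key}"
    if segment == "genome" and genome_key in rule_config:
        return rule_config[genome_key]
    return rule_config[key]


# Hypothetical config fragment mirroring `config.traits.columns` and
# `config.traits.genome_columns`; the values are illustrative only.
example_config = {
    "traits": {
        "columns": "region country",
        "genome_columns": "division",
    }
}

print(resolve_param(example_config, "traits", "columns", "ha"))
print(resolve_param(example_config, "traits", "columns", "genome"))
```

With this shape, the per-segment value is written once and only the genome build needs a separate entry.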

This results in disjoint sets of filenames for the GISAID builds (config/gisaid.yaml) and the NCBI builds (config/h5n1-cattle-outbreak.yaml), which therefore allows you to run each set of builds locally without one interfering with the other. In addition, the way local-ingest data can be used is streamlined so that you can achieve the same outcome with local data. Note that if you run (e.g.) GISAID builds using local data and then run them with S3 data, all the intermediate files will be regenerated. In other words, you cannot maintain parallel "versions" of these simultaneously.
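To illustrate why disjoint filenames let the two sets of builds coexist, here is a small sketch. The `results/<build>/<segment>/<file>` layout, the segment names, and the file names are assumptions chosen for the example; the repository's real paths may differ.

```python
from pathlib import PurePosixPath


def results_path(build_name: str, segment: str, filename: str) -> PurePosixPath:
    """Build a results path namespaced by build name (hypothetical layout)."""
    return PurePosixPath("results") / build_name / segment / filename


def all_paths(build_name: str, segments: list[str], filenames: list[str]) -> set:
    """Every intermediate file a set of builds would write."""
    return {
        results_path(build_name, segment, filename)
        for segment in segments
        for filename in filenames
    }


files = ["metadata.tsv", "tree.nwk"]
gisaid = all_paths("gisaid", ["ha", "na"], files)
ncbi = all_paths("h5n1-cattle-outbreak", ["genome", "ha"], files)

# No path is shared, so running one set of builds never overwrites
# intermediate files belonging to the other.
print(gisaid.isdisjoint(ncbi))
```

Note that the data source (local vs S3) is not part of the path in this sketch, which matches the caveat above: switching sources for the same build regenerates the same files rather than creating a parallel copy.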

The pipeline already adds this field to the metadata TSV in use, but it won't be exported without this addition to the auspice-config JSON. Note that the clade definitions haven't been regenerated for NCBI data, so there are actually no clades defined at the moment, and thus nothing is exported.

jameshadfield · 2024-07-16T23:24:44Z

Cherry-picked into #72 - thanks for taking a look @joverlee521!

jameshadfield added 7 commits July 15, 2024 15:27

Integrate common.smk into Snakefile

3dca02f

The rules in `common.smk` were separated out to reduce rule duplication between the main Snakefile and Snakefile.genome. The latter has since been integrated into the main snakefile, and so we do the same with these "common" rules.

Organise results into directories

ca2ebec

Makes listing / looking at the results files a more pleasant experience. There should be no changes to behaviour with this commit.

Update GitHub Actions invocations

736b114

to reflect the changes made in the previous few commits. The addition of "genome" to the h5n1-cattle-outbreak config YAML is needed to make it an explicit output of the `all` rule, and this output is what's used by the `deploy_all` rule.

[phylo github actions] add summary message

5f67e06

The `pathogen-repo-build` reusable action adds an extremely helpful summary describing the AWS run. This adds some similarly helpful info which should make it much simpler to check the results of a phylo run.

jameshadfield requested a review from joverlee521 July 16, 2024 21:40

joverlee521 approved these changes Jul 16, 2024

jameshadfield force-pushed the james/snakemake-simplifications branch from 736b114 to 6b9bc7c July 16, 2024 23:24

jameshadfield closed this Jul 16, 2024

jameshadfield deleted the james/improved-gha-summaries branch July 17, 2024 02:36
