Update Nextclade metadata merge to use augur curate rename and augur merge #52

Merged 3 commits on Sep 19, 2024

Commits on Sep 4, 2024

  1. ingest: Use raw-string for Nextclade metadata merge rule

    Preserving the line-breaks makes the command much more readable in
    Snakemake output¹, which is important since I'm changing this rule right
    now.
    
    The \n previously interpreted by Python is now interpreted by `tr`,
    which is preferable.
    
    ¹ <https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#use-triple-quoted-command-definitions>
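The difference between the two string forms can be sketched in plain Python. This is a minimal illustration, not the actual rule: the `tr '\n' ' '` command here is a hypothetical stand-in for whatever the rule's shell block contains.

```python
# Regular triple-quoted string: Python converts "\n" into a real newline
# before the shell (and therefore `tr`) ever sees the command.
# NOTE: the `tr` invocation is a hypothetical example, not the real rule.
regular = """
    tr '\n' ' '
"""

# Raw triple-quoted string: the backslash and the "n" stay as two separate
# characters, so `tr` receives the escape sequence and interprets it itself.
raw = r"""
    tr '\n' ' '
"""

# repr() makes the difference visible: the raw form keeps a literal
# backslash, the regular form has an extra newline inside the quotes.
print(repr(regular))
print(repr(raw))
```

With the raw form, the command text that Snakemake prints matches what was written in the rule, line breaks included.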
    tsibley committed Sep 4, 2024
    762acdb

Commits on Sep 10, 2024

  1. ingest: Rename Nextclade metadata fields with augur curate rename

    This construction reads much more clearly and cleanly.
    
    Moves the Nextclade field map directly and more conveniently into the
    YAML config instead of referencing a separate TSV file.  Putting the
    field map into a separate file seemed to be only for the sake of the
    --kv-file (-k) interface provided by `csvtk rename2`, which we're no
    longer using here.  For backwards compatibility, configs that reference
    a TSV file are still supported and will be handled on-the-fly.
    
    Note that `augur curate` commands currently emit CSV-like TSVs that are
    limited to being IANA-like¹, so tsv-utils is the most appropriate tool
    for parsing them, hence the switch from `csvtk cut` to `tsv-select`.
    
    ¹ See <nextstrain/augur#1566>.
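One way the backwards-compatible handling described above could look is sketched below. It assumes the config value is either an inline mapping or a path to a two-column TSV file; the function names, the comment-line handling, and the field names in the usage example are all hypothetical and not taken from the workflow.

```python
import csv
from collections.abc import Mapping


def load_field_map(value):
    """Return an {old_name: new_name} dict from either config style.

    Hypothetical helper: `value` is assumed to be an inline mapping
    (new style) or a path to a two-column TSV file (old style).
    """
    if isinstance(value, Mapping):
        # New style: the map is written directly in the YAML config.
        return dict(value)

    # Old style: the config holds a path to a TSV file of old/new pairs.
    field_map = {}
    with open(value, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: skip blank, short, and "#"-prefixed comment lines.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            field_map[row[0]] = row[1]
    return field_map


def as_rename_pairs(field_map):
    """Format the map as `old=new` strings, preserving config order."""
    return [f"{old}={new}" for old, new in field_map.items()]


# Usage with an inline map (hypothetical field names):
pairs = as_rename_pairs(load_field_map({"seqName": "accession", "clade": "clade_name"}))
print(pairs)
```

Normalizing both config styles into one dict up front means the rule itself only ever has to deal with a single representation.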
    tsibley committed Sep 10, 2024
    faebd64
  2. ingest: Merge Nextclade metadata with augur merge

    This construction reads a bit clearer and cleaner.  It's also a good
    example of how to use `augur merge`.
    
    The limitation on non-seekable streams means the workflow now uses
    additional transient disk space, but it typically shouldn't be an issue.
    Augur's slow startup time also makes `augur merge` take longer, which
    lengthens rule execution, but the difference should be negligible
    in the context of the larger workflow and presumably we'll fix the slow
    startup eventually.¹
    
    The output is semantically identical but has some syntactic changes re:
    quoting.  It's worth noting that the pre-existing TSV format was _not_
    IANA TSV, despite it (still) being treated as such in a few places, but
    was (and remains) a CSV-like TSV with some quoted fields (and some
    mangled quotes², e.g. the "institution" column for accession KJ556895).
    We really need to sort out our TSV formats³, but that's for a larger
    project.
    
    ¹ <nextstrain/augur#1628>
    ² <nextstrain/augur#1565>
    ³ <nextstrain/augur#1566>
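The distinction between "semantically identical" and "syntactically different" can be illustrated with Python's standard `csv` module. This is only a sketch: the record is made up, and the `csv` module's default quoting stands in for the CSV-like TSV dialect, which may differ in detail from what Augur emits.

```python
import csv
import io

# Hypothetical record; the second value contains double quote characters.
record = ["ABC123", 'Institute of "Example" Studies']

# CSV-like TSV: a field is quoted when needed and inner quotes are doubled.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(record)
csv_like_line = buf.getvalue().rstrip("\n")

# IANA-style TSV: no quoting mechanism, fields are joined verbatim.
iana_line = "\t".join(record)

# Read with the matching parser, both lines yield the same values, so the
# two serializations are semantically identical despite differing bytes.
assert next(csv.reader([csv_like_line], delimiter="\t")) == record
assert iana_line.split("\t") == record

# Read with the wrong parser, the quoting leaks into the data.
mismatched = csv_like_line.split("\t")
print(csv_like_line)
print(mismatched)
```

Treating a CSV-like TSV as IANA TSV is what produces stray or mangled quotes of the kind mentioned above.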
    tsibley committed Sep 10, 2024
    4d73b7f