Update phylo workflow #3

Merged
merged 3 commits into from
May 16, 2024

Commits on May 14, 2024

  1. Convert nextalign run to nextclade run #2

    * `output.insertions` will be a TSV file now
    * `--reference` is now spelled `--input-ref`
    * `--genemap` is now spelled `--input-annotation`
    * `--retry-reverse-complement` is no longer supported
    * `--output-insertions` is now spelled `--output-tsv`
    
    Note: dropping `--retry-reverse-complement` is the change I am least
    sure about, but with it removed this step runs to completion.
    genehack committed May 14, 2024 (e7fa734)
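The flag renames listed in the first commit can be summarized as a small lookup table. The following is a minimal sketch, not code from this PR: `translate_args` is a hypothetical helper, the file names in the usage example are placeholders, and the mapping contains only the renames stated in the commit message.

```python
# Flag renames listed in the commit message (nextalign run -> nextclade run).
RENAMED_FLAGS = {
    "--reference": "--input-ref",
    "--genemap": "--input-annotation",
    "--output-insertions": "--output-tsv",
}
# Flag the commit drops because it is reported as no longer supported.
DROPPED_FLAGS = {"--retry-reverse-complement"}


def translate_args(args):
    """Translate a nextalign-style argument list into the nextclade spelling.

    Hypothetical helper for illustration; it only rewrites flag names and
    leaves values and positional arguments untouched.
    """
    translated = []
    for arg in args:
        # Handle both "--flag value" and "--flag=value" forms.
        flag, sep, value = arg.partition("=")
        if flag in DROPPED_FLAGS:
            continue
        translated.append(RENAMED_FLAGS.get(flag, flag) + sep + value)
    return translated


if __name__ == "__main__":
    # Placeholder file names, not taken from the workflow.
    old = ["--reference", "ref.fasta", "--genemap=genemap.gff",
           "--retry-reverse-complement", "seqs.fasta"]
    print(" ".join(["nextclade", "run"] + translate_args(old)))
```

The table only changes how flags are spelled. Whether the output written via `--output-tsv` has the same columns as the old insertions file is not covered by this sketch.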
  2. change gene=* to gene_name=* in hku1/genemap.gff #2

    Initially, the workflow failed with the following error:
    
    ```
    Error:
       0: When reading genome annotation
       1: When reading file: "config/hku1/genemap.gff"
       2: Attempted to parse the genome annotation as JSON and as GFF, but both attempts failed:
          JSON error: invalid type: string "NC_006577.2\tfeature\tsource\t1\t29926\t.\t+\t.\tgene=nuc NC_006577.2\tfeature\tgene\t206\t13600
    \t.\t+\t.\tgene=ORF1a NC_006577.2\tfeature\tgene\t13600\t21753\t.\t+\t.\tgene=ORF1b NC_006577.2\tfeature\tgene\t21773\t22933\t.\t+\t.\tg
    ene=HE NC_006577.2\tfeature\tgene\t22942\t27012\t.\t+\t.\tgene=Spike NC_006577.2\tfeature\tgene\t22978\t25221\t.\t+\t.\tgene=S1 NC_00657
    7.2\tfeature\tgene\t27051\t27380\t.\t+\t.\tgene=S2 NC_006577.2\tfeature\tgene\t27051\t27380\t.\t+\t.\tgene=ORF4 NC_006577.2\tfeature\tge
    ne\t27373\t27621\t.\t+\t.\tgene=E NC_006577.2\tfeature\tgene\t27633\t28304\t.\t+\t.\tgene=M NC_006577.2\tfeature\tgene\t28320\t29645\t.\
    t+\t.\tgene=N NC_006577.2\tfeature\tgene\t28342\t28959\t.\t+\t.\tgene=N2", expected struct GeneMap at line 2 column 1
    
          GFF3 error: When processing gene, 'N': When processing feature group 'N' ('N') of type 'gene': genes must consist of exactly one f
    eature: Expected exactly one element, but found: 2
       2:
    
    Location:
       /workdir/packages/nextclade/src/gene/gene_map.rs:56
    ```
    
    While looking at the referenced file, and comparing it to the other
    `genemap.gff` files in the config, I noticed that all the others used
    `gene_name` for everything after the first `gene` line. I changed this
    file to match, and the workflow got past the point where it was
    previously erroring out.
    
    I have no idea why this worked; hopefully somebody will explain in the
    code review.
    genehack committed May 14, 2024 (4eeffd8)
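The edit described in the second commit (keep `gene` on the first feature line, use `gene_name` on every later one) is mechanical enough to sketch in code. This is an illustration only, assuming tab-separated rows with nine columns and `key=value` attributes separated by `;` in the last column. `rename_gene_attribute` is a hypothetical helper, not part of the workflow or of Nextclade, and it does not explain why the change makes the parser accept the file.

```python
def rename_gene_attribute(gff_text):
    """Rename `gene=` to `gene_name=` on every feature line after the first.

    Hypothetical helper: mirrors the manual edit described in the commit,
    under the assumption of nine tab-separated columns per feature row.
    """
    out = []
    seen_first_feature = False
    for line in gff_text.splitlines():
        fields = line.split("\t")
        # Leave comments, blank lines, and malformed rows untouched.
        if line.startswith("#") or len(fields) != 9:
            out.append(line)
            continue
        if seen_first_feature:
            attrs = fields[8].split(";")
            fields[8] = ";".join(
                "gene_name=" + a[len("gene="):] if a.startswith("gene=") else a
                for a in attrs
            )
        seen_first_feature = True
        out.append("\t".join(fields))
    return "\n".join(out)


if __name__ == "__main__":
    # Two rows taken from the error output quoted in the commit message.
    sample = (
        "NC_006577.2\tfeature\tsource\t1\t29926\t.\t+\t.\tgene=nuc\n"
        "NC_006577.2\tfeature\tgene\t206\t13600\t.\t+\t.\tgene=ORF1a"
    )
    print(rename_gene_attribute(sample))
```

Only attributes that start with `gene=` are rewritten, so an existing `gene_name=` entry is left as it is and running the helper twice gives the same result.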
  3. Commit 2909a03