Add gene coverage columns during ingest workflow #36

Merged: 15 commits, Apr 24, 2024

Commits on Apr 16, 2024

  1. 9839bd1
  2. Add genome_coverage and indicator (True/blank) variable for E_coverage

    This uses the Nextclade "coverage" column as "genome_coverage" and the Nextclade "failedCdses"
    column to check whether E_coverage is present.
    
    fixup: use 1 instead of true
    j23414 committed Apr 16, 2024
    2a9eee4
  3. 12947fa
  4. d22016f
  5. Output Nextclade gene translations to FASTA files

    This can be one gene or a set of genes; the translations can then be used to calculate the gene_coverage columns.
    j23414 committed Apr 16, 2024
    ddf0fb3
  6. Only have final files in the "results" directory

    Move intermediate files to the "data" folder
    j23414 committed Apr 16, 2024
    1a6d1ef
  7. 35bbb83
  8. Add rules for gene_coverage

    Adds the following rules for gene coverage:

    * calculate_gene_coverage: calls a Python script that takes a Nextclade CDS translation FASTA and calculates (valid AA)/(total length). The result is rounded to 3 significant figures.
    * aggregate_gene_coverage_by_gene: combines the gene_coverage files by gene (e.g. ["E", "NS1"]) across all serotypes (e.g. denv1-4).
    * appends_gene_coverage_columns: adds each gene_coverage column (e.g. "E_coverage", "NS1_coverage") to the final metadata.
    j23414 committed Apr 16, 2024
    e0d2a77
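
The calculate_gene_coverage step described in commit 8 could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the script from this PR: the function names (`parse_fasta`, `round_sig`, `gene_coverage`, `coverage_table`) and the example records are hypothetical, and treating `X` and `-` as the only non-valid characters is an assumption about what "valid AA" means.

```python
from math import floor, log10


def parse_fasta(text):
    """Parse FASTA text into a {record_id: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {rec: "".join(parts) for rec, parts in records.items()}


def round_sig(value, sig=3):
    """Round value to `sig` significant figures."""
    if value == 0:
        return 0.0
    return round(value, sig - 1 - floor(log10(abs(value))))


def gene_coverage(aa_seq, invalid="X-"):
    """Return (valid AA) / (total length), rounded to 3 significant figures.

    Assumption: 'X' (unknown residue) and '-' (gap) are the only characters
    counted as not valid.
    """
    if not aa_seq:
        return 0.0
    valid = sum(1 for aa in aa_seq.upper() if aa not in invalid)
    return round_sig(valid / len(aa_seq))


def coverage_table(fasta_text):
    """Map each record ID in a translation FASTA to its gene coverage."""
    return {rec: gene_coverage(seq) for rec, seq in parse_fasta(fasta_text).items()}


# Hypothetical translation FASTA for a single gene.
EXAMPLE_FASTA = ">seq1\nMKTX\nXX--\n>seq2\nMKTAYIAK\n"

if __name__ == "__main__":
    # Emit a two-column TSV: record ID and coverage for one gene.
    print("id\tE_coverage")
    for rec, cov in coverage_table(EXAMPLE_FASTA).items():
        print(f"{rec}\t{cov}")
```

A per-gene table like this is the kind of output that the aggregate and append rules could then combine into columns such as "E_coverage" in the final metadata.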
  9. fixup: Use tsv-append instead

    Co-authored-by: Jover Lee <joverlee521@gmail.com>
    j23414 and joverlee521 committed Apr 16, 2024
    685e218
  10. fixup: drop the E_indicator column

    #36 (comment)
    
    Since we are not using the E_indicator column, drop it.
    We have separate steps to calculate the E_coverage column.
    j23414 committed Apr 16, 2024
    c861942
  11. e722c76
  12. 7b94670
  13. fixup: move hard-coded columns to a shared workflow variable or config params so they don't get out of sync between rules

    j23414 committed Apr 16, 2024
    1e7cde8
  14. Use serotype/gene/files in directory structure

    Encode serotype and gene as part of the directory structure where possible.
    j23414 committed Apr 16, 2024
    f6a620d

Commits on Apr 19, 2024

  1. Use a one-to-one mapping of Nextclade input to output columns

    As suggested by #36 (comment)
    Merge ID should be the first item in the map
    j23414 committed Apr 19, 2024
    db300a5
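
One way to picture the one-to-one column mapping from the last commit is an ordered map whose first entry is the merge ID. The sketch below is illustrative only: `FIELD_MAP`, `merge_id`, `rename_columns`, and the output name "accession" are hypothetical, and the actual workflow may express the map in its config rather than in Python. The "coverage" to "genome_coverage" pair follows the commit message above.

```python
import csv
import io

# Hypothetical one-to-one map of Nextclade input column -> output column.
# Python dicts preserve insertion order, so the first item is the merge ID.
FIELD_MAP = {
    "seqName": "accession",         # merge ID (assumed output name)
    "coverage": "genome_coverage",  # pairing taken from the commit message
}


def merge_id(field_map):
    """Return the (input, output) column pair used as the merge ID."""
    return next(iter(field_map.items()))


def rename_columns(tsv_text, field_map):
    """Keep only the mapped columns of a TSV and rename them per field_map."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(field_map.values()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in field_map.items()})
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical Nextclade-style TSV with one unmapped column ("clade").
    example = "seqName\tclade\tcoverage\nA1\tX\t0.98\n"
    print(rename_columns(example, FIELD_MAP), end="")
```

Keeping the merge ID first means any rule that needs the join key can read it from the map instead of hard-coding the column name.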