
Mutation text inconsistent amino acid ordering (ex: T215YS vs T215SY) #125

@cgbielick


Hi, I was running some benchmark comparisons against HIVdb and found some mismatches. It seems the Stanford API alphabetizes the amino acids in the mutation text, whereas sierra-local outputs them in encounter order.

For example (in the "text" field):
// sierra-local
{"position": 215, "AAs": "SY", "text": "T215YS"}

// Stanford API
{"position": 215, "AAs": "SY", "text": "T215SY"}

The AAs field is alphabetical in both (comes from the aligner), but the text field differs.
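
For anyone benchmarking in the meantime, here is a minimal sketch of an order-insensitive comparison. The helper name `normalize_mutation_text` is my own (it is not part of sierra-local or the Stanford API), and it assumes the text has the simple form reference letter, position, then one or more amino acid letters. Anything else, such as insertions or deletions, is returned unchanged.

```python
import re

# Hypothetical helper, not part of sierra-local or the Stanford API.
# Assumes simple mutation text: reference AA, position, one or more AA letters.
_MUT_RE = re.compile(r"^([A-Z])(\d+)([A-Z]+)$")


def normalize_mutation_text(text):
    """Return the mutation text with its amino acids sorted alphabetically."""
    match = _MUT_RE.match(text)
    if match is None:
        # Leave anything that does not fit the simple pattern untouched.
        return text
    ref, pos, aas = match.groups()
    return f"{ref}{pos}{''.join(sorted(aas))}"
```

With this, `normalize_mutation_text("T215YS")` and `normalize_mutation_text("T215SY")` compare equal, so the two outputs above no longer count as a mismatch.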

I think this is because in nucaminohook.py lines 479-483, the code uses two different sources:

  • mut['AminoAcidText'] → from aligner (alphabetical) → becomes the AAs field
  • translate_na_triplet(codon) → re-translates locally (encounter order) → becomes the text field

The translate_na_triplet function joins amino acids as they're encountered during codon enumeration rather than sorting them alphabetically. A simple fix could be to just use the aligner's output for both fields instead of re-translating.

gene_muts.update(
    {position - left: (mut['ReferenceText'],
                       mut['AminoAcidText'],
                       mut['AminoAcidText']  # Instead of translate_na_triplet(codon)
                       )}
)
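
If re-translating locally is still preferred, an alternative would be to sort the amino acids before joining them. The sketch below is not the actual `translate_na_triplet` from sierra-local. It is a standalone illustration with a hypothetical name (`translate_ambiguous_codon`), assuming the standard genetic code and IUPAC ambiguity codes, that shows how the same codon can yield "YS" in encounter order and "SY" once sorted.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous_codon(codon):
    """Hypothetical helper: translate a possibly ambiguous codon and
    return the distinct amino acids in alphabetical order."""
    # Expand every ambiguous base into its concrete alternatives.
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    # Collect distinct amino acids, then sort instead of keeping encounter order.
    amino_acids = {CODON_TABLE["".join(triplet)] for triplet in expansions}
    return "".join(sorted(amino_acids))
```

For example, the codon `TMC` expands to `TAC` (Y) and then `TCC` (S), so encounter order gives "YS", while `translate_ambiguous_codon("TMC")` returns "SY".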

According to https://hivdb.stanford.edu/_wrapper/pages/documentPage/user_guide.pdf, "The order of the mutations is not relevant," so the two outputs are still equivalent in meaning. But for consistency with other packages that build on sierra-local, following Stanford's alphabetical convention would make things easier all around.
