
CDS phase (frame offset, e.g. for ribosomal slippage) not taken into account in amino acid translation #732

Open
@davmlaw

Description

The GFF format has a "phase" column on CDS features (values 0, 1 or 2), which sets where codons start within each CDS feature and so affects the reading frame of exons and the translation to amino acids.
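A minimal Python sketch of what the phase does to translation. It assumes the segment sequence is already on the coding strand (reverse-complemented for minus-strand features). `CODON_TABLE` and `translate_segment` are hypothetical helpers, not part of the hgvs or cdot APIs, and codons split across exon boundaries are deliberately not handled:

```python
# Standard genetic code (NCBI translation table 1), indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_segment(seq: str, phase: int) -> str:
    """Translate one CDS segment, starting `phase` bases in from its 5' end.

    Hypothetical helper: in a real spliced transcript the skipped `phase`
    bases complete a codon begun in the previous exon, and trailing bases
    that do not form a full codon belong to the next exon. Both are
    simply ignored here.
    """
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1 or 2, got {phase}")
    seq = seq.upper()
    protein = []
    for i in range(phase, len(seq) - 2, 3):
        # Unknown codons (e.g. containing N) become "X".
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

With the same bases, `translate_segment("ATGGCCTAA", 0)` gives `"MA*"` while `translate_segment("ATGGCCTAA", 1)` gives `"WP"`, which is why ignoring the phase silently changes the predicted protein.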

The UTA/DataProvider transcript annotation format does not currently contain this information, so I believe it will need to be added, and the HGVS code then modified to take it into account when converting to p. (similar to how alignment gaps are handled between g. and c.).
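One way to see what information is missing: for a continuously spliced CDS, each feature's phase is fully determined by the previous feature's length and phase, so the transcript model only loses information where an annotation breaks that rule. The sketch below checks the rule. `CdsSegment`, `expected_phase` and `find_frame_shifts` are hypothetical names, not existing UTA, cdot or hgvs code, and whether a particular slippage annotation (for example one encoded with overlapping CDS features) shows up as a mismatch depends on how the annotator encoded it:

```python
from typing import List, NamedTuple, Tuple


class CdsSegment(NamedTuple):
    """Hypothetical record for one CDS feature, listed in 5'->3' transcript order."""
    length: int  # end - start + 1 (GFF coordinates are 1-based, inclusive)
    phase: int   # GFF3 column 8: 0, 1 or 2


def expected_phase(prev: CdsSegment) -> int:
    """Phase the next segment must have if the reading frame continues unchanged."""
    leftover = (prev.length - prev.phase) % 3  # bases of an unfinished codon at the 3' end
    return (3 - leftover) % 3                  # bases the next segment must supply to finish it


def find_frame_shifts(segments: List[CdsSegment]) -> List[Tuple[int, int, int]]:
    """Return (segment index, expected phase, annotated phase) for every mismatch."""
    shifts = []
    for idx in range(1, len(segments)):
        exp = expected_phase(segments[idx - 1])
        if segments[idx].phase != exp:
            shifts.append((idx, exp, segments[idx].phase))
    return shifts
```

For example, `find_frame_shifts([CdsSegment(100, 0), CdsSegment(50, 2), CdsSegment(30, 0)])` returns `[]` because every phase follows from the lengths, whereas `find_frame_shifts([CdsSegment(100, 0), CdsSegment(50, 1)])` returns `[(1, 2, 1)]`: the second segment's annotated phase cannot be derived from exon lengths alone, so it would have to be stored.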

Example annotation

From ref_GRCh37.p10_top_level.gff3 (the phase is the "1" after the "+"):

NC_000007.13	RefSeq	CDS	94292646	94293825	.	+	1	ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
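Reading the phase out of a line like this needs only the standard library. The sketch below assumes a well-formed nine-column, tab-separated GFF3 line. `parse_gff3_line` is a hypothetical helper (in practice a dedicated library such as gffutils would do this), and it only decodes the percent-escaping that GFF3 allows in attribute keys and values:

```python
from urllib.parse import unquote

# The example CDS feature above, with its nine columns separated by tabs.
GFF_LINE = (
    "NC_000007.13\tRefSeq\tCDS\t94292646\t94293825\t.\t+\t1\t"
    "ID=cds13063;Name=NP_001165908.1;Parent=rna16954;"
    "Note=isoform 3 is encoded by transcript variant 2;"
    "Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;"
    "exception=ribosomal slippage;gbkey=CDS;"
    "product=retrotransposon-derived protein PEG10 isoform 3;"
    "protein_id=NP_001165908.1"
)


def parse_gff3_line(line: str) -> dict:
    """Split one GFF3 feature line into typed fields (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for item in attrs.split(";"):
        if item:
            key, _, value = item.partition("=")
            attributes[unquote(key)] = unquote(value)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        # "." is only legal for non-CDS features; CDS features must carry 0, 1 or 2.
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


rec = parse_gff3_line(GFF_LINE)
print(rec["phase"], rec["attributes"]["exception"])  # 1 ribosomal slippage
```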

Column 8: "phase" (from the GFF3 specification):

For features of type "CDS", the phase indicates where the next codon begins relative to the 5' end (where the 5' end of the CDS is relative to the strand of the CDS feature) of the current CDS feature. For clarification, the 5' end for CDS features on the plus strand is the feature's start and the 5' end for CDS features on the minus strand is the feature's end. The phase is one of the integers 0, 1, or 2, indicating the number of bases forward from the start of the current CDS feature at which the next codon begins. A phase of "0" indicates that a codon begins on the first nucleotide of the CDS feature (i.e. 0 bases forward), a phase of "1" indicates that the codon begins at the second nucleotide of this CDS feature and a phase of "2" indicates that the codon begins at the third nucleotide of this region. Note that ‘Phase’ in the context of a GFF3 CDS feature should not be confused with the similar concept of frame that is also a common concept in bioinformatics. Frame is generally calculated as a value for a given base relative to the start of the complete open reading frame (ORF) or the codon (e.g. modulo 3) while CDS phase describes the start of the next codon relative to a given CDS feature.

The phase is REQUIRED for all CDS features.
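The plus/minus strand rule in the quoted definition translates directly into genomic coordinates. A small sketch, assuming GFF's 1-based inclusive coordinates; `first_codon_start` is a hypothetical helper, not taken from any existing library:

```python
def first_codon_start(start: int, end: int, strand: str, phase: int) -> int:
    """Genomic position of the first base of the first complete codon in a CDS feature.

    Per GFF3, the 5' end is `start` on the plus strand and `end` on the minus
    strand, and `phase` counts bases forward from that 5' end in the direction
    of transcription.
    """
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1 or 2, got {phase}")
    if strand == "+":
        return start + phase
    if strand == "-":
        return end - phase
    raise ValueError(f"CDS features need a strand of '+' or '-', got {strand!r}")
```

For the PEG10 feature above, `first_codon_start(94292646, 94293825, "+", 1)` gives `94292647`, one base in from the feature start; on the minus strand, `first_codon_start(100, 200, "-", 2)` gives `198`.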

This was originally raised by holtgrewe on the cdot project.
