Duplicate exons, overlapping CDSs in output #26

Open
@nikostr

Description

One piece of my output looks as follows:

ptg000006l	LiftOn	gene	86505371	86507566	.	-	.	ID=g31624;source=Liftoff
ptg000006l	LiftOn	transcript	86505371	86507566	.	-	.	ID=g31624.t1;Parent=g31624;mutation=frameshift,stop_codon_gain;protein_identity=0.718;dna_identity=0.652;status=LiftOn_chaining_algorithm
ptg000006l	LiftOn	exon	86505371	86505373	.	-	.	ID=exon_138261;Parent=g31624.t1
ptg000006l	LiftOn	exon	86505371	86505373	.	-	.	ID=exon_138261;Parent=g31624.t1
ptg000006l	LiftOn	exon	86506533	86507566	.	-	.	ID=exon_138262;Parent=g31624.t1
ptg000006l	LiftOn	exon	86506650	86507566	.	-	.	ID=exon_138263;Parent=g31624.t1
ptg000006l	LiftOn	CDS	86507413	86507566	1636	-	1	Parent=g31624.t1
ptg000006l	LiftOn	CDS	86506650	86507566	.	-	0	Parent=g31624.t1

This does not look right: exon_138261 is listed twice, exon_138262 and exon_138263 overlap, and the two CDS records overlap each other. It also causes gffread to crash when attempting to extract protein sequences. I ran the following command:

lifton \
    -g $GFF \
    -o sample.lifton.gff3 \
    -copies \
    --threads 10 \
    $TARGET \
    $REF

using LiftOn v1.0.5. Unfortunately I can't currently share my input data.
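
For anyone who wants to check their own output for the same symptoms, here is a minimal Python sketch that flags exact duplicate exon/CDS records and overlapping exon/CDS intervals under the same parent transcript. It is not part of LiftOn or gffread; the function name `find_gff_problems` is hypothetical, and it assumes tab-separated GFF3 with a single `Parent` attribute per feature.

```python
from collections import defaultdict


def find_gff_problems(lines):
    """Return (duplicates, overlaps) among exon/CDS records, grouped by Parent.

    Hypothetical helper for illustration; assumes 9-column tab-separated GFF3.
    """
    seen = set()
    duplicates = []
    intervals = defaultdict(list)  # (parent, feature type) -> [(start, end), ...]

    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        # Parse "key=value;key=value" attributes from column 9.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        start, end = int(cols[3]), int(cols[4])
        key = (cols[0], cols[6], parent, cols[2], start, end)
        if key in seen:
            # Same seqid, strand, parent, type and coordinates seen before.
            duplicates.append(key)
            continue
        seen.add(key)
        intervals[(parent, cols[2])].append((start, end))

    overlaps = []
    for (parent, ftype), spans in intervals.items():
        spans.sort()
        prev = spans[0]  # span with the furthest end seen so far
        for cur in spans[1:]:
            if cur[0] <= prev[1]:
                overlaps.append((parent, ftype, prev, cur))
            if cur[1] > prev[1]:
                prev = cur
    return duplicates, overlaps
```

It can be run on a file with something like `find_gff_problems(open("sample.lifton.gff3"))`. On the records shown above it should report one duplicate exon and one overlapping pair each for the exon and CDS features of g31624.t1.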
