Description
Hello!
It is not clear from the specification whether a GFF3 file has to be sorted by seqid when multiple seqids are present in the file.
I received a file where this is not the case: first there are lines of type gene for multiple seqids, and then multiple mRNA lines with the same set of seqids, whose parents are the genes described above.
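To make the layout concrete, here is a minimal sketch in plain Python (it does not use scikit-bio). The sample lines, the seqids chr1 and chr2, and the helper names `seqid_runs` and `seqids_are_contiguous` are all made up for illustration. It only checks whether the feature lines of each seqid form a single contiguous block, and it ignores details such as a trailing `##FASTA` section.

```python
def seqid_runs(lines):
    """Return the seqid of each contiguous run of feature lines."""
    runs = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]  # column 1 of a GFF3 line is the seqid
        if not runs or runs[-1] != seqid:
            runs.append(seqid)
    return runs


def seqids_are_contiguous(lines):
    """True if every seqid appears in exactly one contiguous run."""
    runs = seqid_runs(lines)
    return len(runs) == len(set(runs))


# Hypothetical file laid out like the one I received:
# all gene lines first, then all mRNA lines.
SAMPLE = [
    "##gff-version 3",
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr2\t.\tgene\t1\t100\t.\t+\t.\tID=gene2",
    "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr2\t.\tmRNA\t1\t100\t.\t+\t.\tID=mrna2;Parent=gene2",
]

print(seqid_runs(SAMPLE))             # ['chr1', 'chr2', 'chr1', 'chr2']
print(seqids_are_contiguous(SAMPLE))  # False
```

In this sample each seqid shows up in two separate runs, which is the situation I am asking about.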
The reader I use (the scikit-bio read function) treats each occurrence of a seqid as a new name. If a specific sequence ID is provided, it reads only the first record (I presume because it encounters a different seqid after that).
So my problem is that, because this is not specified, I cannot tell whether the reader's behaviour is incorrect, or whether the reader is being strict and correct and the file itself is formatted incorrectly.
Thank you very much for the clarification.