This repository has been archived by the owner on Aug 11, 2021. It is now read-only.
I think I understand that "position sorted" means "sorted by sequence name and then by leftmost coordinate", since the sequence is considered part of the entry's position. Still, because the information lives in two separate columns, it might be beneficial to state this explicitly in the documentation so that users do not make mistakes.
Position sorted means sorted by chromosome and then sorted by position within each chromosome. The position entry in the VCF specification only alludes to that. I agree that the documentation is unclear. This is probably because "position sorted" has become a de-facto technical term in bioinformatics, but of course that's not very helpful to novices.
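To make the two-level ordering concrete, here is a minimal Python sketch of sorting by chromosome and then by position. The function name `position_sort` and the `(seqname, pos)` tuple layout are hypothetical, chosen only for illustration. It also assumes plain lexical ordering of sequence names, which is not necessarily the ordering any particular tool expects.

```python
def position_sort(records):
    """Sort (seqname, pos) tuples by sequence name, then by position.

    Assumption: sequence names are compared as plain strings, so
    "chr10" sorts before "chr2". Real tools may expect another order.
    """
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


records = [("chr2", 5), ("chr1", 100), ("chr1", 20)]
for seqname, pos in position_sort(records):
    print(seqname, pos)
```

The key point is that the sort key is the pair (sequence name, position), not the position column alone.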
Also, some tools such as the GATK even distinguish between different chromosome orderings, and require them to match across all input files.
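As a rough illustration of what "matching chromosome orderings" could mean, the following Python sketch compares the order in which sequence names first appear in two record lists. The helper names `contig_order` and `same_contig_order` are hypothetical, and this does not reproduce the checks GATK itself performs.

```python
def contig_order(records):
    """Return sequence names in order of first appearance.

    `records` is an iterable of (seqname, pos) tuples in file order.
    """
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(seqname for seqname, _ in records))


def same_contig_order(records_a, records_b):
    """True if both inputs list their shared sequence names in the same order."""
    order_a = contig_order(records_a)
    order_b = contig_order(records_b)
    shared = set(order_a) & set(order_b)
    return [s for s in order_a if s in shared] == [s for s in order_b if s in shared]
```

Two files that are each sorted by chromosome and position can still fail this comparison, for example when one lists `chr1, chr2, chr10` and the other lists `chr1, chr10, chr2`.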
On http://www.htslib.org/doc/tabix.html it is indicated that the file should be position sorted.
However, in many usages I see that the files are in fact sorted first by seqname and THEN by position. The tabix paper also seems to indicate this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176/
So does the documentation need to be updated, or has tabix been changed since then to allow the seqname to be out of order?