@@ -39,9 +39,10 @@ \section{Contact:} hengli@broadinstitute.org
 \section{Introduction}
 
 VCF/BCF~\citep{Danecek:2011qy} is the primary format for storing and analyzing
-genotypes of multiple samples. It however has a few issues. Firstly, as a
-streaming format, VCF compresses all types of information together. Retrieving
-site annotations or the genotypes of a few samples usually requires to decode
+genotypes of multiple samples. It however has a few issues. Firstly, VCF
+is a site-oriented format. While accessing a site and all the associated
+genotypes is efficient with indexing, retrieving
+site annotations or the genotypes of a few samples always requires to decode
 the genotypes of all samples, which is unnecessarily expensive. Secondly, VCF
 does not take advantage of linkage disequilibrium (LD), while using this
 information can dramatically improve compression ratio~\citep{Durbin:2014yq}.