GNU General Public License, GPLv3
This package provides functions for format conversion from bgen files to SeqArray GDS files.
v0.9.0
Dr. Xiuwen Zheng (zhengxwen@gmail.com)
Requires R (≥ v3.5.0), gdsfmt (≥ v1.20.0), SeqArray (≥ v1.24.0)
- Installation from GitHub:
library("devtools")
install_github("zhengxwen/gds2bgen")
The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system; see the R FAQ for your operating system. You may also need to install dependencies manually.
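If the dependencies are missing, one possible way to install them manually is through Bioconductor, where gdsfmt and SeqArray are distributed. This is only a sketch: the use of BiocManager is an assumption and is not prescribed by this package.

```r
# Assumption: BiocManager is used to fetch the Bioconductor dependencies;
# any other way of installing gdsfmt and SeqArray works as well.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

# gdsfmt and SeqArray are the two dependencies listed above
BiocManager::install(c("gdsfmt", "SeqArray"))
```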
Or manually install the package:
git clone https://github.com/zhengxwen/gds2bgen
cd gds2bgen/src
tar -vxzf gavinband-bgen-0b7a2803adb5.tar.gz
cd gavinband-bgen-0b7a2803adb5
./waf configure
./waf
cp build/libbgen.a ..
cp build/3rd_party/zstd-1.1.0/libzstd.a ..
rm -rf build
cd ../../..
R CMD INSTALL gds2bgen
This package includes the sources of the bgen library written by Gavin Band and Jonathan Marchini (https://bitbucket.org/gavinband/bgen), Boost (the C++ libraries, https://www.boost.org) and Zstandard (https://zstd.net).
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.
library(gds2bgen)
bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
# or bgen_fn <- "your_bgen_file.bgen"
seqBGEN_Info(bgen_fn)
## bgen file: gds2bgen/extdata/example.8bits.bgen
## # of samples: 500
## # of variants: 199
## compression method: zlib
## layout version: v1.2
## sample id: sample_001, sample_002, sample_003, sample_004, ...
# example.8bits.bgen ==> example.gds, using 4 cores
seqBGEN2GDS(bgen_fn, "example.gds",
    storage.option="LZMA_RA",  # compression option, e.g., ZIP_RA for zlib or LZ4_RA for LZ4
    float.type="packed8",      # 8-bit packed real numbers
    geno=FALSE,     # 2-bit integer genotypes, stored in 'genotype/data'
    dosage=TRUE,    # numeric alternative allele dosages, stored in 'annotation/format/DS'
    prob=FALSE,     # numeric genotype probabilities, stored in 'annotation/format/GP'
    parallel=4      # the number of cores
)
# show file structure
library(SeqArray)
(f <- seqOpen("example.gds"))
seqClose(f)
## File: example.gds (137.7K)
## + [ ] *
## |--+ description [ ] *
## |--+ sample.id { Str8 500 LZMA_ra(7.02%), 393B } *
## |--+ variant.id { Int32 199 LZMA_ra(33.9%), 277B } *
## |--+ position { Int32 199 LZMA_ra(60.6%), 489B } *
## |--+ chromosome { Str8 199 LZMA_ra(15.7%), 101B } *
## |--+ allele { Str8 199 LZMA_ra(11.8%), 101B } *
## |--+ genotype [ ] *
## | |--+ data { Bit2 2x500x0 LZMA_ra, 18B } *
## | |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
## | \--+ extra { Int16 0 LZMA_ra, 18B }
## |--+ phase [ ]
## | |--+ data { Bit1 500x0 LZMA_ra, 18B } *
## | |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
## | \--+ extra { Bit1 0 LZMA_ra, 18B }
## |--+ annotation [ ]
## | |--+ id { Str8 199 LZMA_ra(18.6%), 321B } *
## | |--+ qual { Float32 199 LZMA_ra(11.8%), 101B } *
## | |--+ filter { Int32 199 LZMA_ra(11.3%), 97B } *
## | |--+ info [ ]
## | \--+ format [ ]
## | |--+ DS [ ] *
## | | \--+ data { PackedReal8U 500x199 LZMA_ra(55.6%), 54.0K } *
## | \--+ GP [ ] *
## | \--+ data { PackedReal8U 500x398 LZMA_ra(38.8%), 75.3K } *
## \--+ sample.annotation [ ]
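Once converted, the dosages can be read back with SeqArray. The following is a minimal sketch, assuming that "example.gds" was created by the seqBGEN2GDS() call above with dosage=TRUE. The variable names are illustrative, and the shape of the returned object may depend on the SeqArray version, so the code handles both a list and a plain matrix.

```r
library(SeqArray)

# open the GDS file created by seqBGEN2GDS() (assumed to exist in the working directory)
f <- seqOpen("example.gds")

# sample IDs and variant positions
sample_id <- seqGetData(f, "sample.id")
position  <- seqGetData(f, "position")

# alternative allele dosages stored in 'annotation/format/DS'
ds <- seqGetData(f, "annotation/format/DS")
# some SeqArray versions return list(length, data); keep only the numeric matrix
if (is.list(ds)) ds <- ds$data

# dimensions of the dosage matrix (expected to be samples by variants)
dim(ds)

# close the file
seqClose(f)
```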
seqVCF2GDS() in the SeqArray package, conversion from VCF files to GDS files.
seqBED2GDS() in the SeqArray package, conversion from PLINK BED files to GDS files.