% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/phasingImpute.R
\name{phaseImpute}
\alias{phaseImpute}
\title{Phasing and imputation}
\usage{
phaseImpute(
inputPrefix,
outputPrefix,
autosome = TRUE,
plink,
shapeit,
imputeTool,
impute,
qctool,
gtool,
windowSize = 3e+06,
effectiveSize = 20000,
nCore = 40,
threshold = 0.9,
outputInfoFile,
SNP = TRUE,
referencePanel,
impRefDIR,
tmpImputeDir,
keepTmpDir = TRUE
)
}
\arguments{
\item{inputPrefix}{the prefix of the input PLINK binary files for
the imputation.}
\item{outputPrefix}{the prefix of the output PLINK binary files
after imputation.}
\item{autosome}{a logical value indicating whether only autosomal
chromosomes are imputed.}
\item{plink}{the PLINK executable, located either in the current
working directory or somewhere in the command path.}
\item{shapeit}{the SHAPEIT executable, located either in the current
working directory or somewhere in the command path.}
\item{imputeTool}{a string indicating which imputation tool is used:
"impute2" or "impute4".}
\item{impute}{the IMPUTE2 or IMPUTE4 executable (matching
\code{imputeTool}), located either in the current working directory or
somewhere in the command path.}
\item{qctool}{the QCTOOL executable, located either in the current working
directory or somewhere in the command path. It is only used if
\code{imputeTool} is "impute4".}
\item{gtool}{the GTOOL executable, located either in the current
working directory or somewhere in the command path.}
\item{windowSize}{the size of each imputation chunk in base pairs.
The default value is 3000000 (3 Mb).}
\item{effectiveSize}{the effective population size, commonly denoted as Ne.
A universal -Ne value of 20000 is suggested.}
\item{nCore}{the number of cores used for splitting chromosomes with PLINK,
phasing, imputation, genotype format modification, genotype conversion and
merging genotypes. The default value is 40.}
\item{threshold}{the threshold applied to GEN genotype probabilities when
merging genotypes. The default value is 0.9.}
\item{outputInfoFile}{the output file of IMPUTE2 info scores, consisting of
two columns: the names of all imputed SNPs and their info scores.}
\item{SNP}{a logical value indicating whether the data consist entirely of
single nucleotide polymorphisms (SNPs). If TRUE, genotypes are expressed as
pairs of the alleles A, C, G and T, and unknown genotypes are represented
as N N.}
\item{referencePanel}{a string indicating which imputation reference panel
is used: "1000Gphase1v3_macGT1" or "1000Gphase3".}
\item{impRefDIR}{the directory where the imputation reference files
are located.}
\item{tmpImputeDir}{the name of the temporary directory used for
storing phasing and imputation results.}
\item{keepTmpDir}{a logical value indicating whether the directory
\code{tmpImputeDir} should be kept. The default is TRUE.}
}
\value{
Note that chromosome X is not supported by IMPUTE4.
The following files are written:
1.) the filtered imputed PLINK binary files;
2.) the final PLINK binary files, including poorly imputed variants;
3.) a plain-text file containing the info scores of all imputed SNPs, with
two columns: SNP names and the corresponding info scores.
}
\description{
Perform phasing, imputation and conversion from IMPUTE2 or GEN format into
PLINK binary files.
}
\details{
The whole imputation process mainly consists of the following steps:
\enumerate{
\item Phasing the input PLINK data using an existing imputation
reference panel.
\item Imputing the input PLINK data using the phased results and an
existing reference panel.
\item Converting IMPUTE2 or GEN format data into PLINK format.
\item Combining all imputed data into whole-genome PLINK binary files.
\item Filtering out imputed variants with poor imputation quality.
}
Parallel computing in R is supported; the number of cores is set by
\code{nCore}.
}
\examples{
## Copy the example data into the current working directory
bedFile <- system.file("extdata", "alignedData.bed", package="Gimpute")
bimFile <- system.file("extdata", "alignedData.bim", package="Gimpute")
famFile <- system.file("extdata", "alignedData.fam", package="Gimpute")
file.copy(c(bedFile, bimFile, famFile), ".")
inputPrefix <- "alignedData"
outputPrefix <- "gwasImputed"
outputInfoFile <- "infoScore.txt"
tmpImputeDir <- "tmpImpute"
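## A minimal sketch of the chunking controlled by 'windowSize'. This is an
## assumption about the general approach, not Gimpute's internal code: the
## variants of one chromosome are cut into consecutive windows of
## 'windowSize' base pairs, and each window is imputed separately.
## 'makeChunks' is a hypothetical helper, not a Gimpute function.
makeChunks <- function(positions, windowSize = 3e+06) {
    ## window start positions, from the first to the last variant
    starts <- seq(min(positions), max(positions), by = windowSize)
    ## each window ends one base before the next one starts;
    ## the last window is clipped at the last variant
    ends <- pmin(starts + windowSize - 1, max(positions))
    data.frame(start = starts, end = ends)
}
## toy base-pair positions of four variants on one chromosome
pos <- c(10000, 2500000, 4000000, 7200000)
chunks <- makeChunks(pos, windowSize = 3e+06)
chunks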
## Not run: requires the executables PLINK, SHAPEIT, IMPUTE2 or IMPUTE4,
## GTOOL (and QCTOOL for IMPUTE4) plus the imputation reference files, e.g.
## plink <- "/home/tools/plink"
## phaseImpute(inputPrefix, outputPrefix, autosome=TRUE,
## plink, shapeit, imputeTool, impute, qctool, gtool,
## windowSize=3000000, effectiveSize=20000,
## nCore=40, threshold=0.9, outputInfoFile, SNP=TRUE,
## referencePanel, impRefDIR, tmpImputeDir, keepTmpDir=TRUE)
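## A minimal sketch of how a GEN probability 'threshold' can be applied.
## The rule is an assumption, not Gimpute's internal code: for one sample,
## the three GEN probabilities P(AA), P(AB), P(BB) give a hard genotype only
## if the largest one reaches 'threshold'; otherwise the genotype is set to
## missing and written as "N N", as described for SNP=TRUE.
## 'hardCall' is a hypothetical helper, not a Gimpute function.
hardCall <- function(probs, alleleA, alleleB, threshold = 0.9) {
    ## candidate genotypes in GEN order: AA, AB, BB
    calls <- c(paste(alleleA, alleleA), paste(alleleA, alleleB),
               paste(alleleB, alleleB))
    best <- which.max(probs)
    if (probs[best] >= threshold) calls[best] else "N N"
}
hardCall(c(0.02, 0.95, 0.03), "A", "G")  ## "A G"
hardCall(c(0.40, 0.55, 0.05), "A", "G")  ## "N N": 0.55 is below 0.9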
}
\references{
\enumerate{
\item Howie, B., et al. (2012). Fast and accurate genotype imputation
in genome-wide association studies through pre-phasing. Nat Genet
44(8): 955-959.
\item Howie, B. N., et al. (2009). A flexible and accurate genotype
imputation method for the next generation of genome-wide association
studies. PLoS Genet 5(6): e1000529.
\item Bycroft, C., et al. (2017). Genome-wide genetic data on ~500,000
UK Biobank participants. bioRxiv 166298.
}
}
\author{
Junfang Chen
}