-
Notifications
You must be signed in to change notification settings - Fork 64
/
README.QpWave
128 lines (95 loc) · 4.88 KB
/
README.QpWave
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
DOCUMENTATION OF QpWavs and qpAdm:
See also .examples/pdoc.pdf
qpWave:
qpWave requires that the input data is available in a Recilab format such as EIGENSTRAT.
To convert to the appropriate format, one can use CONVERTF. See
README.CONVERTF for documentation of programs for converting file formats.
Executable and source code:
------------------------------------------------------------------------------
For information about installing the program, see README.ADMIXTOOLS.
After installing the programs, the executable for qpWave should be located in the bin directory.
To run qpWave, type the following on a linux machine.
$DIR/bin/qpWave -p parfile >logfile
$DIR: Path to the bin directory.
logfile: Name of the logfile. The logfile contains the output of the run.
parfile: Name of parameter file
qpWave gives evidence of the number of admixture flows between the left and right populations,
and should usually be run as a precursor to qpAdm (see below).
DESCRIPTION OF EACH PARAMETER in parfile:
genotypename: input genotype file (in eigenstrat or packedancestrymap r format)
snpname: input snp file (in eigenstrat format)
indivname: input indiv file (in eigenstrat format)
popleft: left population list (1 per line)
popright: right population list (1 per line)
details: YES
allsnps: YES
## default NO
When set to YES, the maximum number of snps available for each f4 statistic are used. By default,
only the intersect of snps across all Left and Right populations are used.
*** NEW ***
If allsnps: YES qpfstats is called to calculate the relevant f-statistics. If you want
compatibility with older versions (VERSION < 1400) code:
allsnps: YES
inbreed: YES
oldallsnpsmode: NO
This is deprecated as the algorithm is inferior.
*** stongly recommended that inbreed be set. YES if possible.
The same feature is now also in qpAdm
chrom: Only use snps in the specified chromosome.
DESCRIPTION OF OUTPUT FILE:
The program will write all the output to stdout. The output file prints the parfile entered by the user,
number of snps and individuals, jackknife block size, number of blocks for jackknife and the results.
See example shown in-
examples/qpWave.log
qpAdm:
To run qpAdm, type the following on a linux machine.
$DIR/bin/qpAdm -p parfile >logfile
$DIR: Path to the bin directory.
logfile: Name of the logfile. The logfile contains the output of the run.
parfile: Name of parameter file
qpAdm takes a parameter file in exactly the same format as qpWave.
The first population of (popleft) is the target population and the
main purpose of the program is to provide admixture weights, writing the
target as a linear combination of the other populations of popleft.
details: YES/NO
is a new option (not available in qpWave)
.../examples/qpAdm.log has annotations ### manually added to explain output
CAVEATS
1) It is important to realize that the answers are invalid if there has
been post admixture gene-flow between left and right populations.
2) Weights may be negative. If significantly so, the admixture model is
probably wrong.
3) We recommend keeping popright small. If large the covariance matrix
of f4-statististics is likely to be poorly estimated. I then don't trust
the computed p-values although the admixture weights seem to usually be
reasonable.
4) *** NEW ***
The new version is more robust in boundary cases where
the mixing coefficients are not well determined. In
well-conditioned cases the difference from the old code should
be very small. There's a magic constant (diagplus) that gets added
along the diagonal of various matrices. To get the old answers with the new
program code:
diagplus: 0
*** new feature (for both qpAdm and qpWave)***
Input can now be an fstats file -- see README.qpfstats
For qpAdm new parameter numboot: see README.qpfstats
*** NEW ***
If you run qpWave or qpAdm with allsnps: YES but no fstats filer, then the programs
call qpfstats with a system call. If this call fails it can be tricky to debug.
Options to make the process more transparent:
fstatslog: <your log file>
keeptmp: YES
## (Various temporary files are created in $STMP (see below) or /tmp. By default these are deleted.
fstastoutname: <myfstats.txt>
In some cases it may be useful to keep this file for future runs.
It is not always practical, but for some projects involbing less than (say) 40 populations it will be
best to run qpfstats (with allsnps: YES) and then run many qpWave/qpAdm runs off the fstats file
qpfstats may be slow but the follow on runs will be very fast.
By default /tmp is used for temporary files. To get more control set an environment variable STMP.
For instance in bash code
export STMP=~/mytrashdir
The directory should exist when qpWave or qpAdm are run.
Nick Patterson
<nickp@broadinstitute.org>
------------------------------------------------------------------------------