2. Installation
The scripts, written in Perl, Python, R, and BASH, are designed to work in a Linux environment (in this case, a CentOS7 system with Sun Grid Engine as the job scheduler). In addition to CentOS7, we have tested MetaGWASToolKit on OS X Sierra (version 10.11.[x]).
You can also run the scripts locally on a Unix-based system, such as Mac OS X (Sierra+). First, make a directory to hold your git repositories, then clone MetaGWASToolKit into it (or update an existing clone).
mkdir -p ~/git/ && cd ~/git
if [ -d ~/git/MetaGWASToolKit/.git ]; then \
cd ~/git/MetaGWASToolKit && git pull; \
else \
cd ~/git/ && git clone https://github.com/swvanderlaan/MetaGWASToolKit.git; \
fi
MetaGWASToolKit requires several Perl, Python, and R packages and libraries. Most of the time these are readily available on your Mac or Linux environment; if not, here is how to get them.
You will need to have Perl installed with some obligatory modules for statistical analyses, among others. These include YAML, Getopt::Long, and Statistics::Distributions. Installation can be achieved like this:
sudo cpan YAML Getopt::Long Statistics::Distributions
You will also need Python 2.7.[x] with some packages for data munging and statistical analyses, among others. These include numpy, scipy, scikit-learn, pandas, and argparse. Installation can be achieved like this:
pip2 install argparse numpy scipy scikit-learn pandas
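To confirm that the Perl modules and Python packages above can actually be loaded, a quick check along these lines may help. This is a minimal sketch and not part of MetaGWASToolKit: the helper names `check_perl_module` and `check_python_module` are hypothetical, and it assumes the Python 2 interpreter is called `python2` (set the `PYTHON` variable to override). Note that scikit-learn is imported under the name `sklearn`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MetaGWASToolKit).

# Succeeds if the given Perl module can be loaded.
check_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Succeeds if the given Python module can be imported.
# Assumption: the Python 2 interpreter is named python2; override via PYTHON.
check_python_module() {
    "${PYTHON:-python2}" -c "import $1" >/dev/null 2>&1
}

for module in YAML Getopt::Long Statistics::Distributions; do
    if check_perl_module "$module"; then
        echo "Perl    OK       $module"
    else
        echo "Perl    MISSING  $module"
    fi
done

# scikit-learn is imported under the name sklearn.
for module in argparse numpy scipy sklearn pandas; do
    if check_python_module "$module"; then
        echo "Python  OK       $module"
    else
        echo "Python  MISSING  $module"
    fi
done
```

Anything reported as MISSING can be installed with the `cpan` or `pip2` commands above.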
You will need R version 3.4.[x]; a standard installation should suffice. At a minimum you will need optparse, tools, dplyr, tidyr, and data.table. You can install these by starting R and executing the following code, which will also install any dependencies.
install.packages.auto <- function(x) {
  x <- as.character(substitute(x))
  if(isTRUE(x %in% .packages(all.available = TRUE))) {
    eval(parse(text = sprintf("require(\"%s\")", x)))
  } else {
    # Update installed packages - this may mean a full upgrade of R, which in turn
    # may not be warranted.
    #update.packages(ask = FALSE)
    eval(parse(text = sprintf("install.packages(\"%s\", dependencies = TRUE, repos = \"http://cran-mirror.cs.uu.nl/\")", x)))
  }
  if(isTRUE(x %in% .packages(all.available = TRUE))) {
    eval(parse(text = sprintf("require(\"%s\")", x)))
  } else {
    source("http://bioconductor.org/biocLite.R")
    # Update installed packages - this may mean a full upgrade of R, which in turn
    # may not be warranted.
    #biocLite(character(), ask = FALSE)
    eval(parse(text = sprintf("biocLite(\"%s\")", x)))
    eval(parse(text = sprintf("require(\"%s\")", x)))
  }
}
cat("\n* Checking availability of required packages and installing if needed...\n\n")
### INSTALL PACKAGES WE NEED
install.packages.auto("optparse")
install.packages.auto("tools")
install.packages.auto("dplyr")
install.packages.auto("tidyr")
install.packages.auto("data.table")
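Once the R code above has run, you may want to verify from the command line that the packages load. The following is a minimal sketch under the assumption that `Rscript` is on your PATH; the helper name `missing_r_packages` is hypothetical and not part of MetaGWASToolKit. If `Rscript` is not found, every package is reported as missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetaGWASToolKit).
# Prints the name of each R package that cannot be loaded, one per line.
# Assumption: Rscript is on the PATH.
missing_r_packages() {
    for pkg in "$@"; do
        if ! Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(save = 'no', status = 1)" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# The minimum set of packages named above.
missing_r_packages optparse tools dplyr tidyr data.table
```

No output means all five packages were found.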
MetaGWASToolKit requires you to install several software packages.
- PLINK2 for LD-calculations; reference: https://www.cog-genomics.org/plink2.
- LocusZoom v1.3 for automatic regional association plotting; reference: http://genome.sph.umich.edu/wiki/LocusZoom_Standalone.
- VEGAS2 for gene-based association analysis; reference: https://vegas2.qimrberghofer.edu.au.
- MAGMA for gene-based association analysis, and gene-set enrichment analyses; reference: https://ctg.cncr.nl/software/magma.
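To see which of these tools are already reachable from your shell, something like the following can be used. This is only a sketch: the executable names `plink2`, `locuszoom`, `vegas2`, and `magma` are assumptions and may differ on your system, and the helper name `missing_commands` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetaGWASToolKit).
# Prints each command name that is not found on the PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Assumed executable names; adjust these to match your installation.
missing=$(missing_commands plink2 locuszoom vegas2 magma)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All external tools found."
fi
```

A tool that is installed but not on the PATH will still be reported as missing, so treat the result as a hint rather than proof.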
You will have to download and create some data needed for MetaGWASToolKit to function. The resource.creator.sh script will automagically create the necessary files. For some of these files, it is necessary to supply the proper reference data in VCF-format (version 4.1+). The files created by resource.creator.sh include:
- DBSNPFILE -- a dbSNP file containing information per variant based on dbSNP b150 (hg19, b37).
- REFFREQFILE -- a file containing reference frequencies per variant for the chosen reference and population.
- VINFOFILE -- a file needed to harmonize all the cohorts in terms of variant ID, contains various variantID versions (rs[XXXX], chr[X]:bp[XXX]:A1_A2, etc.). The resulting file is used by gwas2ref.harmonizer.py later on during harmonization.
- GENESFILE -- a file containing chromosomal basepair positions per gene, default is GENCODE.
- REFERENCEVCF -- needed for downstream analyses, such as clumping of genome-wide significant hits, etc.
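Because the reference data must be in VCF format version 4.1 or later, it can be worth checking a file before handing it to resource.creator.sh. A VCF file declares its version on its first line, for example `##fileformat=VCFv4.2`. The sketch below reads that line from a plain or gzip-compressed file; the helper names `vcf_version` and `vcf_is_41_or_later` are hypothetical and not part of MetaGWASToolKit.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MetaGWASToolKit).

# Prints the version declared on the first header line of a VCF file,
# e.g. "4.2" for "##fileformat=VCFv4.2". Files ending in .gz are decompressed.
vcf_version() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac 2>/dev/null | head -n 1 | sed -n 's/^##fileformat=VCFv\([0-9][0-9.]*\).*/\1/p'
}

# Succeeds if the file declares VCF version 4.1 or later.
vcf_is_41_or_later() {
    version=$(vcf_version "$1")
    [ -n "$version" ] || return 1
    major=${version%%.*}
    case "$version" in
        *.*) rest=${version#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 1 ]; }
}
```

For example, `vcf_is_41_or_later my_reference.vcf.gz && echo "VCF version OK"`, where `my_reference.vcf.gz` stands in for your own reference file.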
To download and install the resources, run the following code; this should submit various jobs to create the necessary databases.
cd ~/git/MetaGWASToolKit && bash resource.creator.sh
Several references are available by default:
- HapMap 2 [HM2], version 2, release 22, b36. -- HM2 contains about 2.54 million variants, but does not include variants on the X-chromosome. Obviously few, if any, meta-analyses of GWAS will be based on that reference, but it's good to keep. View it as a 'legacy' feature. [NOT AVAILABLE YET] 🔷
- 1000G phase 1, version 3 [1Gp1], b37. -- 1Gp1 contains about 38 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes.
- 1000G phase 3, version 5 [1Gp3], b37. -- 1Gp3 contains about 88 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes. [NOT AVAILABLE YET] 🔶
- Genome of the Netherlands, version 4 [GoNL4], b37. -- GoNL4 contains about xx million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔷
- Genome of the Netherlands, version 5 [GoNL5], b37. -- GoNL5 contains about xx million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔷
- Combination of 1Gp3 and GoNL5 [1Gp3GONL5], b37. -- This contains about 100 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔶
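If you script your own analyses around these references, the codes and genome builds listed above can be captured in a small lookup. This is an illustrative sketch only: the function names `reference_build` and `reference_available` are hypothetical, and how MetaGWASToolKit itself validates the chosen reference is not shown here. Availability simply mirrors the [NOT AVAILABLE YET] markers in the list.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MetaGWASToolKit).

# Prints the genome build for a reference code; fails for an unknown code.
reference_build() {
    case "$1" in
        HM2) echo "b36" ;;
        1Gp1|1Gp3|GoNL4|GoNL5|1Gp3GONL5) echo "b37" ;;
        *) return 1 ;;
    esac
}

# Succeeds only for references not marked [NOT AVAILABLE YET] in the list above.
reference_available() {
    [ "$1" = "1Gp1" ]
}
```

For example, `reference_available 1Gp1 && reference_build 1Gp1` prints `b37`.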
Copyright (c) 2015-2022 Sander W. van der Laan | s.w.vanderlaan [at] gmail [dot] com | swvanderlaan.github.io.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Reference: http://opensource.org.