2. Installation

Introduction

The scripts, written in Perl, Python, R, and Bash, are designed to work in a specific Linux environment (in this case a CentOS7 system with Sun Grid Engine as the job scheduler). In addition to CentOS7, we have tested MetaGWASToolKit on OS X Sierra (version 10.11.[x]).

Installing the scripts locally.

You can use the scripts locally to run analyses on a Unix-based system, such as Mac OS X (Sierra+). First make a directory to hold your git repositories, then clone this repository into it.

Step 1: Make a directory, and go there.

mkdir -p ~/git/ && cd ~/git

Step 2: Clone this repository, or update it if it already exists.

if [ -d ~/git/MetaGWASToolKit/.git ]; then
    cd ~/git/MetaGWASToolKit && git pull
else
    cd ~/git/ && git clone https://github.com/swvanderlaan/MetaGWASToolKit.git
fi
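
After cloning, a quick sanity check can confirm that the directory is a git working tree and contains resource.creator.sh, the script run in Step 5. This is a minimal sketch; the helper name check_clone is hypothetical and not part of MetaGWASToolKit.

```shell
# check_clone DIR: succeed if DIR is inside a git working tree and
# contains resource.creator.sh (the script run in Step 5).
# Hypothetical helper for illustration; not part of MetaGWASToolKit.
check_clone() {
    [ -d "$1" ] || return 1
    # Run git from inside the directory; this also works with older git versions.
    (cd "$1" && git rev-parse --is-inside-work-tree >/dev/null 2>&1) || return 1
    [ -f "$1/resource.creator.sh" ]
}

if check_clone "$HOME/git/MetaGWASToolKit"; then
    echo "MetaGWASToolKit clone looks complete."
else
    echo "MetaGWASToolKit clone is missing or incomplete." >&2
fi
```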

Step 3: Check for dependencies of Python, Perl and R, and install them if necessary.

MetaGWASToolKit requires several Python, Perl, and R packages and libraries. Most of the time these are already available on your Mac or Linux environment; if not, the instructions below show how to install them.
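
Before installing any packages, it can help to confirm that the interpreters themselves are on your PATH. The sketch below assumes the executables are named python2, perl, and Rscript; these names may differ on your system (for example python2.7), and check_commands is a hypothetical helper.

```shell
# check_commands NAME...: print one line per command saying whether it is
# on PATH; return non-zero if any command is missing.
# Hypothetical helper for illustration; not part of MetaGWASToolKit.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust to your system.
check_commands python2 perl Rscript || echo "Install the missing interpreters before continuing."
```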

You will need Perl installed with some required modules, for statistical analyses among other things; these include YAML, Getopt::Long, and Statistics::Distributions. You can install them like this:

sudo cpan YAML Getopt::Long Statistics::Distributions
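
To see whether the modules are already present (or to confirm the installation worked), you can ask Perl to load each one. This is a minimal sketch using Perl's -M switch; the helper name perl_module_ok is hypothetical.

```shell
# perl_module_ok MODULE: succeed if Perl can load MODULE.
# Hypothetical helper for illustration; not part of MetaGWASToolKit.
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for module in YAML Getopt::Long Statistics::Distributions; do
    if perl_module_ok "$module"; then
        echo "ok:      $module"
    else
        echo "missing: $module"
    fi
done
```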

You will also need Python 2.7.[x] with some libraries for data munging and statistical analyses, among other things; these include numpy, scipy, scikit-learn, pandas, and argparse. You can install them like this:

pip2 install argparse numpy scipy scikit-learn pandas
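
A similar check works for the Python libraries: try to import each one. The sketch below assumes the interpreter is called python2 (override it by setting PYTHON); python_module_ok is a hypothetical helper. Note that scikit-learn is imported under the name sklearn.

```shell
# Interpreter to test; python2 is an assumed executable name.
PYTHON="${PYTHON:-python2}"

# python_module_ok MODULE: succeed if $PYTHON can import MODULE.
# Hypothetical helper for illustration; not part of MetaGWASToolKit.
python_module_ok() {
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

# The scikit-learn package is imported as "sklearn".
for module in argparse numpy scipy sklearn pandas; do
    if python_module_ok "$module"; then
        echo "ok:      $module"
    else
        echo "missing: $module"
    fi
done
```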

You will need R version 3.4.[x]; a standard installation should suffice. At a minimum you will need optparse, tools, dplyr, tidyr, and data.table. You can install these by starting R and executing the following code, which will also install any dependencies.

# Load package x; if it is not installed, try CRAN first and Bioconductor second.
install.packages.auto <- function(x) {
  x <- as.character(substitute(x))
  if(isTRUE(x %in% .packages(all.available = TRUE))) {
    # Already installed: just load it.
    eval(parse(text = sprintf("require(\"%s\")", x)))
  } else {
    # Update installed packages - this may mean a full upgrade of R, which in turn
    # may not be warranted.
    #update.packages(ask = FALSE)
    eval(parse(text = sprintf("install.packages(\"%s\", dependencies = TRUE, repos = \"http://cran-mirror.cs.uu.nl/\")", x)))
  }
  if(isTRUE(x %in% .packages(all.available = TRUE))) {
    # Installed (possibly just now, from CRAN): load it.
    eval(parse(text = sprintf("require(\"%s\")", x)))
  } else {
    # Still not installed: fall back to Bioconductor.
    source("http://bioconductor.org/biocLite.R")
    # Update installed packages - this may mean a full upgrade of R, which in turn
    # may not be warranted.
    #biocLite(character(), ask = FALSE)
    eval(parse(text = sprintf("biocLite(\"%s\")", x)))
    eval(parse(text = sprintf("require(\"%s\")", x)))
  }
}

cat("\n* Checking availability of required packages and installing if needed...\n\n")
### INSTALL PACKAGES WE NEED
install.packages.auto("optparse")
install.packages.auto("tools")
install.packages.auto("dplyr")
install.packages.auto("tidyr")
install.packages.auto("data.table")

Step 4: Installation of necessary software.

MetaGWASToolKit requires you to install several software packages.

PLINK2 for LD-calculations; reference: https://www.cog-genomics.org/plink2.
LocusZoom v1.3 for automatic regional association plotting; reference: http://genome.sph.umich.edu/wiki/LocusZoom_Standalone.
VEGAS2 for gene-based association analysis; reference: https://vegas2.qimrberghofer.edu.au.
MAGMA for gene-based association analysis, and gene-set enrichment analyses; reference: https://ctg.cncr.nl/software/magma.

Step 5: Create necessary databases.

You will have to download and create some data needed for MetaGWASToolKit to function. The resource.creator.sh script will automagically create the necessary files. For some of these files, it is necessary to supply the proper reference data in VCF-format (version 4.1+). The files created by resource.creator.sh include:

DBSNPFILE -- a dbSNP file containing information per variant based on dbSNP b150 (hg19, b37).
REFFREQFILE -- a file containing reference frequencies per variant for the chosen reference and population.
VINFOFILE -- a file needed to harmonize all the cohorts in terms of variant ID; it contains various variant ID formats (rs[XXXX], chr[X]:bp[XXX]:A1_A2, etc.). The resulting file is used by gwas2ref.harmonizer.py later on during harmonization.
GENESFILE -- a file containing chromosomal basepair positions per gene, default is GENCODE.
REFERENCEVCF -- needed for downstream analyses, such as clumping of genome-wide significant hits, etc.
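
Because the reference data must be in VCF format version 4.1 or newer, it may be worth checking a file's declared version before running resource.creator.sh. The sketch below reads the first header line, which in a VCF has the form ##fileformat=VCFvX.Y. The helper names vcf_version and vcf_version_ok are hypothetical, and the check only looks at the declared version; it does not validate the rest of the file.

```shell
# vcf_version FILE: print "MAJOR MINOR" parsed from the first line of FILE,
# which in a VCF reads "##fileformat=VCFvMAJOR.MINOR". Prints nothing if the
# line does not match. Handles plain-text and gzip-compressed (*.gz) files.
# Hypothetical helper for illustration; not part of MetaGWASToolKit.
vcf_version() {
    case "$1" in
        *.gz) first_line="$(gzip -dc "$1" | head -n 1)" ;;
        *)    first_line="$(head -n 1 "$1")" ;;
    esac
    printf '%s\n' "$first_line" |
        sed -n 's/^##fileformat=VCFv\([0-9][0-9]*\)\.\([0-9][0-9]*\).*$/\1 \2/p'
}

# vcf_version_ok FILE: succeed if FILE declares VCF version 4.1 or newer.
vcf_version_ok() {
    # Split "MAJOR MINOR" into $1 and $2 (local to this function).
    set -- $(vcf_version "$1")
    [ "$#" -eq 2 ] || return 1
    [ "$1" -gt 4 ] || { [ "$1" -eq 4 ] && [ "$2" -ge 1 ]; }
}

# Check every file given on the command line, if any.
for vcf in "$@"; do
    if vcf_version_ok "$vcf"; then
        echo "ok: $vcf"
    else
        echo "older than 4.1 or not a VCF: $vcf"
    fi
done
```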

To download and install, run the following code; this should submit various jobs to create the necessary databases.

cd ~/git/MetaGWASToolKit && bash resource.creator.sh

Available references

Several references are available by default:

HapMap 2 [HM2], version 2, release 22, b36. -- HM2 contains about 2.54 million variants, but does not include variants on the X-chromosome. Obviously few, if any, meta-analyses of GWAS will be based on that reference, but it's good to keep. View it as a 'legacy' feature. [NOT AVAILABLE YET] 🔷
1000G phase 1, version 3 [1Gp1], b37. -- 1Gp1 contains about 38 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes.
1000G phase 3, version 5 [1Gp3], b37. -- 1Gp3 contains about 88 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes. [NOT AVAILABLE YET] 🔶
Genome of the Netherlands, version 4 [GoNL4], b37. -- GoNL4 contains about xx million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔷
Genome of the Netherlands, version 5 [GoNL5], b37. -- GoNL5 contains about xx million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔷
Combination of 1Gp3 and GoNL5 [1Gp3GONL5], b37. -- This contains about 100 million variants, including INDELs, and variation on the X, XY, and Y-chromosomes; some of which are unique for the Netherlands or are not present in dbSNP (yet). [NOT AVAILABLE YET] 🔶

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Reference: http://opensource.org.
