Alan Murphy and Nathan Skene 2021-04-22
The MungeSumstats package is designed to facilitate the standardisation of GWAS summary statistics, as utilised in our Nature Genetics paper [1].
The package is designed to handle the lack of standardisation of output files by the GWAS community. One group has now manually standardised many GWAS, available through ieugwasr (an R interface to the IEU GWAS database API) and gwasvcf, but because many GWAS remain closed access, these repositories are not all-encompassing.
MungeSumstats provides a framework to standardise the format of any GWAS summary statistics, including those in VCF format, enabling downstream integration and analysis. The package works by addressing the most common discrepancies across summary statistic files. MungeSumstats also offers a range of adjustable quality control (QC) steps.
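One common discrepancy between summary statistic files is inconsistent column naming. The following base R sketch illustrates the general idea of mapping header variants onto a standard set of names. The lookup table `header_map`, the helper `standardise_headers` and the toy data are hypothetical and exist only for illustration; this is not the package's own implementation or its shipped mapping.

```r
# Hypothetical lookup table: raw header variants -> standard names.
# Illustration only; NOT the mapping shipped with MungeSumstats.
header_map <- data.frame(
  raw      = c("RSID", "MARKERNAME", "CHROM", "POS", "PVAL", "P.VALUE"),
  standard = c("SNP",  "SNP",        "CHR",   "BP",  "P",    "P"),
  stringsAsFactors = FALSE
)

# Rename any recognised columns (case-insensitive match on the raw name);
# unrecognised columns are left untouched.
standardise_headers <- function(sumstats, map = header_map) {
  idx <- match(toupper(colnames(sumstats)), map$raw)
  found <- !is.na(idx)
  colnames(sumstats)[found] <- map$standard[idx[found]]
  sumstats
}

# Toy summary statistics with non-standard headers.
toy <- data.frame(
  rsid  = c("rs1", "rs2"),
  chrom = c(1, 2),
  pos   = c(100, 200),
  pval  = c(0.01, 0.5)
)

standardised <- standardise_headers(toy)
colnames(standardised)
# [1] "SNP" "CHR" "BP"  "P"
```

A lookup table keeps the set of recognised headers as data rather than code, so supporting a new header variant only requires adding a row.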
MungeSumstats is in the process of being added to Bioconductor but, in the meantime, is available from GitHub. Installing the package requires the devel version of R (version 4.1), which can be found at https://cran.r-project.org/. Once R is installed, run the following lines of code:
if (!require("devtools")) {
    install.packages("devtools")
}
devtools::install_github("neurogenomics/MungeSumstats")
To install MungeSumstats from Bioconductor, run:
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install(version = "devel")
BiocManager::install("MungeSumstats")
You can then load the package:
library(MungeSumstats)
See the vignette for use cases of MungeSumstats:
browseVignettes("MungeSumstats")
If you have any problems, please file an issue on GitHub.
If you use the MungeSumstats package, please cite the Nature Genetics paper listed as reference 1 below.
The MungeSumstats package aims to handle the most common summary statistic file formats, including VCF. If your file cannot be formatted by MungeSumstats, feel free to report the bug on GitHub at https://github.com/neurogenomics/MungeSumstats, along with your summary statistic file header.
We also encourage people to edit the code to resolve their particular
issues and are happy to incorporate these changes through pull requests on
GitHub. If your summary statistic file headers are not recognised by
MungeSumstats but correspond to one of SNP, BP, CHR, A1, A2, P, Z, OR,
BETA, LOG_ODDS, SIGNED_SUMSTAT, N, N_CAS, N_CON, NSTUDY, INFO or
FRQ, feel free to add your mapping to MungeSumstats::sumstatsColHeaders,
following the approach in the data.R file. Then open
a pull request on GitHub and we will incorporate the change into the
package.
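Before proposing a new header mapping, it can help to check that the target is one of the standard names listed above. The sketch below shows one way to do that in base R. The toy table `col_headers`, its column names (`Uncorrected`, `Corrected`) and the helper `add_mapping` are assumptions made for illustration; consult the data.R file for the actual structure of MungeSumstats::sumstatsColHeaders.

```r
# Standard column names, as listed in the text above.
standard_names <- c("SNP", "BP", "CHR", "A1", "A2", "P", "Z", "OR", "BETA",
                    "LOG_ODDS", "SIGNED_SUMSTAT", "N", "N_CAS", "N_CON",
                    "NSTUDY", "INFO", "FRQ")

# Toy stand-in for a mapping table (structure is an assumption, see lead-in).
col_headers <- data.frame(
  Uncorrected = c("RSID", "PVAL"),
  Corrected   = c("SNP", "P"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append one mapping after validating it.
add_mapping <- function(map, uncorrected, corrected) {
  uncorrected <- toupper(uncorrected)
  # The target must be one of the recognised standard names.
  stopifnot(corrected %in% standard_names)
  # Skip headers that are already mapped.
  if (uncorrected %in% map$Uncorrected) {
    return(map)
  }
  rbind(
    map,
    data.frame(Uncorrected = uncorrected, Corrected = corrected,
               stringsAsFactors = FALSE)
  )
}

# "my_pvalue" is a made-up header used only for this example.
col_headers <- add_mapping(col_headers, "my_pvalue", "P")
col_headers
```

Validating against the fixed list of standard names catches typos in the target column before the mapping reaches a pull request.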
1. Skene, N. G., Bryois, J., Bakken, T. E. et al. Genetic identification of brain cell types underlying schizophrenia. Nature Genetics (2018). doi:10.1038/s41588-018-0129-5