This repository contains code for the paper:
Li R., Benz L., Duan R., Denny J., Hakonarson H., Mosley J., Smoller J., Wei WQ., Ritchie M., Lumley, T., Moore J., and Chen Y. A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis. (2024) (Pre-Print)
An R package for the mixWAS algorithm is available for download as follows
# install.packages('devtools')
devtools::install_github('lbenz730/mixWAS', build_vignettes = TRUE)A detailed tutorial for using mixWAS can be explored via
vignette('mixwas_tutorial') mixWAS exports the following 4 functions, with the following arguments
-
mixWAS: mixWAS algorithm for all sites (if all data can be supplied at once). Returns either p-value for SNP or score/variance components.-
snps: list of snps (one for each site), each a vector of SNPs$\in {0,1,2}$ -
phenotypes: list of phenotypes, each a matrix of phenotypes (one per column), with names of phenotypes specified as column names -
covariates: list of covariates, each a matrix or data frame of covariates -
covariate_map: Default =NULL. IfNULL, function assumes by all covariates are to be used for each phenotype. If this is not desired behavior, user can supply a data frame with two columns one calledvariableand a second calledphenotype. In the variable column is the name of covariates, with phenotypes being specified as 'all' (to use the variable for all phenotypes) or the name of a phenotype. Variables can be entered multiple times if they go to multiple phenotypes (but not all). A phenotype specific data set of covariates with use 'all' covariate + phenotype specific covariates. -
phenotype_index: list of vectors giving the index (numeric) of which phenotypes are in each site's matrix. IfNULL(default), will be inferred from matrix colnames. -
types: optional vector specifying data types e.g. ('continuous', 'binary', 'count'). Default =NULL(phenotype data types will be inferred), Note that 'count' will never be inferred, only 'binary' or 'continuous'. -
parallel_sites: logical, if score/variance component computations should be parallelized over sites. Default =FALSE. -
return_p: logical, ifTRUEreturn P-values, else return components like score/variance. Default =TRUE.
-
-
mixWAS_single_site: Compute score vector and covariance matrix for a single site via mixWAS algorithm.-
snps: matrix of phenotypes (one per column), with names of phenotypes specified as column names. -
phenotypes: matrix or data frame of covariates. -
covariates: list of covariates, each a matrix or data frame of covariates. -
covariate_map: Default =NULL. IfNULL, function assumes by all covariates are to be used for each phenotype. If this is not desired behavior, user can supply a data frame with two columns one calledvariableand a second calledphenotype. In the variable column is the name of covariates, with phenotypes being specified as 'all' (to use the variable for all phenotypes) or the name of a phenotype. Variables can be entered multiple times if they go to multiple phenotypes (but not all). A phenotype specific data set of covariates with use 'all' covariate + phenotype specific covariates. -
types: optional vector specifying data types ('continuous', 'binary', 'count'). Default = NULL (phenotype data types will be inferred). Note that 'count' will never be inferred, only 'binary' or 'continuous'.
-
-
run_hypothesis_test: run hypothesis tests from intermediate mixWAS components-
score: score vector from mixWAS intermediate output -
V_inv: Inverse Varariance Matrix from mixWAS intermediate output -
z: Standardized Z-scores from mixWAS intermediate output -
q: # of phenotypes}
-
-
combine_site_results: Combine results from running mixWAS on each site individually-
mixWAS_components: list ofmixWAS_single_siteoutput of length = # of sites -
q: # of phenotypes -
phenotypes: Optional list of phenotypes, each a matrix of phenotypes (one per column), with names of phenotypes specified as column names. Ifphenotype_indexisNULL, will be used to infer phenotypes. One ofphenotypesandphenotype_indexmust be specified -
phenotype_index: list of vectors giving the index (numeric) of which phenotypes are in each site's matrix. IfNULL(default), will be inferred fromphenotypes. -
return_p: logical, if TRUE return P-values, else return components like score/variance. Default =TRUE.
-
Simulations from the paper are available in the simulations folder.