Poster: Semiparametric variance moderation in high-dimensional biology

Materials for a poster to be given at the 2019 Superfund Research Program conference in Seattle, WA

Authors: Nima Hejazi, Mark van der Laan, Martyn Smith, Alan Hubbard

Summary

Exploratory analysis of high-dimensional biological data has received much attention since the explosion of high-throughput biotechnology enabled the simultaneous screening of thousands of molecular characteristics (genomics, metabolomics, proteomics, microbiomics, metallomics). Unfortunately, such analyses pose numerous challenges for both statisticians and scientists in (1) deriving estimation of independent associations (variable importance measures) in the context of many competing causes in flexible and honest statistical models, and (2) the use of robust empirical Bayes variance estimators (e.g., LIMMA) to enable stable small-sample inference when modern machine learning is leveraged in such settings. We present an approach that constructs locally efficient estimators of nonparametric variable importance measures based on causal effect parameters. The resultant estimates are endowed with scientifically convenient interpretations, under the standard assumptions of causal inference, and are robust to model misspecification by incorporating ensemble machine learning in the estimation of relevant factors of the data- generating distribution. The estimators we present have closed-form representations, allowing for variance moderation to be applied in deriving robust hypothesis tests and confidence intervals. We illustrate the methodology by applying these approaches to high-dimensional data sets of relatively modest sample size from microarray studies of exposure to environmental contaminants, combining existing targeted maximum likelihood learning methodology with a simple generalization of empirical Bayes approaches that improve the stability of estimators in small samples. The result is a machine learning-based approach that can estimate independent associations of biomarkers within high-dimensional data, teasing apart the effects of potential confounds and protecting against the unreliability introduced by small-sample inference. We also discuss a recently developed software library (the biotmle R package: https://bioconductor.org/packages/biotmle) as well as methods to circumvent the statistical pitfalls of multiple comparisons.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
figs		figs
.gitignore		.gitignore
.here		.here
2019_srp.bib		2019_srp.bib
2019_srp.pdf		2019_srp.pdf
2019_srp.tex		2019_srp.tex
IEEEtranS.bst		IEEEtranS.bst
Makefile		Makefile
README.md		README.md
baposter.cls		baposter.cls

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Poster: Semiparametric variance moderation in high-dimensional biology

Summary

License

About

Releases

Packages

Languages

nhejazi/conf_srp2019_biotmle

Folders and files

Latest commit

History

Repository files navigation

Poster: Semiparametric variance moderation in high-dimensional biology

Summary

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages