% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Gene_Centric_Coding_Results_Summary_incl_ptv.R
\name{Gene_Centric_Coding_Results_Summary_incl_ptv}
\alias{Gene_Centric_Coding_Results_Summary_incl_ptv}
\title{Summarize gene-centric coding analysis results generated by the \code{STAARpipeline} package and
perform conditional analysis for (unconditionally) significant coding masks (including the ptv and ptv_ds masks) by adjusting for a given list of known variants}
\usage{
Gene_Centric_Coding_Results_Summary_incl_ptv(
agds_dir,
gene_centric_coding_jobs_num,
input_path,
output_path,
gene_centric_results_name,
obj_nullmodel,
known_loci = NULL,
cMAC_cutoff = 0,
method_cond = c("optimal", "naive"),
rare_maf_cutoff = 0.01,
QC_label = "annotation/filter",
variant_type = c("SNV", "Indel", "variant"),
geno_missing_imputation = c("mean", "minor"),
Annotation_dir = "annotation/info/FunctionalAnnotation",
Annotation_name_catalog,
Use_annotation_weights = FALSE,
Annotation_name = NULL,
alpha = 2.5e-06,
manhattan_plot = FALSE,
QQ_plot = FALSE,
cond_null_model_name = NULL,
cond_null_model_dir = NULL,
SPA_p_filter = FALSE,
p_filter_cutoff = 0.05
)
}
\arguments{
\item{agds_dir}{file directory of annotated GDS (aGDS) files for all chromosomes (1-22)}
\item{gene_centric_coding_jobs_num}{the number of gene-centric coding analysis results generated by the \code{STAARpipeline} package.}
\item{input_path}{the directory of gene-centric coding analysis results generated by the \code{STAARpipeline} package.}
\item{output_path}{the directory for the output files.}
\item{gene_centric_results_name}{file name of gene-centric coding analysis results generated by the \code{STAARpipeline} package.}
\item{obj_nullmodel}{an object from fitting the null model, which is either the output from the \code{fit_nullmodel} function in the \code{STAARpipeline} package,
or the output from the \code{fitNullModel} function in the \code{GENESIS} package, transformed using the \code{genesis2staar_nullmodel} function in the \code{STAARpipeline} package.}
\item{known_loci}{a data frame of variants to be adjusted for in conditional analysis. It should
contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF),
and alternative allele (ALT) (default = NULL).}
\item{cMAC_cutoff}{the cutoff for the minimum cumulative minor allele count of variants in a mask
when summarizing the results (default = 0).}
\item{method_cond}{a character value indicating the method for conditional analysis.
\code{optimal} refers to regressing residuals from the null model on \code{known_loci}
as well as all covariates used in fitting the null model (fully adjusted) and taking the residuals;
\code{naive} refers to regressing residuals from the null model on \code{known_loci}
and taking the residuals (default = \code{optimal}).}
\item{rare_maf_cutoff}{the cutoff of maximum minor allele frequency in
defining rare variants (default = 0.01).}
\item{QC_label}{channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").}
\item{variant_type}{type of variant included in the analysis. Choices include "SNV", "Indel", or "variant" (default = "SNV").}
\item{geno_missing_imputation}{method of handling missing genotypes. Either "mean" or "minor" (default = "mean").}
\item{Annotation_dir}{channel name of the annotations in the aGDS file \cr (default = "annotation/info/FunctionalAnnotation").}
\item{Annotation_name_catalog}{a data frame containing the annotation names and their corresponding channel names in the aGDS file.}
\item{Use_annotation_weights}{use annotations as weights or not (default = FALSE).}
\item{Annotation_name}{a vector of annotation names used in STAAR (default = NULL).}
\item{alpha}{p-value threshold of significant results (default = 2.5E-06).}
\item{manhattan_plot}{output manhattan plot or not (default = FALSE).}
\item{QQ_plot}{output Q-Q plot or not (default = FALSE).}
\item{cond_null_model_name}{the name of the null model for conditional analysis in the SPA setting, only used for the imbalanced case-control setting (default = NULL).}
\item{cond_null_model_dir}{the directory storing the null model for conditional analysis in the SPA setting, only used for the imbalanced case-control setting (default = NULL).}
\item{SPA_p_filter}{logical: whether the SPA method is used to recalculate the p-value only for variants whose normal-approximation-based p-value is smaller than a pre-specified threshold, only used for the imbalanced case-control setting (default = FALSE).}
\item{p_filter_cutoff}{threshold for the p-value recalculation using the SPA method, only used for imbalanced case-control setting (default = 0.05).}
}
\value{
The function returns the following analysis results:

\code{coding_sig.csv}: a matrix that summarizes the unconditionally significant coding masks detected by STAAR-O, or by STAAR-B in the imbalanced case-control setting (STAAR-O/-B p-value smaller than the threshold alpha),
including gene name ("Gene name"), chromosome ("chr"), coding functional category ("Category"), number of variants ("#SNV"),
and unconditional p-values of set-based tests SKAT ("SKAT(1,25)"), Burden ("Burden(1,1)"), ACAT-V ("ACAT-V(1,25)") and STAAR-O ("STAAR-O")
or unconditional p-values of set-based tests Burden ("Burden(1,1)") and STAAR-B ("STAAR-B") for the imbalanced case-control setting.

\code{coding_sig_cond.csv}: a matrix that summarizes the conditional analysis results of the unconditionally significant coding masks detected by STAAR-O, or by STAAR-B in the imbalanced case-control setting (available if \code{known_loci} is not NULL),
including gene name ("Gene name"), chromosome ("chr"), coding functional category ("Category"), number of variants ("#SNV"),
and conditional p-values of set-based tests SKAT ("SKAT(1,25)"), Burden ("Burden(1,1)"), ACAT-V ("ACAT-V(1,25)") and STAAR-O ("STAAR-O")
or conditional p-values of set-based tests Burden ("Burden(1,1)") and STAAR-B ("STAAR-B") for the imbalanced case-control setting.

\code{results_plof_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the putative loss-of-function variants (plof) for all protein-coding genes across the genome.

\code{plof_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof masks.

\code{plof_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof masks (available if \code{known_loci} is not NULL).

\code{results_plof_ds_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the putative loss-of-function variants and disruptive missense variants (plof_ds) for all protein-coding genes across the genome.

\code{plof_ds_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof_ds masks.

\code{plof_ds_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof_ds masks (available if \code{known_loci} is not NULL).

\code{results_ptv_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the protein-truncating variants (ptv) for all protein-coding genes across the genome.

\code{ptv_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant ptv masks.

\code{ptv_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant ptv masks (available if \code{known_loci} is not NULL).

\code{results_ptv_ds_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the protein-truncating variants and disruptive missense variants (ptv_ds) for all protein-coding genes across the genome.

\code{ptv_ds_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant ptv_ds masks.

\code{ptv_ds_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant ptv_ds masks (available if \code{known_loci} is not NULL).

\code{results_disruptive_missense_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the disruptive missense variants (disruptive_missense) for all protein-coding genes across the genome.

\code{disruptive_missense_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant disruptive_missense masks.

\code{disruptive_missense_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant disruptive_missense masks (available if \code{known_loci} is not NULL).

\code{results_missense_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the missense variants (missense) for all protein-coding genes across the genome.

\code{missense_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant missense masks.

\code{missense_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant missense masks (available if \code{known_loci} is not NULL).

\code{results_synonymous_genome.Rdata}: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by the synonymous variants (synonymous) for all protein-coding genes across the genome.

\code{synonymous_sig.csv}: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant synonymous masks.

\code{synonymous_sig_cond.csv}: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant synonymous masks (available if \code{known_loci} is not NULL).

Manhattan plot (optional) and Q-Q plot (optional) of the gene-centric coding analysis results.
}
\description{
The \code{Gene_Centric_Coding_Results_Summary_incl_ptv} function takes in the gene-centric coding analysis results
generated by the \code{STAARpipeline} package,
the object from fitting the null model, and the set of known variants to be adjusted for in conditional analysis.
It summarizes the gene-centric coding analysis results and analyzes the conditional association between a quantitative/dichotomous phenotype
(including imbalanced case-control design) and
the rare variants in the unconditionally significant coding masks.
}
}
\references{
Li, Z., Li, X., et al. (2022). A framework for detecting
noncoding rare-variant associations of large-scale whole-genome sequencing
studies. \emph{Nature Methods}, \emph{19}(12), 1599-1611.
(\href{https://doi.org/10.1038/s41592-022-01640-x}{pub})
}
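\examples{
\dontrun{
# A minimal, illustrative sketch of one possible call; it is not a tested
# workflow. Every path, file name, job count and variant below is a
# hypothetical placeholder and must be replaced with values from your own
# analysis. Loading the inputs from .Rdata files is an assumption about how
# they were saved, not a requirement of the function.

# Known variants to adjust for: 4 columns in the order CHR, POS, REF, ALT.
known_loci <- data.frame(
  CHR = c(1, 19),
  POS = c(1234567, 7654321),
  REF = c("G", "C"),
  ALT = c("T", "A"),
  stringsAsFactors = FALSE
)

# Inputs assumed to have been saved earlier, one object per file.
agds_dir <- get(load("/path/to/agds_dir.Rdata"))
obj_nullmodel <- get(load("/path/to/obj_nullmodel.Rdata"))
Annotation_name_catalog <- get(load("/path/to/Annotation_name_catalog.Rdata"))

# Summarize the results and run conditional analysis on the significant masks.
# Output files are written to output_path.
Gene_Centric_Coding_Results_Summary_incl_ptv(
  agds_dir = agds_dir,
  gene_centric_coding_jobs_num = 100,  # hypothetical number of result files
  input_path = "/path/to/gene_centric_coding_results/",
  output_path = "/path/to/summary_output/",
  gene_centric_results_name = "gene_centric_coding_results",
  obj_nullmodel = obj_nullmodel,
  known_loci = known_loci,
  method_cond = "optimal",
  rare_maf_cutoff = 0.01,
  QC_label = "annotation/filter",
  variant_type = "SNV",
  geno_missing_imputation = "mean",
  Annotation_dir = "annotation/info/FunctionalAnnotation",
  Annotation_name_catalog = Annotation_name_catalog,
  alpha = 2.5e-06,
  manhattan_plot = TRUE,
  QQ_plot = TRUE
)
}
}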