For a phenotype of interest, we have identified a GWAS locus based on rs10104559, rs1365732, and rs12676370.
The marginal z-scores and the LD matrix are available here:
https://drive.google.com/drive/folders/1NsvBrWQaCcwLXaAcq2CRhcgGc9JxzhAO?usp=sharing
- Implement the efficient Bayes factor for each causal configuration
$$BF(\gamma:NULL) = \frac{\mathcal{N}(\hat{\mathbf{z}}_C \vert \mathbf{0}, \mathbf{R}_{CC} + \mathbf{R}_{CC} \Sigma_{CC} \mathbf{R}_{CC})}{\mathcal{N}(\hat{\mathbf{z}}_C \vert \mathbf{0}, \mathbf{R}_{CC})}$$
- Implement the prior calculation for each configuration
- Implement posterior inference over all possible configurations, assuming at most 3 causal SNPs
- Implement posterior inclusion probabilities (PIP) to calculate SNP-level posterior probabilities
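
The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes $\Sigma_{CC} = s^2 \mathbf{I}$ with a hypothetical `prior_var` parameter for $s^2$, a configuration prior of $(1/m)^k (1 - 1/m)^{m-k}$ for $k$ causal SNPs out of $m$, enumeration of configurations with 1 to 3 causal SNPs (the empty configuration is left out), and a small `ridge` term to keep $\mathbf{R}_{CC}$ invertible under near-perfect LD. All function names are illustrative.

```python
from itertools import combinations

import numpy as np


def log_mvn_zero_mean(z, cov):
    """Log density of N(z | 0, cov), computed through a Cholesky factor."""
    chol = np.linalg.cholesky(cov)
    solved = np.linalg.solve(chol, z)  # chol^{-1} z, so solved @ solved = z' cov^{-1} z
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + log_det + solved @ solved)


def log_bayes_factor(z, R, config, prior_var, ridge=1e-8):
    """log BF(gamma : NULL), restricted to the SNPs in `config`."""
    idx = list(config)
    z_c = z[idx]
    # Assumption: tiny ridge so R_CC stays positive definite under near-perfect LD.
    R_cc = R[np.ix_(idx, idx)] + ridge * np.eye(len(idx))
    # Assumption: Sigma_CC = prior_var * I, so R_CC Sigma_CC R_CC = prior_var * R_CC R_CC.
    cov_alt = R_cc + prior_var * (R_cc @ R_cc)
    return log_mvn_zero_mean(z_c, cov_alt) - log_mvn_zero_mean(z_c, R_cc)


def log_prior(k, m):
    """Assumed prior: each of m SNPs is causal independently with probability 1/m (m > 1)."""
    p = 1.0 / m
    return k * np.log(p) + (m - k) * np.log1p(-p)


def fine_map(z, R, max_causal=3, prior_var=1.0):
    """Enumerate configurations with 1..max_causal causal SNPs; return posteriors and PIPs."""
    m = len(z)
    configs, log_post = [], []
    for k in range(1, max_causal + 1):
        for config in combinations(range(m), k):
            configs.append(config)
            log_post.append(log_bayes_factor(z, R, config, prior_var) + log_prior(k, m))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    post /= post.sum()                        # normalize over the enumerated configurations
    pip = np.zeros(m)
    for config, p in zip(configs, post):
        pip[list(config)] += p                # PIP = total posterior of configs containing the SNP
    return configs, post, pip
```

Working in log space avoids underflow when many configurations have tiny Bayes factors. The default `prior_var=1.0` is a placeholder and should be replaced by whatever effect-size variance the exercise specifies.
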
Visualize the configuration posteriors by ranking them in increasing order. As we can see, the vast majority of the configurations have very small posterior probabilities.
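
One way to produce this ranking is sketched below. It assumes `post` is an array of normalized configuration posteriors; the function names and the output file name are hypothetical, and matplotlib is imported only inside the plotting helper so that the ranking step works without it.

```python
import numpy as np


def rank_posteriors(post):
    """Sort configuration posteriors in increasing order and accumulate their mass."""
    post = np.asarray(post, dtype=float)
    order = np.argsort(post)           # indices from smallest to largest posterior
    sorted_post = post[order]
    cumulative = np.cumsum(sorted_post)
    return order, sorted_post, cumulative


def plot_ranked_posteriors(sorted_post, path="config_posteriors.png"):
    """Scatter plot of posterior probability against rank (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")              # non-interactive backend, writes to a file
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    ranks = np.arange(1, len(sorted_post) + 1)
    ax.plot(ranks, sorted_post, marker=".", linestyle="none")
    ax.set_xlabel("Configuration rank (increasing posterior)")
    ax.set_ylabel("Posterior probability")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

The cumulative sum makes the concentration of posterior mass explicit: if it stays near zero until the last few ranks, only a handful of configurations matter.
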
Visualize the normalized inferred PIPs aligned with the GWAS marginal -log10 p-values. It looks like we missed one of the 3 causal SNPs because it is in nearly perfect LD with the other causal SNPs. In general, though, we are able to down-weight quite a few non-causal SNPs. That is, if we were to experimentally validate the top SNPs, 2 of the top 6 SNPs ranked by PIP are true causal ones, whereas following the -log10 p-values instead would have given many more false positives.
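
The quantities compared here can be computed as in the sketch below. It assumes two-sided p-values derived from the marginal z-scores, and it takes "normalized PIP" to mean PIPs rescaled to sum to 1, which is an interpretation rather than something stated above. The helper names are hypothetical.

```python
import math

import numpy as np


def neg_log10_pvalues(z):
    """Two-sided -log10 p-values from z-scores, using p = erfc(|z| / sqrt(2))."""
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return -np.log10(p)


def normalize_pip(pip):
    """Assumed normalization: rescale PIPs so that they sum to 1."""
    pip = np.asarray(pip, dtype=float)
    return pip / pip.sum()


def top_k(scores, k):
    """Indices of the k largest scores, in decreasing order of score."""
    return np.argsort(scores)[::-1][:k]
```

Comparing `top_k(pip, 6)` with `top_k(neg_log10_pvalues(z), 6)` against the known causal SNPs gives the overlap counts discussed above.
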