-
Notifications
You must be signed in to change notification settings - Fork 31
/
MAGMA.Celltyping.Rmd
202 lines (142 loc) · 6.15 KB
/
MAGMA.Celltyping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
---
title: "Getting started"
author: "<h4>Authors: <i>Brian M. Schilder, Alan Murphy, Julien Bryois & Nathan Skene</i></h4>"
date: "<h4>Updated: <i>`r format( Sys.Date(), '%b-%d-%Y')`</i></h4>"
output:
BiocStyle::html_document:
vignette: >
%\VignetteIndexEntry{MAGMA_Celltyping}
%\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
---
```{r setup, include=TRUE, message=FALSE}
library(MAGMA.Celltyping)
library(dplyr)
```
# Intro
`MAGMA.Celltyping` is a software package that facilitates conducting cell-type-specific enrichment tests on GWAS summary statistics.
# Setup
Specify where you want the large files to be downloaded to.
*NOTE*: Make sure you change `storage_dir` to somewhere other than `tempdir()`
if you want to make sure the results aren't deleted after this R session closes!
```{r, eval=FALSE}
storage_dir <- tempdir()
```
# Prepare data
## GWAS
- We need to have a summary statistics file to analyse as input.
- As an example, you can download UK Biobank summary statistics
for 'fluid_intelligence' using `get_example_gwas()`.
Here we provide a
pre-munged version of the above file.
### Munging
Our lab have created [`MungeSumstats`](https://github.com/neurogenomics/MungeSumstats), a robust Bioconductor package for formatting multiple types
of summary statistics files. We highly recommend processing your GWAS summary statistics with
`MungeSumstats` before continuing.
See the *full_workflow* vignette for more details.
The minimum info needed *after* munging is:
- "SNP", "CHR", and "BP" as first three columns.
- It has at least one of these columns: "Z","OR","BETA","LOG_ODDS","SIGNED_SUMSTAT"
```{r, eval=FALSE}
path_formatted <- MAGMA.Celltyping::get_example_gwas(
trait = "prospective_memory")
```
### Map SNPs to Genes
Note you can input the genome build of your summary statistics for this step or
it can be inferred if left `NULL`:
```{r, eval=FALSE}
genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
path_formatted = path_formatted,
genome_build = "GRCh37")
```
### MAGMA_Files_Public
Rather than preprocessing the GWAS yourself, you can instead use the [`MAGMA_Files_Public`](https://github.com/neurogenomics/MAGMA_Files_Public) database we have created. It contains pre-computed MAGMA SNP-to-genes mapping files for hundreds of GWAS.
You can browse which GWAS traits are available by looking at the provided [*metadata.csv*](https://github.com/neurogenomics/MAGMA_Files_Public/blob/master/metadata.csv) file.
```{r, eval = TRUE}
magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = "ieu-a-298")
```
## CellTypeDataset
`ewceData` provides a number of CellTypeDatasets (CTD) to be used a cell-type
transcriptomic signature reference files.
If you want to create your own single-cell transcriptomic reference, you'll need
to first convert it to CTD using the instructions found in the `EWCE` package
[documentation here](https://github.com/NathanSkene/EWCE/).
```{r}
ctd <- ewceData::ctd()
```
Note that the cell type dataset loaded in the code above is the Karolinksa cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain instead use the following:
```
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
```
Or for the DRONC seq or AIBS datasets use:
```
ctd <- get_ctd("ctd_Tasic")
ctd <- get_ctd("ctd_DivSeq")
ctd <- get_ctd("ctd_AIBS")
ctd <- get_ctd("ctd_DRONC_human")
ctd <- get_ctd("ctd_DRONC_mouse")
ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- get_ctd("ctd_Saunders")
```
# Run cell-type enrichment analyses
`MAGMA.Celltyping` offers a suite of functions for conducting various types of cell-type-specific enrichment tests on GWAS summary statistics.
The `celltype_associations_pipeline` wraps several functions that in previous versions of `MAGMA.Celltyping` had to be set up and run separately. These include:
- **Linear enrichment**: `calculate_celltype_associations(EnrichmentMode = "linear")` internally. Activated when `run_linear=TRUE`.
- **Top 10% enrichment**: Uses `calculate_celltype_associations(EnrichmentMode = "Top 10%")` internally. Activated when `run_top10=TRUE`.
- **Conditional enrichment**: Uses `calculate_conditional_celltype_associations` internally. Activated when `run_conditional=TRUE`.
Thus, `celltype_associations_pipeline` is designed to make these analyses easier to run.
```{r, eval = FALSE}
MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
magma_dirs = magma_dirs,
ctd = ctd,
ctd_species = "mouse",
ctd_name = "Zeisel2015",
run_linear = TRUE,
run_top10 = TRUE)
```
We've also saved a pre-computed version of these results as a dataset:
```{r}
MAGMA_results <- MAGMA.Celltyping::enrichment_results
```
# Plot results
## Merge results
`merge_results` imports each of the MAGMA enrichment results files and merges them into one so that they can easily be plotted and further analysed.
```{r}
merged_results <- MAGMA.Celltyping::merge_results(
MAGMA_results = MAGMA_results)
knitr::kable(merged_results)
```
## Heatmap
Now we'll construct a heatmap visualizing the enrichment results, such that each GWAS is shown on the y-axis and each cell-type is shown on the x-axis. Results can be further facetted by what kind of test was run (linear, top10%, and/or conditional).
```{r Heatmap, error=TRUE}
heat <- MAGMA.Celltyping::results_heatmap(
merged_results = merged_results,
title = "Alzheimer's Disease (ieu-a-298) vs. nervous system cell-types (Zeisel2015)",
fdr_thresh = 1)
```
# Top results
## Top phenotypes
Get the phenotypes with the greatest number of significant cell-type enrichment results.
```{r}
top_phenos <- merged_results %>%
dplyr::group_by(EnrichmentMode, GWAS) %>%
dplyr::summarise(Celltype=dplyr::n_distinct(Celltype)) %>%
dplyr::arrange(dplyr::desc(Celltype))
knitr::kable(top_phenos)
```
## Top enrichments
Get the phenotypes-celltype enrichment results with the most significant p-values (per phenotype).
```{r}
top_enrich <- merged_results %>%
dplyr::group_by(EnrichmentMode, GWAS) %>%
dplyr::slice_min(FDR, n = 2)
knitr::kable(top_enrich)
```
# Session Info
<details>
```{r}
utils::sessionInfo()
```
</details>
<hr>