-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path07-NCI.Rmd
699 lines (641 loc) · 32.4 KB
/
07-NCI.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
```{r setup_nci, include=FALSE}
rm(list = ls()) ; invisible(gc()) ; set.seed(42)
library(knitr)
library(kableExtra)
if(knitr:::is_html_output()) options(knitr.table.format = "html")
if(knitr:::is_latex_output()) options(knitr.table.format = "latex")
library(tidyverse)
library(raster)
library(bayesplot)
library(kinship2)
library(rstan)
theme_set(bayesplot::theme_default())
opts_chunk$set(
echo = F, message = F, warning = F, fig.height = 6, fig.width = 8,
cache = F, cache.lazy = F)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = T)
path <- "data/Symphonia_Paracou/Sequences/populationGenomics/"
```
# Neighbourhood crowding effect on neutral and adaptive genetic variation
We did environmental association analyses [@Rellstab2015] in each complex using general linear mixed models developed for genome wide association studies (GWAS).
We used the mean neighbourhood crowding index [$NCI$; @Uriarte2004] over the last 30 years,
an indirect measurement of access to light and forest gap dynamics,
as the response variable and genetic structure (gene pools representing species) and relatedness (kinship matrix) as explanatory variables, as it is common practice [@Rellstab2015].
This analysis assumed that the neighbourhood crowding conditions where individuals have grown above 10-cm DBH are strongly correlated to the individual heritable phenotypes [e.g. @Eckert2010].
The mean neighbourhood crowding index $NCI_i$ from tree individual $i$ was calculated as follows:
$$NCI_i=\overline{\sum_{j|\delta_{i,j}<20m}DBH^2_{j,t}.e^{-\frac14\delta_{i,j}}}$$
with $DBH_{j,t}$ the diameter of the neighbouring tree $j$ in year $t$ and $\delta_{i,j}$ its distance to the individual tree $i$.
$NCI_i$ is computed for all neighbours at a distance $\delta_{i,j}$ inferior to the maximum neighbouring distance of 20 meters.
The power of neighbours $DBH_{j,t}$ effect was set to 2 to represent a surface.
The decrease of neighbours' diameter effect with distance was set to -0.25 to represent trees at 20 meters of the focal trees having 1% of the effect of the same tree at 0 meters.
$NCI_i$ is computed as the mean of yearly $NCI_{i,t}$ over the last 30 years denoted by the overline.
We used genetic species and individual kinship in an animal model [@Wilson2010] to estimate genetic variance associated with neighbourhood crowding index.
We used a lognormal likelihood given that distributions of environmental variables were positive and skewed.
We inferred individual kinship using KING [@Manichaikul2010], as the method is robust to population structure.
We set negative kinship values to null as they were confounding with population structure, and we further ensured that the matrix was positive-definite using the `nearPD` function from the R package `Matrix`.
The environment $y_{s,i}$ where individual $i$ in species $s$ grows was inferred with a lognormal distribution with the following formula:
$$y_{s,i} \sim logN(log(\mu_s.a_{i}),\sigma^2_1)$$
$$a_{i} \sim MVlogN_N(log(1),\sigma^2_2.K)$$
where $\mu_s$ is the mean environment of species $s$, $a_i$ is the breeding value of the individual $i$ and $\sigma^2_1$ is the shape parameter of the lognormal.
Individual breeding values $a_i$ are defined following a multivariate lognormal law $\mathcal{MVlogN}$ of co-shape matrix defined as the product of the kinship matrix $K$ with estimated individual genotypic variation $\sigma^2_2$.
To estimate variances on a normal scale, we log-transformed species fixed effect, genetic additive values, and we calculated conditional and marginal $R^2$ [@Nakagawa2013].
A Bayesian method was used to infer parameters using `stan` language [@Carpenter2017] and the `rstan` package [@StanDevelopmentTeam2018] in the R environment [@RCoreTeam2020] using the No-U-Turn Sampler algorithm [NUTS, @Hoffman2014], which performs better for estimating genetic parameters and breeding values [@Nishio2019].
## Simulated Animal Model
The aim of this subchapter is to explore the animal model with generated data to validate their behaviour and use it on *Symphonia* real data.
Let's consider a set of $P=3$ populations including each $Fam=3$ families composed of $I = 14$ individuals with arbitrary relationships (it's only 126 individuals to do quick tests).
```{r simkinship, fig.cap="Kinship matrix"}
simK <- function(
P = 3,
Fam = 3
){
I <- 14 # Individuals
ped.df <- data.frame(pop = rep(1:P, each = Fam*I),
fam = rep(1:Fam, each = I),
ind = rep(1:I, Fam*P),
father = rep(c(NA, NA, 1, 1, 1, 1, NA,
6, 6, 6, NA, 11, 11, 11), Fam*P),
mother = rep(c(NA, NA, 2, 2, 2, 2, NA,
7, 7, 7, NA, 3, 3, 3), Fam*P),
sex = rep(c(1, 2, 2, 3, 3, 1, 2, 3, 3,
3, 1, 3, 3, 3), Fam*P)) %>%
mutate_at(c("father", "mother"), funs(ifelse(!is.na(.),
paste0(pop,fam,.), NA))) %>%
mutate(ind = paste0(pop,fam,ind))
ped.ped <- kinship2::pedigree(id = ped.df$ind, dadid = ped.df$father,
momid = ped.df$mother, sex = ped.df$sex,
famid = paste0(ped.df$pop, ped.df$fam))
K <- as.matrix(kinship2::kinship(ped.ped))
K <- 2*K
return(list(df = ped.df, K = K))
}
heatmap(simK()$K)
```
We used the following animal model with a lognormal distribution to estimate population and genotypic variance:
\begin{equation}
y_{p,i} \sim \mathcal{logN}(log(\mu_p.a_{i}),\sigma_1) \\
a_{p,i} \sim \mathcal{MVlogN_N}(log(1),\sigma_2.K)
(\#eq:animallogth)
\end{equation}
```{r simanimal}
simAnimal <- function(
P = 3,
Fam = 3,
sigmaP = 0.5,
sigmaG = 0.3,
sigmaR = 0.2
){
K <- simK(P, Fam)
N <- nrow(K$K)
epsilon <- rnorm(N)
mu <- rlnorm(P, sd = sigmaP)
y <- rlnorm(N,
log(mu[K$df$pop]) + sigmaG*as.vector(t(chol(K$K)) %*% epsilon),
sigmaR)
mdata <- list(N = N, P = P, K = K$K, population = K$df$pop, y = y)
return(list(mu = mu,
Vp = var(log(mu[K$df$pop])),
Vg = var(sigmaG*as.vector(t(chol(K$K)) %*% epsilon)),
Vr = sigmaR^2,
mdata = mdata))
}
```
We fitted the equivalent model with the following priors:
\begin{equation}
y_{p,i} \sim \mathcal{logN}(log(\mu_p.\hat{a_{i}}), \sigma_1) \\
\hat{a_{i}} = e^{\sqrt{V_G}.A.\epsilon_i} \\
\epsilon_i \sim \mathcal{N}(0,1) \\
~ \\
\mu_p \sim \mathcal{logN}(log(1),1) \\
\sigma_1 \sim \mathcal N_T(0,1) \\
~ \\
V_Y = Var(log(y)) \\
V_P = Var(log(\mu_p)) \\
V_R=\sigma_1^2 \\
V_G = V_Y - V_P - V_R \\
(\#eq:animallogthstan)
\end{equation}
```{r animalTable}
# mdata <- simAnimal(P = 3, Fam = 3)
# animal <- stan_model("symcapture_models/AnimalLog.stan")
# fitAnimal <- sampling(animal, chains = 2, save_warmup = F, data = mdata$mdata,
# init = "0",
# control = list(adapt_delta = 0.99, max_treedepth = 12))
# save(mdata, fitAnimal, file = file.path("symcapture_save", "animal.Rdata"))
load(file.path("save", "animal.Rdata"))
broom.mixed::tidyMCMC(fitAnimal, pars = c("mu", "Vp", "Vg", "Vr", "lp__"),
droppars = NULL, rhat = T) %>%
mutate(expected = with(mdata, c(mu, Vp, Vg, Vr))) %>%
dplyr::select(term, estimate, expected, std.error, rhat) %>%
kable(caption = "Animal model fitted versus expected values.",
col.names = c("Parameter", "Estimate", "Expected",
"Standard error", "$\\hat R$"))
```
```{r animalTrace, fig.cap="Parameters for the Animal model: trace plot and expected value in red."}
mcmc_trace(as.array(fitAnimal, pars = c("mu", "Vp", "Vg", "Vr", "lp__")),
np = nuts_params(fitAnimal)) +
geom_hline(aes(yintercept = expected), col = "red",
data = data.frame(parameter = c(paste0("mu[", 1:length(mdata$mu), "]"),
"Vp", "Vg", "Vr"),
expected = with(mdata, c(mu, Vp, Vg, Vr))))
```
## Neighbourhood crowding index
```{r NCI, eval=F, echo=T}
trees <- src_sqlite(file.path("data", "Paracou","trees", "Paracou.sqlite")) %>%
tbl("Paracou") %>%
filter(Genus == "Symphonia") %>%
filter(CensusYear == 2015) %>%
collect()
trees <- read_tsv(file.path(path, "..", "variantCalling", "paracou3pop",
"symcapture.all.biallelic.snp.filtered.nonmissing.paracou3pop.fam"),
col_names = c("FID", "IID", "FIID", "MIID", "sex", "phenotype")) %>%
mutate(Ind = gsub(".g.vcf", "", IID)) %>%
mutate(X = gsub("P", "", Ind)) %>%
separate(X, c("Plot", "SubPlot", "TreeFieldNum"), convert = T) %>%
left_join(trees) %>%
left_join(read_tsv(file.path(path, "bayescenv", "paracou3pop.popmap"),
col_names = c("IID", "pop")))
cl <- parallel::makeCluster(getOption("cl.cores", 4))
parallel::clusterExport(cl, list("trees"))
NC <- parallel::parLapply(cl, 1:nrow(trees), function(ind){
library(tidyverse)
src_sqlite(file.path("data", "Paracou", "trees", "Paracou.sqlite")) %>%
tbl("Paracou") %>%
filter(Plot == local(trees$Plot[ind])) %>%
filter(idTree != local(trees$idTree[ind])) %>%
mutate(dij = sqrt((local(trees$Xutm[ind]) - Xutm)^2+(local(trees$Yutm[ind]) - Yutm)^2)) %>%
filter(dij < 20) %>%
mutate(con = ifelse(Genus == local(trees$Genus[ind]) && Species == local(trees$Species[ind]), 1, 0)) %>%
mutate(DBH = CircCorr/pi) %>%
collect() %>%
group_by(CensusYear) %>%
summarise(NCI = sum(DBH*DBH*exp(-0.25*dij))) %>%
ungroup() %>%
summarise(idTree = local(trees$idTree[ind]),
NCI = mean(NCI))})
parallel::stopCluster(cl) ; rm(cl)
NC <- bind_rows(NC)
trees <- left_join(trees, NC) %>%
dplyr::select(IID, Ind, pop, NCI)
rm(NC)
write_tsv(trees, file = "save/NCI.tsv")
```
```{r Envmdata}
trees <- read_tsv(file = "save/NCI.tsv")
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
col_names = ids$IID) %>%
as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>%
filter(IID %in% colnames(K)) %>%
mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop)))
mdata <- lapply(c("NCI"), function(variable) {
K <- K[trees$IID, trees$IID]
K[K < 0] <- 0
K <- K*2
K <- as.matrix(Matrix::nearPD(K)$mat)
return(list(N = nrow(trees),
P = length(unique(trees$pop)),
y = as.vector(scale(trees[variable], center = F)),
population = trees$popNum,
K = K))})
names(mdata) <- c("NCI")
mdata$RE$y <- mdata$RE$y + 1 # for the lognormal
save(mdata, file = file.path("save", 'dataEnv.Rdata'))
```
## Genetic variance
We used between individual kinship and a lognormal Animal model [@Wilson2010] to estimate genetic variance associated to individuals' global phenotype living in a given environment [see environmental association analyses with genome wide association study analyses in @Rellstab2015].
The animal model is calculated for the environmental values $y$ of the $N$ individuals with the following formula:
\begin{equation}
y_{p,i} \sim \mathcal{logN}(log(a_{p,i}),\sigma_1) \\
a_{p,i} \sim \mathcal{MVlogN_N}(log(\mu_p),\sigma_2.K)
(\#eq:animalenv)
\end{equation}
where individual is defined as a normal law centered on the individual genetic additive effects $a$ and associated individual remaining variance $\sigma_R$. Additive genetic variance $a$ follows a multivariate lognormal law centered on the population mean $\mu_{Population}$ of covariance $\sigma_G K$.
We fitted the equivalent model with following priors:
\begin{equation}
y_{p,i} \sim \mathcal{logN}(log(\mu_p) + \hat{\sigma_2}.A.\epsilon_i, \sigma_1) \\
\epsilon_i \sim \mathcal{N}(0,1) \\
~ \\
\mu_p \sim \mathcal{logN}(log(1),1) \\
\sigma_1 \sim \mathcal N_T(0,1) \\
\hat{\sigma_2} = \sqrt(V_G)
~ \\
V_Y = Var(log(y)) \\
V_P = Var(log(\mu_p)) \\
V_G = V_Y - V_P - V_R \\
V_R=\sigma_1^2
(\#eq:animalenvstan)
\end{equation}
```{bash envGenoCluster, eval=F}
vars=(NCI)
for var in "${vars[@]}" ; do for chain in $(seq 8) ; do echo "module purge ; module load compiler/gcc-7.2.0 ; module load system/R-3.5.3 ; R_LIBS_USER=\" \" Rscript EnvGeno.R $chain $var" ; done ; done > EnvGeno.sh
sarray -J Env -o out/%j.Env.out -e out/%j.Env.err -t 48:00:00 --constraint=broadwell --cpus-per-task=1 --mail-type=BEGIN,END,FAIL EnvGeno.sh
```
```{r envGenoTab, fig.cap="Genetic variance of individual growth potential with a lognormal animal model."}
fitEnv <- list(NCI = list())
for(var in c("NCI")){
for(sim in list.files("save/EnvGeno",
pattern = var, full.names = T)){
load(sim)
fitEnv[[var]] <- c(fitEnv[[var]], fit)
}
}
fitEnv <- lapply(fitEnv, sflist2stanfit)
lapply(fitEnv, broom.mixed::tidyMCMC, c("mu", "Vp", "Vg", "Vr"),
droppars = NULL, rhat = T) %>%
bind_rows(.id = "Variable") %>%
separate(term, c("parameter", "population"), convert = T) %>%
mutate(population = recode(population, "1" = "S. globulifera Paracou",
"2" = "S. globulifera Regina", "3" = "S. sp1")) %>%
mutate(population = recode(population, "sp1" = "S. sp.1",
"globuliferaTypeParacou" = "S. sp.2" ,
"globuliferaTypeRegina" = "S. sp.3" )) %>%
rename(Species = population) %>%
mutate(Species = ifelse(is.na(Species), "", Species)) %>%
kable(caption = "Summary table of the kinship growth model",
col.names = c("Variable", "Parameter", "Species",
"Estimate", "$\\sigma$", "$\\hat{R}$"))
```
```{r envGenoTrace, fig.cap="Traceplot for environmental variables."}
cowplot::plot_grid(plotlist = lapply(fitEnv, mcmc_trace,
regex_pars = c("Vp", "Vg", "Vr"),
facet_args = list(nrow = 3)),
nrow = 1, labels = names(fitEnv))
```
```{r envgenoR2, fig.cap="R2 for environmental variable"}
lapply(fitEnv, pars = c("Vp", "Vg", "Vr"), as.data.frame) %>%
bind_rows(.id = "model") %>%
rowwise() %>%
mutate(Vtot = sum(c(Vp, Vg, Vr))) %>%
mutate(Vexp = sum(c(Vp, Vg))) %>%
mutate_at(c("Vp", "Vg", "Vexp"), funs(./Vtot)) %>%
dplyr::select(-Vtot, -Vr) %>%
reshape2::melt(id.vars = "model") %>%
group_by(model, variable) %>%
summarise(q5 = quantile(value, 0.05),
q25 = quantile(value, 0.25),
mean = mean(value),
median = median(value),
sd = sd(value),
q75 = quantile(value, 0.75),
q95 = quantile(value, 0.95)) %>%
mutate(variable = recode_factor(variable,
"Vexp" = "Marginal", "Vg" = "Genotype",
"Vp" = "Species")) %>%
ggplot(aes(x = variable, xend = variable, col = variable)) +
geom_point(aes(y = median), shape = 21, size = 3, alpha = 0.5) +
geom_segment(aes(y = q5, yend = q95),
size = 1, show.legend = F, alpha = 0.5) +
geom_segment(aes(y = q25, yend = q75), size = 2, alpha = 0.5) +
ylab(expression(R^2)) +
theme(axis.title.y = element_blank()) +
facet_wrap(~ model, nrow = 3) +
coord_flip()
```
```{r envGenoVarPart, fig.cap="Genetic variance partitioning for environmental variables."}
lapply(fitEnv, mcmc_intervals_data, regex_pars = c("Vp", "Vg", "Vr")) %>%
bind_rows(.id = "variable") %>%
mutate(parameter = recode(parameter, "Vp" = "Species", "Vg" = "Genotype", "Vr" = "Residual")) %>%
group_by(variable) %>%
mutate(pct = paste0(round(m / sum(m) * 100), "%")) %>%
ggplot(aes(x = variable, fill = parameter)) +
geom_col(aes(y = m)) +
geom_text(aes(y = m, label = pct), col = "white", position = position_stack(vjust = .5)) +
facet_wrap(~ variable, scales = "free") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.line.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_discrete(expression(sigma^2))
```
## Confounding neutral variation
If the observed genetic variation in relation to the environment is non-adaptive, i.e. confounded by neutral processes, the observed patterns should not differ from the randomized environmental maps across the study plots (Fig. \@ref(fig:plotncigif)), preserving the spatial structure of individuals while breaking their relationship with the environment. Instead, we observed a pattern significantly (P < XXX, Fig. \@ref(fig:vgdist)) different from the patterns observed randomizing environmental maps across plots. Thus, the relationship is adaptive, not neutral.
```{r ncimaps, eval=F}
library(gstat)
library(sf)
rm(list = ls())
src_sqlite(file.path("/home/sylvain//Documents/BIOGECO/PhD/data/Paracou/",
"trees", "Paracou.sqlite")) %>%
tbl("Paracou") %>%
filter(CodeAlive == 1) %>%
filter(Plot %in% 1:15) %>%
filter(CensusYear == 2015) %>%
collect() %>%
mutate(DBH = CircCorr/pi) %>%
left_join(data.frame(SubPlot = 1:4, Xcor = c(0, 125, 0, 125), Ycor = c(125, 125, 0, 0))) %>%
mutate(Xfield2 = Xfield - Xcor, Yfield2 = Yfield - Ycor) %>%
filter(Xfield2 >= 0, Xfield2 <= 125, Yfield2 >= 0, Yfield2 <= 125) %>%
write_tsv("save/paracou.tsv")
computeNCI <- function(P, d = 20){
dat <- read_tsv("save/paracou.tsv") %>%
filter(Plot == P)
D <- dist(dat[c("Xfield", "Yfield")]) %>%
as.matrix()
D[D>20] <- NA # individuals above 20 m
D[D == 0] <- NA # theimselves
D <- exp(-0.25*D)
D[is.na(D)] <- 0
dat$NCI <- as.vector(D %*% as.vector(dat$DBH))
return(dat %>%
filter(Xfield > local(d), Xfield < 250-local(d), Yfield > local(d), Yfield < 250-local(d)))
}
lapply(1:15, computeNCI) %>%
bind_rows() %>%
left_join(data.frame(SubPlot = 1:4, Xcor2 = c(20, 0, 20, 0), Ycor2 = c(0, 0, 20, 20))) %>%
mutate(Xfield3 = Xfield2 - Xcor2, Yfield3 = Yfield2 - Ycor2) %>%
write_tsv(path = "save/nci.tsv")
krige_stand <- function(name){
stand <- vroom::vroom("save/nci.tsv") %>%
mutate(stand = paste0("P", Plot, "C", SubPlot)) %>%
filter(stand == name)
stand.xy <- as_Spatial(st_as_sf(stand, coords = c("Xfield3", "Yfield3")))
stand.xy <- stand.xy[-sp::zerodist(stand.xy)[,1],]
grd <- expand.grid(Xfield3 = seq(0, 105, length.out = 106), Yfield3 = seq(0, 105, length.out = 106)) %>%
st_as_sf(coords = c("Xfield3", "Yfield3")) %>%
as_Spatial()
stand.vgm <- variogram(NCI ~ 1, stand.xy)
stand.fit <- fit.variogram(stand.vgm, model = vgm("Sph"))
stand.kriged <- krige((NCI) ~ 1, stand.xy, grd, model = stand.fit)
stand.kriged %>%
as.data.frame %>%
rename(Xfield = coords.x1, Yfield = coords.x2, NCI = var1.pred)
}
stands <- vroom::vroom("save/nci.tsv") %>%
mutate(stand = paste0("P", Plot, "C", SubPlot)) %>%
dplyr::select(stand) %>%
collect() %>%
unique() %>%
unlist()
names(stands) <- stands
lapply(stands, krige_stand) %>%
bind_rows(.id = "stand") %>%
rename(Xfield3 = Xfield, Yfield3 = Yfield, NCI.var = var1.var) %>%
write_tsv("data/ncikriged.tsv")
rotateTab <- function(tab){
tab0 <- mutate(tab, stand = paste0(stand, "-0"))
tabX <- mutate(tab, stand = paste0(stand, "-flipX")) %>%
mutate(Xfield3 = abs(105-Xfield3))
tabY <- mutate(tab, stand = paste0(stand, "-flipY")) %>%
mutate(Yfield3 = abs(105-Yfield3))
tabXY <- mutate(tab, stand = paste0(stand, "-flipXY")) %>%
mutate(Yfield3 = abs(Yfield3-105), Xfield3 = abs(Xfield3-105))
return(bind_rows(tab0, tabX, tabY, tabXY))
}
rotateTab(vroom::vroom("data/ncikriged.tsv")) %>%
write_tsv("data/ncikrigedrotated.tsv")
```
```{r plotnci, eval=F}
library(gganimate)
g <- vroom::vroom("data/ncikrigedrotated.tsv") %>%
ggplot(aes(Xfield3, Yfield3, fill = NCI)) +
geom_raster() +
coord_equal() +
viridis::scale_fill_viridis(direction = -1) +
transition_manual(stand) +
labs(title = 'Stand: {current_frame}')
anim_save("images/nci.gif", g)
```
```{r plotncigif, fig.cap="NCI in 2015 across SubPlots."}
include_graphics("images/nci.gif")
```
```{r EnvmdataNull, eval=F}
path <- "data/Symphonia_Paracou/Sequences/populationGenomics/"
paracou <- read_tsv("save/paracou.tsv") %>%
mutate(Ind = paste0("P", Plot, "-", SubPlot, "-", TreeFieldNum)) %>%
mutate(stand = paste0("P", Plot, "C", SubPlot, "-0")) %>%
left_join(data.frame(SubPlot = 1:4, Xcor2 = c(20, 0, 20, 0), Ycor2 = c(0, 0, 20, 20))) %>%
mutate(Xfield3 = Xfield2 - Xcor2, Yfield3 = Yfield2 - Ycor2) %>%
dplyr::select(stand, Ind, Xfield3, Yfield3)
trees <- read_tsv(file = "save/NCI.tsv") %>%
dplyr::select(-NCI) %>%
left_join(paracou) %>%
filter(Xfield3 > 0, Yfield3 > 0, Xfield3 < 105, Yfield3 < 150) %>%
mutate(Xfield3 = round(Xfield3), Yfield3 = round(Yfield3))
nci <- vroom::vroom("data/ncikrigedrotated.tsv")
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
col_names = ids$IID) %>%
as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>%
filter(IID %in% colnames(K)) %>%
mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop)))
N <- 100
mdata <- lapply(0:N, function(sim) {
if(sim == 0) {
dat <- left_join(trees, nci,
by = c("Xfield3", "Yfield3"),
suffix = c(".true", ".null")) %>%
filter(stand.true == stand.null)
} else {
dat <- left_join(trees, nci,
by = c("Xfield3", "Yfield3"),
suffix = c(".true", ".null")) %>%
filter(stand.true != stand.null) %>%
group_by(stand.true) %>%
filter(stand.null == sample(unique(stand.null), 1))
}
K <- K[dat$IID, dat$IID]
K[K < 0] <- 0
K <- K*2
K <- as.matrix(Matrix::nearPD(K)$mat)
return(list(N = nrow(dat),
P = length(unique(dat$pop)),
y = as.vector(scale(dat$NCI, center = F)),
population = dat$popNum,
K = K))
})
names(mdata) <- 0:N
# str(mdata)
save(mdata, file = file.path("save", 'dataEnvNull.Rdata'))
```
```{bash envGenoNullCluster, eval=F}
for sim in $(seq 0 100) ; do echo "module purge ; module load compiler/gcc-7.2.0 ; module load system/R-3.5.3 ; R_LIBS_USER=\" \" Rscript EnvGenoNull.R $sim" ; done > EnvGenoNull.sh
sarray -J EnvGenoNull -o out/%j.EnvGenoNull.out -e out/%j.EnvGenoNull.err -t 48:00:00 --constraint=broadwell --cpus-per-task=1 --mail-type=BEGIN,END,FAIL EnvGenoNull.sh
watch 'squeue -u sschmitt | wc -l'
watch 'tail -n 1 out/*.out | grep Chain'
```
```{r envGenoNullGraph, fig.cap="Among-genotype variance (VG) between observed value (red) and values observed in null models (black) randomizing neighbourhood crowding index (NCI) among plots."}
fitEnvNull <- list()
sims <- gsub(".Rdata", "", list.files("save/EnvGenoNull"))
for(sim in sims){
load(paste0("save/EnvGenoNull/", sim, ".Rdata"))
fitEnvNull <- c(fitEnvNull, fit)
}
names(fitEnvNull) <- sims
vgs <- lapply(fitEnvNull, function(fit) try(broom.mixed::tidyMCMC(fit, c("Vg"), droppars = NULL, rhat = T)))
vgs <- bind_rows(vgs[lapply(vgs, class) != "try-error"], .id = "simulation")
ggplot(vgs, aes(estimate)) +
geom_histogram(fill = "lightgrey", binwidth = 0.001) +
geom_density(aes(y = 0.001*..count..), fill = NA) +
geom_vline(xintercept = 0.021, col = "red") +
geom_text(label = "P < 0.01", y = 20, data = data.frame(estimate = 0.01)) +
xlab(expression(V[G]))
```
## Closely-related species
If the simultaneous inclusion of the three closely related species bias the analysis, we should not observe similar patterns inferring the analyses for each species separately. Instead, we observed genotypic variation to be related to the environment for all species together or independently ($\frac{\sigma^2_G}{\sigma^2_P} \in [0.15-0.76]$, Fig. S\@ref(fig:envGenoSpeciesVarPart)).
```{r EnvmSpdata, eval=F}
trees <- read_tsv(file = "save/NCI.tsv")
species <- unique(trees$pop)
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
col_names = ids$IID) %>%
as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>%
filter(IID %in% colnames(K))
mdata <- lapply(species, function(sp){
trees_sp <- filter(trees, pop == sp) %>%
mutate(IndNum = 1:nrow(.))
K_sp <- K[trees_sp$IID, trees_sp$IID]
K_sp[K_sp < 0] <- 0
K_sp <- K_sp*2
K_sp <- as.matrix(Matrix::nearPD(K_sp)$mat)
return(list(N = nrow(trees_sp),
P = 1,
y = as.vector(scale(trees_sp$NCI, center = F)),
population = rep(1, nrow(trees_sp)),
K = K_sp))})
names(mdata) <- species
save(mdata, file = file.path("save", 'dataEnvSp.Rdata'))
```
```{r envGenoSpFit, eval=F}
load(file.path("save", 'dataEnvSp.Rdata'))
model <- stan_model("models/AnimalLog.stan")
fit <- lapply(mdata, function(sp)
sampling(model, chains = 4, data = sp, save_warmup = F,
control = list(adapt_delta = 0.99, max_treedepth = 12)))
names(fit) <- names(mdata)
save(fit, file = file.path("save", 'fitEnvSp.Rdata'))
```
```{r envGenoSpeciesTab, fig.cap="Genetic variance of individual growth potential with a lognormal animal model."}
load(file.path("save", 'fitEnvSp.Rdata'))
lapply(fit, broom.mixed::tidyMCMC, c("mu", "Vp", "Vg", "Vr"),
droppars = NULL, rhat = T) %>%
bind_rows(.id = "Species") %>%
mutate(Species = recode(Species, "sp1" = "S. sp.1",
"globuliferaTypeParacou" = "S. sp.2" ,
"globuliferaTypeRegina" = "S. sp.3" )) %>%
separate(term, c("parameter", "population"), convert = T) %>%
dplyr::select(-population) %>%
kable(caption = "Summary table of the kinship growth model",
col.names = c("Variable", "Parameter",
"Estimate", "$\\sigma$", "$\\hat{R}$"))
```
```{r envGenoSpeciesTrace, fig.cap="Traceplot for environmental variables."}
cowplot::plot_grid(plotlist = lapply(fit, mcmc_trace,
regex_pars = c("Vg", "Vr"),
facet_args = list(nrow = 3)),
nrow = 1, labels = c("S. sp.1", "S. sp.2" , "S. sp.3"))
```
```{r envgenoSpeciesR2, fig.cap="R2 for environmental variable"}
lapply(fit, pars = c("Vg", "Vr"), as.data.frame) %>%
bind_rows(.id = "model") %>%
rowwise() %>%
mutate(Vtot = sum(c(Vg, Vr))) %>%
mutate(Vexp = sum(c(Vg))) %>%
mutate_at(c("Vg", "Vexp"), funs(./Vtot)) %>%
dplyr::select(-Vtot, -Vr) %>%
reshape2::melt(id.vars = "model") %>%
group_by(model, variable) %>%
summarise(q5 = quantile(value, 0.05),
q25 = quantile(value, 0.25),
mean = mean(value),
median = median(value),
sd = sd(value),
q75 = quantile(value, 0.75),
q95 = quantile(value, 0.95)) %>%
mutate(variable = recode_factor(variable,
"Vexp" = "Marginal", "Vg" = "Genotype")) %>%
mutate(model = recode(model, "sp1" = "S. sp.1",
"globuliferaTypeParacou" = "S. sp.2" ,
"globuliferaTypeRegina" = "S. sp.3" )) %>%
ggplot(aes(x = variable, xend = variable, col = variable)) +
geom_point(aes(y = median), shape = 21, size = 3, alpha = 0.5) +
geom_segment(aes(y = q5, yend = q95),
size = 1, show.legend = F, alpha = 0.5) +
geom_segment(aes(y = q25, yend = q75), size = 2, alpha = 0.5) +
ylab(expression(R^2)) +
theme(axis.title.y = element_blank()) +
facet_wrap(~ model, nrow = 3) +
coord_flip()
```
```{r envGenoSpeciesVarPart, fig.cap="Variance partitioning for neighbourhood crowding index (NCI), an indirect measure of access to light, for each species. Variation of each variable has been partitioned into among-genotype (red), and residual (blue) variations."}
lapply(fit, mcmc_intervals_data, regex_pars = c("Vg", "Vr")) %>%
bind_rows(.id = "variable") %>%
mutate(parameter = recode(parameter, "Vg" = "Genotype", "Vr" = "Residual")) %>%
group_by(variable) %>%
mutate(pct = paste0(round(m / sum(m) * 100), "%")) %>%
mutate(variable = recode(variable, "sp1" = "S. sp.1",
"globuliferaTypeParacou" = "S. sp.2" ,
"globuliferaTypeRegina" = "S. sp.3" )) %>%
ggplot(aes(x = variable, fill = parameter)) +
geom_col(aes(y = m)) +
geom_text(aes(y = m, label = pct), col = "white", position = position_stack(vjust = .5)) +
facet_wrap(~ variable, scales = "free") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.line.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_discrete(expression(sigma^2))
```
<!-- ## Isolation by environement (IBE) -->
<!-- > Je pense qu'il faut trouver une manière de présenter la méthode et les résultats plus facile à suivre, notamment pour ceux qui n'ont pas un bon background en génétique quanti moderne. Personnellement je serais aussi un peu plus prudent dans les interprétations car il y a peut-être d'autres mécanismes qui pourraient conduire aux patterns observés. Je pense notamment à de la dispersion de graines qui se feraient préférentiellement entre microhabitats similaires (peut-être à vérifier en comparant l'IBD à l'IBE). A moins que nos discussions finissent par me convaincre tout à fait, j'ai l'impression que tu as des éléments probants pour échafauder des hypothèses mais pas de preuves formelles. C'est là que proposer des expérimentations pour tester ces hypothèses serait peut être une plus-value. *O. Hardy* -->
<!-- ```{r} -->
<!-- paracou <- src_sqlite(file.path("data", "Paracou","trees", "Paracou.sqlite")) %>% -->
<!-- tbl("Paracou") %>% -->
<!-- filter(Genus == "Symphonia") %>% -->
<!-- filter(CensusYear == 2015) %>% -->
<!-- collect() %>% -->
<!-- mutate(Ind = paste0("P", Plot, "-", SubPlot, "-", TreeFieldNum)) %>% -->
<!-- dplyr::select(Ind, Xutm, Yutm) -->
<!-- trees <- read_tsv(file = "save/NCI.tsv") %>% -->
<!-- left_join(paracou) -->
<!-- ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id")) -->
<!-- K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"), -->
<!-- col_names = ids$IID) %>% -->
<!-- as.data.frame() -->
<!-- row.names(K) <- ids$IID -->
<!-- K <- as.matrix(K) -->
<!-- trees <- trees %>% -->
<!-- filter(IID %in% colnames(K)) %>% -->
<!-- mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop))) -->
<!-- # all(trees$IID == colnames(K)) -->
<!-- G <- K -->
<!-- D <- dist(trees[c("Xutm", "Yutm")]) %>% -->
<!-- as.matrix() -->
<!-- E <- dist(trees[c("NCI")]) %>% -->
<!-- as.matrix() -->
<!-- ib <- read_tsv("../PhD/data/Symphonia_Paracou/Sequences/variantCalling/spagedi/globuliferaTypeParacou.1k.spagedi.out.ibd") %>% -->
<!-- rename(IID1 = "Name i", IID2 = "Name j", distance = "Spatial dist", kinship = "ALL LOCI", moran = "ALL LOCI_1") %>% -->
<!-- mutate(distance = ifelse(distance == "inbreeding coef", 0, distance)) %>% -->
<!-- mutate(distance = as.numeric(distance)) %>% -->
<!-- filter(distance < 10^5) %>% -->
<!-- dplyr::select(IID1, IID2, distance, kinship) %>% -->
<!-- left_join(reshape2::melt(E) %>% -->
<!-- rename(IID1 = Var1, IID2 = Var2, nci = value)) -->
<!-- ib %>% -->
<!-- reshape2::melt(id.vars = c("IID1", "IID2", "kinship")) %>% -->
<!-- ggplot(aes(value, kinship)) + -->
<!-- geom_point() + -->
<!-- geom_smooth(method = "lm") + -->
<!-- facet_wrap(~ variable, scales = "free", nrow = 2) -->
<!-- ib.full <- bind_rows(ib, -->
<!-- ib %>% -->
<!-- rename(IID3 = IID1) %>% -->
<!-- rename(IID1 = IID2, IID2 = IID3)) %>% -->
<!-- unique() -->
<!-- vegan::mantel(as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")[-1]), -->
<!-- as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance")[-1])) -->
<!-- reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance") %>% -->
<!-- filter_if(is.na) -->
<!-- as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")) -->
<!-- ib.full %>% -->
<!-- filter() -->
<!-- mantel.rtest(as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")[-1]), -->
<!-- as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance")[-1])) -->
<!-- data.frame(G = as.vector(G), D = as.vector(D), E = as.vector(E)) %>% -->
<!-- cor() %>% -->
<!-- corrplot::corrplot.mixed() -->
<!-- ``` -->