07-NCI.Rmd

```{r setup_nci, include=FALSE}
rm(list = ls()) ; invisible(gc()) ; set.seed(42)
library(knitr)
library(kableExtra)
if(knitr:::is_html_output()) options(knitr.table.format = "html") 
if(knitr:::is_latex_output()) options(knitr.table.format = "latex") 
library(tidyverse)
library(raster)
library(bayesplot)
library(kinship2)
library(rstan)
theme_set(bayesplot::theme_default())
opts_chunk$set(
  echo = F, message = F, warning = F, fig.height = 6, fig.width = 8,
  cache = F, cache.lazy = F)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = T)
path <- "data/Symphonia_Paracou/Sequences/populationGenomics/"
```


# Neighbourhood crowding effect on neutral and adaptive genetic variation

We did environmental association analyses [@Rellstab2015] in each complex using general linear mixed models developed for genome wide association studies (GWAS). 
We used the mean neighbourhood crowding index [$NCI$; @Uriarte2004] over the last 30 years,
an indirect measurement of access to light and forest gap dynamics, 
as the response variable and genetic structure (gene pools representing species) and relatedness (kinship matrix) as explanatory variables, as it is common practice [@Rellstab2015]. 
This analysis assumed that the neighbourhood crowding conditions where individuals have grown above 10-cm DBH are strongly correlated to the individual heritable phenotypes [e.g. @Eckert2010]. 
The mean neighbourhood crowding index $NCI_i$ from tree individual $i$ was calculated as follows:

$$NCI_i=\overline{\sum_{j|\delta_{i,j}<20m}DBH^2_{j,t}.e^{-\frac14\delta_{i,j}}}$$

with $DBH_{j,t}$ the diameter of the neighbouring tree $j$ in year $t$ and $\delta_{i,j}$ its distance to the individual tree $i$.
$NCI_i$ is computed for all neighbours at a distance $\delta_{i,j}$ inferior to the maximum neighbouring distance of 20 meters.
The power of neighbours $DBH_{j,t}$ effect was set to 2 to represent a surface. 
The decrease of neighbours' diameter effect with distance was set to -0.25 to represent trees at 20 meters of the focal trees having 1% of the effect of the same tree at 0 meters. 
$NCI_i$ is computed as the mean of yearly $NCI_{i,t}$ over the last 30 years denoted by the overline.

We used genetic species and individual kinship in an animal model [@Wilson2010] to estimate genetic variance associated with neighbourhood crowding index.
We used a lognormal likelihood given that distributions of environmental variables were positive and skewed.
We inferred individual kinship using KING [@Manichaikul2010], as the method is robust to population structure. 
We set negative kinship values to null as they were confounding with population structure, and we further ensured that the matrix was positive-definite using the `nearPD`  function from the R package `Matrix`.
The environment $y_{s,i}$ where individual $i$ in species $s$ grows was inferred with a lognormal distribution with the following formula:

$$y_{s,i} \sim logN(log(\mu_s.a_{i}),\sigma^2_1)$$
$$a_{i} \sim MVlogN_N(log(1),\sigma^2_2.K)$$

where $\mu_s$ is the mean environment of species $s$, $a_i$ is the breeding value of the individual $i$ and $\sigma^2_1$ is the shape parameter of the lognormal. 
Individual breeding values $a_i$ are defined following a multivariate lognormal law $\mathcal{MVlogN}$ of co-shape matrix defined as the product of the kinship matrix $K$ with estimated individual genotypic variation $\sigma^2_2$.
To estimate variances on a normal scale, we log-transformed species fixed effect, genetic additive values, and we calculated conditional and marginal $R^2$ [@Nakagawa2013].
A Bayesian method was used to infer parameters using `stan` language [@Carpenter2017] and the `rstan` package [@StanDevelopmentTeam2018] in the R environment [@RCoreTeam2020] using the No-U-Turn Sampler algorithm [NUTS, @Hoffman2014], which performs better for estimating genetic parameters and breeding values [@Nishio2019].

## Simulated Animal Model

The aim of this subchapter is to explore the animal model with generated data to validate their behaviour and use it on *Symphonia* real data. 
Let's consider a set of $P=3$ populations including each $Fam=3$ families composed of $I = 14$ individuals with arbitrary relationships (it's only 126 individuals to do quick tests).

```{r simkinship, fig.cap="Kinship matrix"}
simK <- function(
  P = 3, 
  Fam = 3
){
  I <- 14 # Individuals
  ped.df <- data.frame(pop = rep(1:P, each = Fam*I),
                       fam = rep(1:Fam, each = I),
                       ind = rep(1:I, Fam*P),
                       father = rep(c(NA, NA, 1, 1, 1, 1, NA, 
                                      6, 6, 6, NA, 11, 11, 11), Fam*P),
                       mother = rep(c(NA, NA, 2, 2, 2, 2, NA, 
                                      7, 7, 7, NA, 3, 3, 3), Fam*P), 
                       sex = rep(c(1, 2, 2, 3, 3, 1, 2, 3, 3, 
                                   3, 1, 3, 3, 3), Fam*P)) %>% 
    mutate_at(c("father", "mother"), funs(ifelse(!is.na(.), 
                                                 paste0(pop,fam,.), NA))) %>%
    mutate(ind = paste0(pop,fam,ind))
  ped.ped <- kinship2::pedigree(id = ped.df$ind, dadid = ped.df$father, 
                                momid = ped.df$mother, sex = ped.df$sex, 
                                famid = paste0(ped.df$pop, ped.df$fam))
  K <- as.matrix(kinship2::kinship(ped.ped))
  K <- 2*K
  return(list(df = ped.df, K = K))
}
heatmap(simK()$K)
```

We used the following animal model with a lognormal distribution to estimate population and genotypic variance:

\begin{equation} 
  y_{p,i} \sim \mathcal{logN}(log(\mu_p.a_{i}),\sigma_1) \\
  a_{p,i} \sim \mathcal{MVlogN_N}(log(1),\sigma_2.K)
  (\#eq:animallogth)
\end{equation} 

```{r simanimal}
simAnimal <- function(
  P = 3, 
  Fam = 3,
  sigmaP = 0.5,
  sigmaG = 0.3,
  sigmaR = 0.2
){
  K <- simK(P, Fam)
  N <- nrow(K$K)
  epsilon <- rnorm(N)
  mu <- rlnorm(P, sd = sigmaP)
  y <- rlnorm(N, 
              log(mu[K$df$pop]) + sigmaG*as.vector(t(chol(K$K)) %*% epsilon), 
              sigmaR)
  mdata <- list(N = N, P = P, K = K$K, population = K$df$pop, y = y)
  return(list(mu = mu,
              Vp = var(log(mu[K$df$pop])), 
              Vg = var(sigmaG*as.vector(t(chol(K$K)) %*% epsilon)), 
              Vr = sigmaR^2,
              mdata = mdata))
}
```

We fitted the equivalent model with the following priors:

\begin{equation} 
  y_{p,i} \sim \mathcal{logN}(log(\mu_p.\hat{a_{i}}), \sigma_1) \\ 
  \hat{a_{i}} = e^{\sqrt{V_G}.A.\epsilon_i} \\
  \epsilon_i \sim \mathcal{N}(0,1) \\ 
  ~ \\
  \mu_p \sim \mathcal{logN}(log(1),1) \\  
  \sigma_1 \sim \mathcal N_T(0,1) \\ 
  ~ \\
  V_Y = Var(log(y)) \\
  V_P = Var(log(\mu_p)) \\
  V_R=\sigma_1^2 \\
  V_G = V_Y - V_P - V_R \\
  (\#eq:animallogthstan)
\end{equation} 

```{r animalTable}
# mdata <- simAnimal(P = 3, Fam = 3)
# animal <- stan_model("symcapture_models/AnimalLog.stan")
# fitAnimal <- sampling(animal, chains = 2, save_warmup = F, data = mdata$mdata,
#                       init = "0",
#                       control = list(adapt_delta = 0.99, max_treedepth = 12))
# save(mdata, fitAnimal, file = file.path("symcapture_save", "animal.Rdata"))
load(file.path("save", "animal.Rdata"))
broom.mixed::tidyMCMC(fitAnimal, pars = c("mu", "Vp", "Vg", "Vr", "lp__"), 
                droppars = NULL, rhat = T) %>% 
  mutate(expected = with(mdata, c(mu, Vp, Vg, Vr))) %>% 
  dplyr::select(term, estimate, expected, std.error, rhat) %>% 
  kable(caption = "Animal model fitted versus expected values.",
        col.names = c("Parameter", "Estimate", "Expected",
                      "Standard error", "$\\hat R$"))
```

```{r animalTrace, fig.cap="Parameters for the Animal model: trace plot and expected value in red."}
mcmc_trace(as.array(fitAnimal, pars = c("mu", "Vp", "Vg", "Vr", "lp__")),
           np = nuts_params(fitAnimal)) +
  geom_hline(aes(yintercept = expected), col = "red", 
             data = data.frame(parameter = c(paste0("mu[", 1:length(mdata$mu), "]"), 
                                             "Vp", "Vg", "Vr"), 
                               expected = with(mdata, c(mu, Vp, Vg, Vr)))) 
```

## Neighbourhood crowding index

```{r NCI, eval=F, echo=T}
trees <- src_sqlite(file.path("data", "Paracou","trees", "Paracou.sqlite")) %>% 
  tbl("Paracou") %>% 
  filter(Genus == "Symphonia") %>% 
  filter(CensusYear == 2015) %>% 
  collect()
trees <- read_tsv(file.path(path, "..", "variantCalling", "paracou3pop",
                                                  "symcapture.all.biallelic.snp.filtered.nonmissing.paracou3pop.fam"),
         col_names = c("FID", "IID", "FIID", "MIID", "sex", "phenotype")) %>% 
  mutate(Ind = gsub(".g.vcf", "", IID)) %>% 
  mutate(X = gsub("P", "", Ind)) %>% 
  separate(X, c("Plot", "SubPlot", "TreeFieldNum"), convert = T) %>% 
  left_join(trees) %>% 
  left_join(read_tsv(file.path(path, "bayescenv", "paracou3pop.popmap"),
                     col_names = c("IID", "pop")))
cl <- parallel::makeCluster(getOption("cl.cores", 4))
parallel::clusterExport(cl, list("trees"))
NC <- parallel::parLapply(cl, 1:nrow(trees), function(ind){
  library(tidyverse)
  src_sqlite(file.path("data", "Paracou", "trees", "Paracou.sqlite")) %>% 
    tbl("Paracou") %>% 
    filter(Plot == local(trees$Plot[ind])) %>% 
    filter(idTree != local(trees$idTree[ind])) %>% 
    mutate(dij = sqrt((local(trees$Xutm[ind]) - Xutm)^2+(local(trees$Yutm[ind]) - Yutm)^2)) %>% 
    filter(dij < 20) %>% 
    mutate(con = ifelse(Genus == local(trees$Genus[ind]) && Species == local(trees$Species[ind]), 1, 0)) %>% 
    mutate(DBH = CircCorr/pi) %>% 
    collect() %>% 
    group_by(CensusYear) %>% 
    summarise(NCI = sum(DBH*DBH*exp(-0.25*dij))) %>% 
    ungroup() %>% 
    summarise(idTree = local(trees$idTree[ind]),
              NCI = mean(NCI))})
parallel::stopCluster(cl) ; rm(cl)
NC <- bind_rows(NC)
trees <- left_join(trees, NC) %>% 
  dplyr::select(IID, Ind, pop, NCI)
rm(NC)
write_tsv(trees, file = "save/NCI.tsv")
```

```{r Envmdata}
trees <- read_tsv(file = "save/NCI.tsv")
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
         col_names = ids$IID) %>% 
  as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>% 
  filter(IID %in% colnames(K)) %>% 
  mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop)))
mdata <- lapply(c("NCI"), function(variable) {
                K <- K[trees$IID, trees$IID]
                K[K < 0] <- 0
                K <- K*2
                K <- as.matrix(Matrix::nearPD(K)$mat)
                return(list(N = nrow(trees),
                            P = length(unique(trees$pop)),
                            y = as.vector(scale(trees[variable], center = F)),
                                          population = trees$popNum,
                                          K = K))})
names(mdata) <- c("NCI")
mdata$RE$y <- mdata$RE$y + 1 # for the lognormal
save(mdata, file = file.path("save", 'dataEnv.Rdata'))   
```

## Genetic variance

We used between individual kinship and a lognormal Animal model [@Wilson2010] to estimate genetic variance associated to individuals' global phenotype living in a given environment [see environmental association analyses with genome wide association study analyses in @Rellstab2015]. 
The animal model is calculated for the environmental values $y$ of the $N$ individuals with the following formula:

\begin{equation} 
  y_{p,i} \sim \mathcal{logN}(log(a_{p,i}),\sigma_1) \\
  a_{p,i} \sim \mathcal{MVlogN_N}(log(\mu_p),\sigma_2.K)
  (\#eq:animalenv)
\end{equation} 

where individual is defined as a normal law centered on the individual genetic additive effects $a$ and associated individual remaining variance $\sigma_R$. Additive genetic variance $a$ follows a multivariate lognormal law centered on the population mean $\mu_{Population}$ of covariance $\sigma_G K$.

We fitted the equivalent model with following priors:

\begin{equation} 
  y_{p,i} \sim \mathcal{logN}(log(\mu_p) + \hat{\sigma_2}.A.\epsilon_i, \sigma_1) \\ 
  \epsilon_i \sim \mathcal{N}(0,1) \\ 
  ~ \\
  \mu_p \sim \mathcal{logN}(log(1),1) \\  
  \sigma_1 \sim \mathcal N_T(0,1) \\ 
  \hat{\sigma_2} = \sqrt(V_G)
  ~ \\
  V_Y = Var(log(y)) \\
  V_P = Var(log(\mu_p)) \\
  V_G = V_Y - V_P - V_R \\
  V_R=\sigma_1^2
  (\#eq:animalenvstan)
\end{equation} 

```{bash envGenoCluster, eval=F}
vars=(NCI)
for var in "${vars[@]}" ; do for chain in $(seq 8) ; do echo "module purge ; module load compiler/gcc-7.2.0 ; module load system/R-3.5.3 ; R_LIBS_USER=\" \" Rscript EnvGeno.R $chain $var" ; done ; done > EnvGeno.sh
sarray -J Env -o out/%j.Env.out -e out/%j.Env.err -t 48:00:00 --constraint=broadwell --cpus-per-task=1 --mail-type=BEGIN,END,FAIL EnvGeno.sh
```

```{r envGenoTab, fig.cap="Genetic variance of individual growth potential with a lognormal animal model."}
fitEnv <- list(NCI = list())
for(var in c("NCI")){
  for(sim in list.files("save/EnvGeno", 
                        pattern = var, full.names = T)){
    load(sim)
    fitEnv[[var]] <- c(fitEnv[[var]], fit)
  }
}
fitEnv <- lapply(fitEnv, sflist2stanfit)
lapply(fitEnv, broom.mixed::tidyMCMC, c("mu", "Vp", "Vg", "Vr"), 
       droppars = NULL, rhat = T) %>% 
  bind_rows(.id = "Variable") %>% 
  separate(term, c("parameter", "population"), convert = T) %>% 
  mutate(population = recode(population, "1" = "S. globulifera Paracou",
                             "2" = "S. globulifera Regina", "3" = "S. sp1")) %>% 
  mutate(population = recode(population, "sp1" = "S. sp.1",
                             "globuliferaTypeParacou" = "S. sp.2" ,
                             "globuliferaTypeRegina" = "S. sp.3" )) %>%
  rename(Species = population) %>% 
  mutate(Species = ifelse(is.na(Species), "", Species)) %>% 
  kable(caption = "Summary table of the kinship growth model",
        col.names = c("Variable", "Parameter",  "Species",  
                      "Estimate", "$\\sigma$", "$\\hat{R}$"))   
```

```{r envGenoTrace, fig.cap="Traceplot for environmental variables."}
cowplot::plot_grid(plotlist = lapply(fitEnv, mcmc_trace, 
                                     regex_pars = c("Vp", "Vg", "Vr"), 
                                     facet_args = list(nrow = 3)),
                   nrow = 1, labels = names(fitEnv))
```

```{r envgenoR2, fig.cap="R2 for environmental variable"}
lapply(fitEnv, pars = c("Vp", "Vg", "Vr"), as.data.frame) %>% 
  bind_rows(.id = "model") %>% 
  rowwise() %>% 
  mutate(Vtot = sum(c(Vp, Vg, Vr))) %>% 
  mutate(Vexp = sum(c(Vp, Vg))) %>% 
  mutate_at(c("Vp", "Vg", "Vexp"), funs(./Vtot)) %>% 
  dplyr::select(-Vtot, -Vr) %>% 
  reshape2::melt(id.vars = "model") %>% 
  group_by(model, variable) %>% 
  summarise(q5 = quantile(value, 0.05),
            q25 = quantile(value, 0.25),
            mean = mean(value),
            median = median(value),
            sd = sd(value),
            q75 = quantile(value, 0.75),
            q95 = quantile(value, 0.95)) %>% 
    mutate(variable = recode_factor(variable, 
                           "Vexp" = "Marginal", "Vg" = "Genotype", 
                           "Vp" = "Species")) %>% 
  ggplot(aes(x = variable, xend = variable, col = variable)) +
  geom_point(aes(y = median), shape = 21, size = 3, alpha = 0.5) +
  geom_segment(aes(y = q5, yend = q95),
               size = 1, show.legend = F, alpha = 0.5) +
  geom_segment(aes(y = q25, yend = q75), size = 2, alpha = 0.5) +
  ylab(expression(R^2)) +  
  theme(axis.title.y = element_blank()) +
  facet_wrap(~ model, nrow = 3) +
  coord_flip()
```

```{r envGenoVarPart, fig.cap="Genetic variance partitioning for environmental variables."}
lapply(fitEnv, mcmc_intervals_data, regex_pars = c("Vp", "Vg", "Vr")) %>% 
  bind_rows(.id = "variable") %>% 
  mutate(parameter = recode(parameter, "Vp" = "Species", "Vg" = "Genotype", "Vr" = "Residual")) %>% 
  group_by(variable) %>%
  mutate(pct = paste0(round(m / sum(m) * 100), "%")) %>%
  ggplot(aes(x = variable, fill = parameter)) +
  geom_col(aes(y = m)) +
  geom_text(aes(y = m, label = pct), col = "white", position = position_stack(vjust = .5)) +
  facet_wrap(~ variable, scales = "free") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.line.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_fill_discrete(expression(sigma^2))
```

## Confounding neutral variation

If the observed genetic variation in relation to the environment is non-adaptive, i.e. confounded by neutral processes, the observed patterns should not differ from the randomized environmental maps across the study plots (Fig. \@ref(fig:plotncigif)), preserving the spatial structure of individuals while breaking their relationship with the environment. Instead, we observed a pattern significantly (P < XXX, Fig. \@ref(fig:vgdist)) different from the patterns observed randomizing environmental maps across plots. Thus, the relationship is adaptive, not neutral.


```{r ncimaps, eval=F}
library(gstat)
library(sf)
rm(list = ls())
src_sqlite(file.path("/home/sylvain//Documents/BIOGECO/PhD/data/Paracou/",
                              "trees", "Paracou.sqlite")) %>% 
  tbl("Paracou") %>% 
  filter(CodeAlive == 1) %>% 
  filter(Plot %in% 1:15) %>% 
  filter(CensusYear == 2015) %>% 
  collect() %>% 
  mutate(DBH = CircCorr/pi) %>% 
  left_join(data.frame(SubPlot = 1:4, Xcor = c(0, 125, 0, 125), Ycor = c(125, 125, 0, 0))) %>% 
  mutate(Xfield2 = Xfield - Xcor, Yfield2 = Yfield - Ycor) %>% 
  filter(Xfield2 >= 0, Xfield2 <= 125, Yfield2 >= 0, Yfield2 <= 125) %>% 
  write_tsv("save/paracou.tsv")
computeNCI <- function(P, d = 20){
  dat <- read_tsv("save/paracou.tsv") %>% 
    filter(Plot == P)
  D <- dist(dat[c("Xfield", "Yfield")]) %>% 
    as.matrix()
  D[D>20] <- NA # individuals above 20 m
  D[D == 0] <- NA # theimselves
  D <- exp(-0.25*D)
  D[is.na(D)] <- 0
  dat$NCI <- as.vector(D %*% as.vector(dat$DBH))
  return(dat %>% 
           filter(Xfield > local(d), Xfield < 250-local(d), Yfield > local(d), Yfield < 250-local(d)))
}
lapply(1:15, computeNCI) %>% 
  bind_rows() %>% 
  left_join(data.frame(SubPlot = 1:4, Xcor2 = c(20, 0, 20, 0), Ycor2 = c(0, 0, 20, 20))) %>% 
  mutate(Xfield3 = Xfield2 - Xcor2, Yfield3 = Yfield2 - Ycor2) %>% 
  write_tsv(path = "save/nci.tsv")
krige_stand <- function(name){
  stand <- vroom::vroom("save/nci.tsv") %>% 
    mutate(stand = paste0("P", Plot, "C", SubPlot)) %>% 
    filter(stand == name)
  stand.xy  <- as_Spatial(st_as_sf(stand, coords = c("Xfield3", "Yfield3")))
  stand.xy <- stand.xy[-sp::zerodist(stand.xy)[,1],]
  grd <- expand.grid(Xfield3 = seq(0, 105, length.out = 106), Yfield3 = seq(0, 105, length.out = 106)) %>% 
    st_as_sf(coords = c("Xfield3", "Yfield3")) %>% 
    as_Spatial()
  stand.vgm <- variogram(NCI ~ 1, stand.xy)
  stand.fit <- fit.variogram(stand.vgm, model = vgm("Sph"))
  stand.kriged <- krige((NCI) ~ 1, stand.xy, grd, model = stand.fit)
  stand.kriged %>% 
    as.data.frame %>% 
    rename(Xfield = coords.x1, Yfield = coords.x2, NCI = var1.pred)
}
stands <- vroom::vroom("save/nci.tsv") %>% 
  mutate(stand = paste0("P", Plot, "C", SubPlot)) %>% 
  dplyr::select(stand) %>% 
  collect() %>% 
  unique() %>% 
  unlist()
names(stands) <- stands
lapply(stands, krige_stand) %>% 
  bind_rows(.id = "stand") %>% 
  rename(Xfield3 = Xfield, Yfield3 = Yfield, NCI.var = var1.var) %>% 
  write_tsv("data/ncikriged.tsv")
rotateTab <- function(tab){
  tab0 <- mutate(tab, stand = paste0(stand, "-0"))
  tabX <- mutate(tab, stand = paste0(stand, "-flipX")) %>% 
     mutate(Xfield3 = abs(105-Xfield3))
  tabY <- mutate(tab, stand = paste0(stand, "-flipY")) %>% 
    mutate(Yfield3 = abs(105-Yfield3))
  tabXY <- mutate(tab, stand = paste0(stand, "-flipXY")) %>% 
    mutate(Yfield3 = abs(Yfield3-105), Xfield3 = abs(Xfield3-105))
  return(bind_rows(tab0, tabX, tabY, tabXY))
}
rotateTab(vroom::vroom("data/ncikriged.tsv")) %>% 
  write_tsv("data/ncikrigedrotated.tsv")
```

```{r plotnci, eval=F}
library(gganimate)
g <- vroom::vroom("data/ncikrigedrotated.tsv") %>% 
  ggplot(aes(Xfield3, Yfield3, fill = NCI)) + 
  geom_raster() +
  coord_equal() +
  viridis::scale_fill_viridis(direction = -1) +
  transition_manual(stand) +
  labs(title = 'Stand: {current_frame}')
anim_save("images/nci.gif", g)
```

```{r plotncigif, fig.cap="NCI in 2015 across SubPlots."}
include_graphics("images/nci.gif")
```

```{r EnvmdataNull, eval=F}
path <- "data/Symphonia_Paracou/Sequences/populationGenomics/"
paracou <- read_tsv("save/paracou.tsv") %>% 
  mutate(Ind = paste0("P", Plot, "-", SubPlot, "-", TreeFieldNum)) %>% 
  mutate(stand = paste0("P", Plot, "C", SubPlot, "-0")) %>% 
  left_join(data.frame(SubPlot = 1:4, Xcor2 = c(20, 0, 20, 0), Ycor2 = c(0, 0, 20, 20))) %>% 
  mutate(Xfield3 = Xfield2 - Xcor2, Yfield3 = Yfield2 - Ycor2) %>% 
  dplyr::select(stand, Ind, Xfield3, Yfield3)
trees <- read_tsv(file = "save/NCI.tsv") %>% 
  dplyr::select(-NCI) %>% 
  left_join(paracou) %>% 
  filter(Xfield3 > 0, Yfield3 > 0, Xfield3 < 105, Yfield3 < 150) %>% 
  mutate(Xfield3 = round(Xfield3), Yfield3 = round(Yfield3))
nci <- vroom::vroom("data/ncikrigedrotated.tsv")   
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
         col_names = ids$IID) %>% 
  as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>% 
  filter(IID %in% colnames(K)) %>% 
  mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop)))
N <- 100
mdata <- lapply(0:N, function(sim) {
  if(sim == 0) {
    dat <- left_join(trees, nci, 
                     by = c("Xfield3", "Yfield3"), 
                     suffix = c(".true", ".null")) %>% 
      filter(stand.true == stand.null)
  } else {
    dat <- left_join(trees, nci, 
                     by = c("Xfield3", "Yfield3"), 
                     suffix = c(".true", ".null")) %>% 
      filter(stand.true != stand.null) %>% 
      group_by(stand.true) %>% 
      filter(stand.null == sample(unique(stand.null), 1))
  }
  K <- K[dat$IID, dat$IID]
  K[K < 0] <- 0
  K <- K*2
  K <- as.matrix(Matrix::nearPD(K)$mat)
  return(list(N = nrow(dat),
              P = length(unique(dat$pop)),
              y = as.vector(scale(dat$NCI, center = F)),
              population = dat$popNum,
              K = K))
})
names(mdata) <- 0:N
# str(mdata)
save(mdata, file = file.path("save", 'dataEnvNull.Rdata'))   
```

```{bash envGenoNullCluster, eval=F}
for sim in $(seq 0 100) ; do echo "module purge ; module load compiler/gcc-7.2.0 ; module load system/R-3.5.3 ; R_LIBS_USER=\" \" Rscript EnvGenoNull.R $sim" ; done > EnvGenoNull.sh
sarray -J EnvGenoNull -o out/%j.EnvGenoNull.out -e out/%j.EnvGenoNull.err -t 48:00:00 --constraint=broadwell --cpus-per-task=1 --mail-type=BEGIN,END,FAIL EnvGenoNull.sh
watch 'squeue -u sschmitt | wc -l'
watch 'tail -n 1 out/*.out | grep Chain'
```

```{r envGenoNullGraph, fig.cap="Among-genotype variance (VG) between observed value (red) and values observed in null models (black) randomizing neighbourhood crowding index (NCI) among plots."}
fitEnvNull <- list()
sims <- gsub(".Rdata", "", list.files("save/EnvGenoNull"))
for(sim in sims){
  load(paste0("save/EnvGenoNull/", sim, ".Rdata"))
  fitEnvNull <- c(fitEnvNull, fit)
}
names(fitEnvNull) <- sims
vgs <- lapply(fitEnvNull, function(fit) try(broom.mixed::tidyMCMC(fit, c("Vg"), droppars = NULL, rhat = T)))
vgs <- bind_rows(vgs[lapply(vgs, class) != "try-error"], .id = "simulation")
ggplot(vgs, aes(estimate)) +
  geom_histogram(fill = "lightgrey", binwidth = 0.001) +
  geom_density(aes(y = 0.001*..count..), fill = NA) +
  geom_vline(xintercept = 0.021, col = "red") +
  geom_text(label = "P < 0.01", y = 20, data = data.frame(estimate = 0.01)) +
  xlab(expression(V[G]))
```

## Closely-related species

If the simultaneous inclusion of the three closely related species bias the analysis, we should not observe similar patterns inferring the analyses for each species separately. Instead, we observed genotypic variation to be related to the environment for all species together or independently ($\frac{\sigma^2_G}{\sigma^2_P} \in [0.15-0.76]$, Fig. S\@ref(fig:envGenoSpeciesVarPart)).


```{r EnvmSpdata, eval=F}
trees <- read_tsv(file = "save/NCI.tsv")
species <- unique(trees$pop)
ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id"))
K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"),
         col_names = ids$IID) %>% 
  as.data.frame()
row.names(K) <- ids$IID
K <- as.matrix(K)
trees <- trees %>% 
  filter(IID %in% colnames(K))
mdata <- lapply(species, function(sp){
  trees_sp <- filter(trees, pop == sp) %>% 
    mutate(IndNum = 1:nrow(.))
  K_sp <- K[trees_sp$IID, trees_sp$IID]
  K_sp[K_sp < 0] <- 0
  K_sp <- K_sp*2
  K_sp <- as.matrix(Matrix::nearPD(K_sp)$mat)
  return(list(N = nrow(trees_sp),
              P = 1,
              y = as.vector(scale(trees_sp$NCI, center = F)),
              population = rep(1, nrow(trees_sp)),
              K = K_sp))})
names(mdata) <- species
save(mdata, file = file.path("save", 'dataEnvSp.Rdata'))   
```

```{r envGenoSpFit, eval=F}
load(file.path("save", 'dataEnvSp.Rdata'))   
model <- stan_model("models/AnimalLog.stan")
fit <- lapply(mdata, function(sp) 
  sampling(model, chains = 4, data = sp, save_warmup = F,
           control = list(adapt_delta = 0.99, max_treedepth = 12)))
names(fit) <- names(mdata)
save(fit, file = file.path("save", 'fitEnvSp.Rdata'))   
```

```{r envGenoSpeciesTab, fig.cap="Genetic variance of individual growth potential with a lognormal animal model."}
load(file.path("save", 'fitEnvSp.Rdata'))   
lapply(fit, broom.mixed::tidyMCMC, c("mu", "Vp", "Vg", "Vr"), 
       droppars = NULL, rhat = T) %>% 
  bind_rows(.id = "Species") %>% 
  mutate(Species = recode(Species, "sp1" = "S. sp.1",
                          "globuliferaTypeParacou" = "S. sp.2" ,
                           "globuliferaTypeRegina" = "S. sp.3" )) %>%
  separate(term, c("parameter", "population"), convert = T) %>% 
  dplyr::select(-population) %>% 
  kable(caption = "Summary table of the kinship growth model",
        col.names = c("Variable", "Parameter",
                      "Estimate", "$\\sigma$", "$\\hat{R}$"))   
```

```{r envGenoSpeciesTrace, fig.cap="Traceplot for environmental variables."}
cowplot::plot_grid(plotlist = lapply(fit, mcmc_trace, 
                                     regex_pars = c("Vg", "Vr"), 
                                     facet_args = list(nrow = 3)),
                   nrow = 1, labels = c("S. sp.1",  "S. sp.2" , "S. sp.3"))

```

```{r envgenoSpeciesR2, fig.cap="R2 for environmental variable"}
lapply(fit, pars = c("Vg", "Vr"), as.data.frame) %>% 
  bind_rows(.id = "model") %>% 
  rowwise() %>% 
  mutate(Vtot = sum(c(Vg, Vr))) %>% 
  mutate(Vexp = sum(c(Vg))) %>% 
  mutate_at(c("Vg", "Vexp"), funs(./Vtot)) %>% 
  dplyr::select(-Vtot, -Vr) %>% 
  reshape2::melt(id.vars = "model") %>% 
  group_by(model, variable) %>% 
  summarise(q5 = quantile(value, 0.05),
            q25 = quantile(value, 0.25),
            mean = mean(value),
            median = median(value),
            sd = sd(value),
            q75 = quantile(value, 0.75),
            q95 = quantile(value, 0.95)) %>% 
    mutate(variable = recode_factor(variable, 
                           "Vexp" = "Marginal", "Vg" = "Genotype")) %>% 
    mutate(model = recode(model, "sp1" = "S. sp.1",
                           "globuliferaTypeParacou" = "S. sp.2" ,
                           "globuliferaTypeRegina" = "S. sp.3" )) %>%
  ggplot(aes(x = variable, xend = variable, col = variable)) +
  geom_point(aes(y = median), shape = 21, size = 3, alpha = 0.5) +
  geom_segment(aes(y = q5, yend = q95),
               size = 1, show.legend = F, alpha = 0.5) +
  geom_segment(aes(y = q25, yend = q75), size = 2, alpha = 0.5) +
  ylab(expression(R^2)) +  
  theme(axis.title.y = element_blank()) +
  facet_wrap(~ model, nrow = 3) +
  coord_flip()
```

```{r envGenoSpeciesVarPart, fig.cap="Variance partitioning for neighbourhood crowding index (NCI), an indirect measure of access to light, for each species. Variation of each variable has been partitioned into among-genotype (red), and residual (blue) variations."}
lapply(fit, mcmc_intervals_data, regex_pars = c("Vg", "Vr")) %>% 
  bind_rows(.id = "variable") %>% 
  mutate(parameter = recode(parameter, "Vg" = "Genotype", "Vr" = "Residual")) %>% 
  group_by(variable) %>%
  mutate(pct = paste0(round(m / sum(m) * 100), "%")) %>%
  mutate(variable = recode(variable, "sp1" = "S. sp.1",
                           "globuliferaTypeParacou" = "S. sp.2" ,
                           "globuliferaTypeRegina" = "S. sp.3" )) %>%
  ggplot(aes(x = variable, fill = parameter)) +
  geom_col(aes(y = m)) +
  geom_text(aes(y = m, label = pct), col = "white", position = position_stack(vjust = .5)) +
  facet_wrap(~ variable, scales = "free") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
        axis.text.x = element_blank(), axis.line.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_fill_discrete(expression(sigma^2))
```

<!-- ## Isolation by environement (IBE) -->

<!-- > Je pense qu'il faut trouver une manière de présenter la méthode et les résultats plus facile à suivre, notamment pour ceux qui n'ont pas un bon background en génétique quanti moderne. Personnellement je serais aussi un peu plus prudent dans les interprétations car il y a peut-être d'autres mécanismes qui pourraient conduire aux patterns observés. Je pense notamment à de la dispersion de graines qui se feraient préférentiellement entre microhabitats similaires (peut-être à vérifier en comparant l'IBD à l'IBE). A moins que nos discussions finissent par me convaincre tout à fait, j'ai l'impression que tu as des éléments probants pour échafauder des hypothèses mais pas de preuves formelles. C'est là que proposer des expérimentations pour tester ces hypothèses serait peut être une plus-value. *O. Hardy* -->

<!-- ```{r} -->
<!-- paracou <- src_sqlite(file.path("data", "Paracou","trees", "Paracou.sqlite")) %>%  -->
<!--   tbl("Paracou") %>%  -->
<!--   filter(Genus == "Symphonia") %>%  -->
<!--   filter(CensusYear == 2015) %>%  -->
<!--   collect() %>%  -->
<!--   mutate(Ind = paste0("P", Plot, "-", SubPlot, "-", TreeFieldNum)) %>%  -->
<!--   dplyr::select(Ind, Xutm, Yutm) -->
<!-- trees <- read_tsv(file = "save/NCI.tsv") %>%  -->
<!--   left_join(paracou) -->
<!-- ids <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king.id")) -->
<!-- K <- read_tsv(file.path(path, "..", "variantCalling", "growth", "plink2.king"), -->
<!--          col_names = ids$IID) %>%  -->
<!--   as.data.frame() -->
<!-- row.names(K) <- ids$IID -->
<!-- K <- as.matrix(K) -->
<!-- trees <- trees %>%  -->
<!--   filter(IID %in% colnames(K)) %>%  -->
<!--   mutate(IndNum = 1:nrow(.), popNum = as.numeric(as.factor(pop))) -->
<!-- # all(trees$IID == colnames(K)) -->
<!-- G <- K -->
<!-- D <- dist(trees[c("Xutm", "Yutm")]) %>%  -->
<!--   as.matrix() -->
<!-- E <- dist(trees[c("NCI")]) %>%  -->
<!--   as.matrix() -->
<!-- ib <- read_tsv("../PhD/data/Symphonia_Paracou/Sequences/variantCalling/spagedi/globuliferaTypeParacou.1k.spagedi.out.ibd") %>%  -->
<!--   rename(IID1 = "Name i", IID2 = "Name j", distance = "Spatial dist", kinship = "ALL LOCI", moran = "ALL LOCI_1") %>%  -->
<!--   mutate(distance = ifelse(distance == "inbreeding coef", 0, distance)) %>%  -->
<!--   mutate(distance = as.numeric(distance)) %>%  -->
<!--   filter(distance < 10^5) %>%  -->
<!--   dplyr::select(IID1, IID2, distance, kinship) %>%  -->
<!--   left_join(reshape2::melt(E) %>%  -->
<!--               rename(IID1 = Var1, IID2 = Var2, nci = value)) -->
<!-- ib %>%  -->
<!--   reshape2::melt(id.vars = c("IID1", "IID2", "kinship")) %>%  -->
<!--   ggplot(aes(value, kinship)) + -->
<!--   geom_point() + -->
<!--   geom_smooth(method = "lm") + -->
<!--   facet_wrap(~ variable, scales = "free", nrow = 2) -->
<!-- ib.full <- bind_rows(ib, -->
<!--                      ib %>%  -->
<!--                        rename(IID3 = IID1) %>%  -->
<!--                        rename(IID1 = IID2, IID2 = IID3)) %>%  -->
<!--   unique() -->

<!-- vegan::mantel(as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")[-1]), -->
<!--               as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance")[-1])) -->

<!-- reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance") %>%  -->
<!--   filter_if(is.na) -->
<!-- as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")) -->

<!-- ib.full %>%  -->
<!--   filter() -->
<!-- mantel.rtest(as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "kinship")[-1]), -->
<!--               as.matrix(reshape2::dcast(ib.full, IID1 ~ IID2, value.var = "distance")[-1])) -->

<!-- data.frame(G = as.vector(G), D = as.vector(D), E = as.vector(E)) %>%  -->
<!--   cor() %>%  -->
<!--   corrplot::corrplot.mixed() -->
<!-- ``` -->