Skip to content

Commit

Permalink
Update build model (beta):
Browse files Browse the repository at this point in the history
- Reworked description
- Hide results for ordistep
  • Loading branch information
rachelleesk committed Dec 5, 2022
1 parent 9c92d8f commit c516221
Showing 1 changed file with 17 additions and 14 deletions.
31 changes: 17 additions & 14 deletions vignettes/build-models.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -205,15 +205,17 @@ rm(birds, birds_scaled, predictors, subset_exp,

## 2. Community (_Beta_) diversity

Partial RDA models were used to model the relationship between animal communities and landscape predictors while considering also the inherent spatial structure of the animal communities. Similarly, variable selection by random forest was performed on the landscape variables prior to beta model building. In the following sections, the example variables were selected using the `randomForestSRC` and `MultivariateRandomForest` packages. After the first round of variable selection, the landscape predictors were summarised as Principal Component Analysis (PCA) axes while the spatial variation were quantified using Principle Coordinates of Neighbourhood Matrices (PCNM).
Variable selection by random forest was first performed on the landscape variables prior to beta model building, using the `randomForestSRC` and `MultivariateRandomForest` packages.

Partial RDA models were used to model the relationship between animal communities and landscape predictors while considering also the inherent spatial structure of the animal communities. The landscape predictors were summarised as Principal Component Analysis (PCA) axes while the spatial variation were quantified using Principle Coordinates of Neighbourhood Matrices (PCNM).

First, load the additional package `vegan` for beta diversity model building:

```{r load vegan, message = FALSE, warning = FALSE}
library("vegan")
```

As before, bird species density and landscape data for sampling points across six areas (towns) in Singapore will be analysed in this example. In this example dataset `beta_model_data`, animal and landscape datasets as processed in the previous two articles were combined. The variable selection output using random forest is also loaded from `beta_model_variables` to inform the principal component analysis (PCA) of the selected landscape variables later.
As before, bird species density and landscape data for sampling points across six areas (towns) in Singapore will be analysed in this example. The example dataset, `beta_model_data`, contains both animal and landscape datasets processed in the previous two articles. The variable selection output using random forest is also loaded from `beta_model_variables`.

```{r load data, message = FALSE, warning = FALSE}
Expand Down Expand Up @@ -247,7 +249,7 @@ animal_observations <- animal_observations %>%
mutate(period = as.character(period))
```

An presence/absence community matrix is created from the observations to summarise total species incidence.
Create a presence/absence community matrix is created from the observations to summarise total species incidence. Use the `birds` data from `beta_model_data` to ensure the same sampling point, sampling period, and sampling round are used for analysis.

```{r creating a presence/absence matrix, message = FALSE, warning = FALSE}
bird_com <- animal_observations %>%
Expand All @@ -270,7 +272,7 @@ bird_com <- bird_com %>%
rm(rmspp)
```

After the `birds` dataset has been used to filter `bird_com`, coordinate information were added for PCNM analysis later.
Add the coordinate information for PCNM analysis.

```{r appending spatial information to landscape dataset, message = FALSE, warning = FALSE}
beta_model_data <- beta_model_data %>%
Expand All @@ -284,10 +286,9 @@ beta_model_data <- beta_model_data %>%
.after = X)
```

Using the variables selected by random forest, principal component analysis (PCA) is performed to summarise the landscape predictors. Only PCA axes with eigenvalues greater than one were retained. This step selects for PCA axes that account for more variance than the original variables on their own.
Using the variables selected by random forest, we can now perform the principal component analysis (PCA) to summarise the landscape predictors. Retain only PCA axes with eigenvalues greater than one to select for PCA axes that account for more variance than the original variables on their own.

```{r diagnosis by PCA}
pca_birds <- beta_model_data %>%
dplyr::select(any_of(beta_model_variables$var)) %>% #select impt variables from RF
prcomp(.,scale.=TRUE) #conduct pca
Expand All @@ -299,7 +300,7 @@ pca_selected_birds <- pca_birds$x[, which(pca_birds$sdev > 1)]

Calculate the PCNM vectors using `Vegan::pcnm` and fit the animal community matrix against these PCNM vectors using `Vegan::rda`. To control for type I error while testing for significant predictors, significant PCNM vectors were first selected using a backwards, stepwise elimination method until all vectors has significance level smaller than the chosen significance level of 0.05 (Peres-Neto & Legendre, 2010). This was performed using `Vegan::ordistep`.

```{r PCNM, message = FALSE, warning = FALSE}
```{r PCNM, message = FALSE, warning = FALSE, echo = TRUE, results = "hide"}
plots.xy <- cbind(beta_model_data$X, beta_model_data$Y)
raw_pcnm_birds <- pcnm(dist(plots.xy))
Expand All @@ -312,9 +313,9 @@ birds.pcnm.sub <- ordistep(birds.pcnm.all, direction = "backward", pout = 0.075)
```

Perform partial RDA was performed with the selected PCA axes as candidate explanatory variables and the significant PCNM vectors as covariates. Perform `Vegan::ordistep` to select for the significant PCA axes while accounting for the effects of the covariables in the model until all PCA axes has significance level smaller than the chosen significance level of 0.05. The final RDA model is fitted with the significant PC axes and PCNM vectors.
Perform partial RDA with the selected PCA axes as candidate explanatory variables and the significant PCNM vectors as covariates. Perform `Vegan::ordistep` to select for the significant PCA axes while accounting for the effects of the covariables in the model until all PCA axes has significance level smaller than the chosen significance level of 0.05.

```{r stepwise selection with env vars, message = FALSE, warning = FALSE}
```{r stepwise selection with env vars, message = FALSE, warning = FALSE, echo = TRUE, results = "hide"}
birds_selected <- cbind(pca_selected_birds, pcnm_scores_birds)
birds.rda.all <- update(birds.pcnm.sub,
Expand All @@ -323,21 +324,23 @@ birds.rda.all <- update(birds.pcnm.sub,
birds.rda.sub <- ordistep(birds.rda.all, direction = "backward") #assess p-value for dropping
```

The final RDA model is fitted with the significant PC axes and PCNM vectors.

```{r examine rda results and fit final model, message = FALSE, warning = FALSE}
birds.rda.sub$call
birds.rda.fin <- rda(bird_com ~ PC1 + PC2 + PC4 + PC6 + PC7 + PC8 + Condition(PCNM1 + PCNM2 + PCNM3 + PCNM4 + PCNM5), data = birds_selected)
```

The predictive _Beta_-diversity model also requires information on the average probability of each species occurring across all points.
Caclulate the average probability of each species occurring across all points. This is a required model input for the predictive _Beta_-diversity model.

```{r p-hats}
# fitted_126 <- fitted(birds1.rda.fin, model = "CCA")
p_hats_birds <- apply(bird_com, 2, mean)
p_hats_birds
```

The PCA axes, PCNM vectors, final RDA model and probability of species occurrence are saved out to be applied in the subsequent apply model section.
Finally, save out all relevant model inputs to be applied in the subsequent apply model section.

```{r save beta model objs, eval = FALSE}
save(pca_birds,
Expand Down

0 comments on commit c516221

Please sign in to comment.