Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model decisions #125

Merged
merged 9 commits into from
Sep 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions vignettes/DecisionTree.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
---
title: "How to choose how to model"
output:
bookdown::html_document2:
base_format: rmarkdown::html_vignette
fig_caption: yes
bibliography: "covid.bib"
link-citations: yes
vignette: >
%\VignetteIndexEntry{Priors exploration}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

4d-Modeller and INLA/inlabru allow you to use both fixed and random effects in your model with SPDEs allowing you to model continuous random processes in your data. Below we explain briefly each of these processes and how to choose when to use them.

In this document we will use the following data for examples:

```{r, error=TRUE}
library(INLA) # for modeling later
library(MASS) # for true multivariate normal simulation
set.seed(42)

n <- 100 # number of observations
timesteps <- 10 # number of time steps

# Create a basic time and space grid
df <- expand.grid(x = seq(1, 10, length.out = sqrt(n)),
y = seq(1, 10, length.out = sqrt(n)),
time = 1:timesteps)

# Create a covariance matrix
cov_matrix <- matrix(0.5, nrow = n, ncol = n) # Example covariance matrix with 0.5 correlation
diag(cov_matrix) <- 1 # Diagonal should be 1

# Generate spatially and temporally correlated noise for each time step
for (i in unique(df$time)) {
noise <- mvrnorm(n = 1, mu = rep(0, n), Sigma = cov_matrix)
df$value[df$time == i] <- noise
}

head(df)

library(ggplot2)

ggplot(df, aes(x = x, y = y, color = value)) +
geom_point() +
facet_wrap(~ time) +
scale_color_viridis_c() +
theme_minimal()


```

## Fixed Effects

A fixed effect in a bayesian model is an effect whose value is fixed in a location but has a probability distribution associated with it's uncertainty. A fixed effect represents a parameter or coefficient in a model that is assumed to have a fixed, constant value across different observations/locations. In a Bayesian model, a fixed effect is a special case of a random effect where we assume the effect has zero variance.

### why would you choose a fixed effect?

You would choose to model an effect as a fixed effect if you believe the effect will always have the same outcome no matter the context of that the observation is taken. For example, a unit of rain should fill a reservoir by that an equivalent unit no matter the conditions given constant climatic conditions with non-porous soil.

### example of a fixed effect in r-inla/inlabru

A fixed effect is represented in the fit equation as the variable name. In the simulated data example, there are three columns `x`, `y`, and `time`. You can see below how these are written into the formula that is then passed to inla.

```{r, error=TRUE}
formula <- value ~ 1 + x + y + time
result <- inla(formula, data = df, family = "gaussian")
summary(result)
```

## Random Effects

A random effect in a bayesian model has an effect whose value is drawn from a probability distribution and has associated uncertainty drawn from it's own distribution. There are two types of random effects that can be modeled in inla, random effects of unobserved processes and random effects constrained by observed processes.

### Random Effects of Unobserved Processes

Unobserved processes are processes that may be spatially or temporally distributed that do not have data or a theoretical description as to what they might be. Instead, it is considered variance found within the data that is not representative of randomness and instead some kind of unobserved process.

For example, patients being infected by COVID-19 were impacted by social and demographic factors that helped to control infection rates [link](link_to_covid_tutorial). But what if in addition to social and demographic factors, there was an unobserved migration of people into a neighborhood thereby increasing the total number of potential people who could be infected? This would create unobserved variance in the data set (assuming we do not have neighborhood population counts).

Below we show a model that defines a formula where the `f` subroutine represents the unobserved variance occuring through time while `x` and `y` are fixed effects.gi

```{r, error=TRUE}
# Define a formula for the model
# Load the INLA library

# Define the formula for the INLA model
formula <- value ~ 1 + x + y + f(time, model = "rw1")

# Fit the INLA model
result <- inla(formula, data = df, family = "gaussian")

# Print the summary of the INLA model
summary(result)
```


### Random Effects Constrained by Observed Processes

Observed data can be used to constrain effects such that inla/inlabru can be used to do data driven inversion, i.e., estimating processes that cannot be observed directly. For example,

<!-- TODO: come up with example -->

```{r, error=TRUE}
library(inlabru)

# Fit a simple model
fit <- bru(value ~ 1 + observed(x + y, model = "linear"), data = df)

# Print the model summary
print(summary(fit))
```

### Comparing the output

Finally, we can compare the output of all the models using the DIC. This is important as we would like to know the fittedness of each model even if we have some theoretical belief that a model should be specificied in a certain way.

```{r error=TRUE}
plot(DICs)
```
72 changes: 72 additions & 0 deletions vignettes/inlaFAQ.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
---
title: "INLA FAQ"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{inlaFAQ}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

There are number of issues that can crop up when running/fitting the model either using INLA or inlabru. Below we have compiled issues that we find common but we also recommend asking questions on the [google groups forum](https://groups.google.com/g/r-inla-discussion-group).

## Has the model actually crashed?

Does the model stop running and give error like `inla.inlaprogram.has.crash(): the inla-program exited with an error. …..`? Or does it give other messages but has not actually crashed?

INLA models can take some time to produce solutions and often times will give messages like "failed to factorize Q". However this does not mean the model has completely failed, only that in that iteration it did not complete it's task. Allow the model to run as long as it needs or returns a crash message.

## Are you running the model in verbose mode?

The model can be run in verbose mode so that it outputs feedback. This feedback can be useful for diagnosing issues. Running in verbose mode is done like so:

```{rerror=TRUE,message=FALSE,warning=FALSE}
fit.mod<-bru(formula, data =XX,
family = XX,
options = list(
inla.mode="classic", # or “inla.mode="experimental"
verbose = TRUE) )
```

## Memory issues
- How large is your data set?
- How many mesh nodes? Is the mesh well designed (e.g., no isolated triangles)?
- Have you tried reducing the number of mesh nodes and testing which mesh size avoids errors?


## `inla.mode="classic"` or `inla.mode="experimental"`

`inlabru` has two modes "classic" and "experimental". It is often the case that if you switch the mode then the model will not crash and you will get a result. To do so, add the parameter either `inla.mode="classic"` or `inla.mode="experimental"` to the INLAbru call function. So, it’s something like:

```{rerror=TRUE,message=FALSE,warning=FALSE}
fit.mod<-bru(formula, data =XX,
family = XX,
options = list(
inla.mode="classic", # or “inla.mode="experimental"
verbose = TRUE) )
```

## How many threads are you using?

By default, `inlabru` will use however many threads are available. However this can cause issues. You can limit the number of threads with `num.threads`, by setting `num.threads="1:1"`:

```{rerror=TRUE,message=FALSE,warning=FALSE}
fit.mod<-bru(formula, data =XX,
family = XX,
options = list(
num.threads="1:1",
verbose = TRUE)
)
```

## Have you selected reasonable control parameters?

INLA has a number options that can be accessed in the fitting formula using the `options` parameter. [The list of options is rather large and we encourage you to explore and experiment with them](https://www.rdocumentation.org/packages/inlabru/versions/2.9.0/topics/bru_options). Typically this looks something like:

```{rerror=TRUE,message=FALSE,warning=FALSE}
fit.mod<-bru(formula, data =XX, family = XX,
options = list(
control.inla = list(
reordering = "metis",
int.strategy = "eb"),
verbose = TRUE ) )
```
Loading