Commit 99661ef

correct signatures in function references and add section on embedded Laplace in users guide on GPs.
1 parent 3545625 commit 99661ef

File tree

3 files changed: +218 -20 lines changed


src/bibtex/all.bib

Lines changed: 47 additions & 1 deletion
@@ -1894,4 +1894,50 @@ @misc{seyboldt:2024
 note="pyro-ppl GitHub repository issue \#1751",
 year = "2024",
 url ="https://github.com/pyro-ppl/numpyro/pull/1751#issuecomment-1980569811"
-}
+}
+
+@article{Margossian:2020,
+  author = {Margossian, C. C. and Vehtari, A. and Simpson, D. and Agrawal, R.},
+  title = {Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent {Gaussian} models and beyond},
+  journal = {Advances in Neural Information Processing Systems},
+  volume = {33},
+  year = {2020}
+}
+
+@article{Kuss:2005,
+  author = {Kuss, Malte and Rasmussen, Carl E.},
+  title = {Assessing Approximate Inference for Binary {Gaussian} Process Classification},
+  journal = {Journal of Machine Learning Research},
+  volume = {6},
+  pages = {1679--1704},
+  year = {2005}
+}
+
+@article{Vanhatalo:2010,
+  author = {Jarno Vanhatalo and Ville Pietil\"{a}inen and Aki Vehtari},
+  title = {Approximate inference for disease mapping with sparse {Gaussian} processes},
+  journal = {Statistics in Medicine},
+  year = {2010},
+  volume = {29},
+  number = {15},
+  pages = {1580--1607}
+}
+
+@article{Cseke:2011,
+  author = {Botond Cseke and Tom Heskes},
+  title = {Approximate marginals in latent {Gaussian} models},
+  journal = {Journal of Machine Learning Research},
+  volume = {12},
+  number = {2},
+  pages = {417--454},
+  year = {2011}
+}
+
+@article{Vehtari:2016,
+  author = {Aki Vehtari and Tommi Mononen and Ville Tolvanen and Tuomas Sivula and Ole Winther},
+  title = {Bayesian Leave-One-Out Cross-Validation Approximations for {Gaussian} Latent Variable Models},
+  journal = {Journal of Machine Learning Research},
+  year = {2016},
+  volume = {17},
+  number = {103},
+  pages = {1--38},
+  url = {http://jmlr.org/papers/v17/14-540.html}
+}

src/functions-reference/embedded_laplace.qmd

Lines changed: 10 additions & 17 deletions
@@ -370,21 +370,20 @@ $p(y \mid \theta)$ is a Poisson distribution with a log link
 and allows the user to tune the control parameters of the approximation.
 {{< since 2.37 >}}
 
-A similar built-in likelihood lets users specify an offset $x_i \in \mathbb R^+$
-to the rate parameter of the Poisson. The likelihood is then,
+A similar built-in likelihood lets users specify a vector offset
+$x \in \mathbb R^N$ with $x_i \ge 0$ to the rate parameter of the Poisson.
+The likelihood is then,
 $$
 p(y \mid \theta, \phi) = \prod_i \text{Poisson} (y_i \mid \exp(\theta_{g(i)}) x_i).
 $$

-<!-- real; laplace_marginal_poisson_2_log ~; -->
 \index{{\tt \bfseries laplace\_marginal\_poisson\_2\_log }!sampling statement|hyperpage}
 
 `y ~ ` **`laplace_marginal_poisson_2_log`**`(y_index, x, theta_init, covariance_function, (...))`<br>\newline
 
 Increment target log probability density with `laplace_marginal_poisson_2_log_lupmf(y | y_index, x, theta_init, covariance_function, (...))`.
 {{< since 2.37 >}}
 
-<!-- real; laplace_marginal_tol_poisson_2_log ~; -->
 \index{{\tt \bfseries laplace\_marginal\_tol\_poisson\_2\_log }!sampling statement|hyperpage}
 
 `y ~ ` **`laplace_marginal_tol_poisson_2_log`**`(y_index, x, theta_init, covariance_function, (...), tol, max_steps, hessian_block_size, solver, max_steps_linesearch)`<br>\newline

@@ -393,16 +392,14 @@ Increment target log probability density with `laplace_marginal_tol_poisson_2_lo
 
 The signatures for this function are:
 
-<!-- real; laplace_marginal_poisson_2_log_lpmf; (array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...)); -->
-\index{{\tt \bfseries laplace\_marginal\_poisson\_2\_log\_lpmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...)): real}|hyperpage}
-`real` **`laplace_marginal_poisson_2_log_lpmf`**`(array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...))`<br>\newline
+\index{{\tt \bfseries laplace\_marginal\_poisson\_2\_log\_lpmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...)): real}|hyperpage}
+`real` **`laplace_marginal_poisson_2_log_lpmf`**`(array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...))`<br>\newline
 Returns an approximation to the log marginal likelihood $p(y \mid \phi)$
 in the special case where the likelihood $p(y \mid \theta)$ is a Poisson
 distribution with a log link and an offset.
 {{< since 2.37 >}}
 
-<!-- real; laplace_marginal_tol_poisson_2_log_lpmf; (array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch); -->
-\index{{\tt \bfseries laplace\_marginal\_tol\_poisson\_2\_log\_lpmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
+\index{{\tt \bfseries laplace\_marginal\_tol\_poisson\_2\_log\_lpmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
 
 `real` **`laplace_marginal_tol_poisson_2_log_lpmf`**`(array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`<br>\newline

@@ -412,17 +409,15 @@ distribution with a log link and an offset
 and allows the user to tune the control parameters of the approximation.
 {{< since 2.37 >}}
 
-<!-- real; laplace_marginal_poisson_2_log_lupmf; (array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...)); -->
-\index{{\tt \bfseries laplace\_marginal\_poisson\_2\_log\_lpmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
+\index{{\tt \bfseries laplace\_marginal\_poisson\_2\_log\_lupmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...)): real}|hyperpage}
 `real` **`laplace_marginal_poisson_2_log_lupmf`**`(array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...))`<br>\newline
 
 Returns an approximation to the log marginal likelihood $p(y \mid \phi)$
 in the special case where the likelihood $p(y \mid \theta)$ is a Poisson
 distribution with a log link and an offset.
 {{< since 2.37 >}}
 
-<!-- real; laplace_marginal_tol_poisson_2_log_lupmf; (array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch); -->
-\index{{\tt \bfseries laplace\_marginal\_tol\_poisson\_2\_log\_lupmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...)): real}|hyperpage}
+\index{{\tt \bfseries laplace\_marginal\_tol\_poisson\_2\_log\_lupmf }!{\tt (array[] int y \textbar\ array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
 
 `real` **`laplace_marginal_tol_poisson_2_log_lupmf`**`(array[] int y | array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`<br>\newline

@@ -432,8 +427,7 @@ distribution with a log link and an offset
 and allows the user to tune the control parameters of the approximation.
 {{< since 2.37 >}}
 
-<!-- vector; laplace_latent_poisson_2_log_rng; (array[] int y, array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...)); -->
-\index{{\tt \bfseries laplace\_latent\_poisson\_2\_log\_rng }!{\tt (array[] int y, array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...)): vector}|hyperpage}
+\index{{\tt \bfseries laplace\_latent\_poisson\_2\_log\_rng }!{\tt (array[] int y, array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...)): vector}|hyperpage}
 
 `vector` **`laplace_latent_poisson_2_log_rng`**`(array[] int y, array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...))`<br>\newline

@@ -442,8 +436,7 @@ $p(\theta \mid y, \phi)$ in the special case where the likelihood
 $p(y \mid \theta)$ is a Poisson distribution with a log link and an offset.
 {{< since 2.37 >}}
 
-<!-- vector; laplace_latent_tol_poisson_2_log_rng; (array[] int y, array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch); -->
-\index{{\tt \bfseries laplace\_latent\_tol\_poisson\_2\_log\_rng }!{\tt (array[] int y, array[] int y\_index, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): vector}|hyperpage}
+\index{{\tt \bfseries laplace\_latent\_tol\_poisson\_2\_log\_rng }!{\tt (array[] int y, array[] int y\_index, vector x, vector theta\_init, function covariance\_function, tuple(...), real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): vector}|hyperpage}
 
 `vector` **`laplace_latent_tol_poisson_2_log_rng`**`(array[] int y, array[] int y_index, vector x, vector theta_init, function covariance_function, tuple(...), real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`<br>\newline
src/stan-users-guide/gaussian-processes.qmd

Lines changed: 161 additions & 2 deletions
@@ -486,8 +486,130 @@ model {
 }
 ```
 
+#### Poisson GP using an embedded Laplace approximation {-}
+
+For computational reasons, we may want to integrate out the Gaussian process
+$f$, as was done in the normal output model. Unfortunately, exact
+marginalization over $f$ is not possible when the outcome model is not normal.
+Instead, we may perform *approximate* marginalization with an *embedded
+Laplace approximation* [@Rue:2009; @Margossian:2020].
+To do so, we first use the function `laplace_marginal` to approximate the
+marginal likelihood $p(y \mid \rho, \alpha, a)$ and sample the
+hyperparameters with Hamiltonian Monte Carlo. Then, we recover the
+marginalized $f$ in the `generated quantities` block using
+`laplace_latent_rng`.
+
+The embedded Laplace approximation computes a Gaussian approximation of the
+conditional posterior,
+$$
+\hat p_\mathcal{L}(f \mid \rho, \alpha, a, y) \approx p(f \mid \rho, \alpha, a, y),
+$$
+where $\hat p_\mathcal{L}$ is a Gaussian that matches the mode and curvature
+of $p(f \mid \rho, \alpha, a, y)$. We then obtain an approximation of
+the marginal likelihood as follows:
+$$
+\hat p_\mathcal{L}(y \mid \rho, \alpha, a)
+  = \frac{p(f^* \mid \alpha, \rho) \, p(y \mid f^*, a)}{\hat p_\mathcal{L}(f^* \mid \rho, \alpha, a, y)},
+$$
+where $f^*$ is the mode of $p(f \mid \rho, \alpha, a, y)$, obtained via
+numerical optimization.
+
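The mode-and-curvature construction can be illustrated on a one-dimensional toy model (a Python sketch under simplifying assumptions, not part of Stan): a scalar latent $f \sim \text{normal}(0, s)$ with a Poisson likelihood, where the Laplace estimate of the log marginal likelihood can be checked against numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, poisson

# Toy latent Gaussian model: f ~ normal(0, s), y ~ Poisson(exp(f)).
s, y = 1.0, 3

def log_joint(f):
    return norm.logpdf(f, 0.0, s) + poisson.logpmf(y, np.exp(f))

# Newton iterations to find the mode f* of p(f | y).
f_star = 0.0
for _ in range(50):
    grad = -f_star / s**2 + y - np.exp(f_star)   # d/df log_joint
    hess = -1.0 / s**2 - np.exp(f_star)          # d^2/df^2 log_joint
    f_star -= grad / hess

# Laplace approximation q(f) = normal(f*, sqrt(-1/hess)); then
# log p(y) ~= log p(f*) + log p(y | f*) - log q(f*).
hess = -1.0 / s**2 - np.exp(f_star)
sd = np.sqrt(-1.0 / hess)
log_marg_laplace = log_joint(f_star) - norm.logpdf(f_star, f_star, sd)

# Reference value by brute-force quadrature over the 1-D latent.
log_marg_exact = np.log(quad(lambda f: np.exp(log_joint(f)), -10.0, 10.0)[0])
```

For this log-concave likelihood the two estimates agree closely, which is the behavior the embedded Laplace approximation exploits in higher dimensions.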
+To use Stan's embedded Laplace approximation, we must define the prior
+covariance function and the log likelihood function in the `functions` block.
+```{stan}
+functions {
+  // log likelihood function
+  real ll_function(vector f, real a, array[] int y) {
+    return poisson_log_lpmf(y | a + f);
+  }
+
+  // covariance function
+  matrix cov_function(real rho, real alpha, array[] real x, int N, real delta) {
+    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
+    return add_diag(K, delta);
+  }
+}
+```
+
+Furthermore, we must specify an initial value $f_\text{init}$ for the
+numerical optimizer that underlies the Laplace approximation. In our
+experience, setting all values to 0 is a good default.
+
+```{stan}
+transformed data {
+  vector[N] f_init = rep_vector(0, N);
+}
+```
+
+We then increment `target` in the model block with the approximation to
+$\log p(y \mid \rho, \alpha, a)$.
+```{stan}
+model {
+  rho ~ inv_gamma(5, 5);
+  alpha ~ std_normal();
+  a ~ std_normal();
+
+  target += laplace_marginal(ll_function, (a, y), f_init,
+                             cov_function, (rho, alpha, x, N, delta));
+}
+```
+Notice that we do not need to construct $f$ explicitly, since it is
+marginalized out. Instead, we recover the GP function in `generated quantities`:
+```{stan}
+generated quantities {
+  vector[N] f = laplace_latent_rng(ll_function, (a, y), f_init,
+                                   cov_function, (rho, alpha, x, N, delta));
+}
+```
+
+Stan also provides support for a limited menu of built-in likelihoods,
+including the Poisson distribution with a log link and an offset $a$. When
+using such a built-in function, the user does not need to specify a likelihood
+in the `functions` block. However, the user must strictly follow the signature
+of the likelihood: in this case, $a$ must be a vector of length $N$ (to allow
+for a different offset for each observation $y_i$), and we must indicate which
+element of $f$ each component of $y$ matches using the variable
+$y_\text{index}$. In our example, there is a simple pairing $(y_i, f_i)$;
+however, we could imagine a scenario where multiple observations
+$(y_{j1}, y_{j2}, \ldots)$ are observed for a single $f_j$.
+
+```{stan}
+transformed data {
+  // ...
+  array[n_obs] int y_index;
+  for (i in 1:n_obs) y_index[i] = i;
+}
 
-#### Logistic Gaussian process regression {-}
+
+// ...
+
+transformed parameters {
+  vector[N] a_vec = rep_vector(a, N);
+}
+
+model {
+  // ...
+  target += laplace_marginal_poisson_2_log_lpmf(y | y_index, a_vec, f_init,
+              cov_function, (rho, alpha, x, N, delta));
+}
+
+generated quantities {
+  vector[N] f = laplace_latent_poisson_2_log_rng(y, y_index, a_vec, f_init,
+                  cov_function, (rho, alpha, x, N, delta));
+}
+```
+
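As a concrete illustration of the indexing (a hypothetical Python sketch, not Stan code), `y_index` simply records, for each observation, which latent element it belongs to; with grouped data, several observations can carry the same index.

```python
import numpy as np

# Hypothetical grouped data: 6 observations over 3 latent elements,
# e.g. repeated measurements per region. Labels are 1-based, as in Stan.
group = np.array([1, 1, 2, 3, 3, 3])
y_index = group.copy()              # observation i pairs with f[y_index[i]]

# In the simple one-to-one pairing (y_i, f_i), y_index is just 1..N.
N = 4
y_index_simple = np.arange(1, N + 1)
```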
+Marginalization with a Laplace approximation can lead to faster inference;
+however, it also introduces an approximation error. In practice, this error
+is negligible when using a Poisson likelihood, and the approximation works
+well for log-concave likelihoods [@Kuss:2005; @Vanhatalo:2010; @Cseke:2011;
+@Vehtari:2016].
+Still, users should exercise caution, especially
+when trying unconventional likelihoods.
+
+#### Logistic GP regression {-}
 
 For binary classification problems, the observed outputs $z_n \in
 \{ 0,1 \}$ are binary. These outputs are modeled using a Gaussian
@@ -514,10 +636,47 @@ data {
 // ...
 model {
   // ...
-  y ~ bernoulli_logit(a + f);
+  z ~ bernoulli_logit(a + f);
 }
 ```
 
+#### Logistic GP regression with an embedded Laplace approximation {-}
+
+As with the Poisson GP, we cannot marginalize the GP function exactly;
+however, we can resort to an embedded Laplace approximation.
+
+```{stan}
+functions {
+  // log likelihood function
+  real ll_function(vector f, real a, array[] int z) {
+    return bernoulli_logit_lpmf(z | a + f);
+  }
+
+  // covariance function
+  matrix cov_function(real rho, real alpha, array[] real x, int N, real delta) {
+    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
+    return add_diag(K, delta);
+  }
+}
+
+// ...
+
+model {
+  target += laplace_marginal(ll_function, (a, z), f_init,
+                             cov_function, (rho, alpha, x, N, delta));
+}
+
+generated quantities {
+  vector[N] f = laplace_latent_rng(ll_function, (a, z), f_init,
+                                   cov_function, (rho, alpha, x, N, delta));
+}
+```
+
+While marginalization with a Laplace approximation can lead to faster
+inference, it also introduces an approximation error. In practice, this error
+may not be negligible with a Bernoulli likelihood; for more discussion see,
+e.g., [@Vehtari:2016; @Margossian:2020].
 
 ### Automatic relevance determination {-}