-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments on JOSS paper #296
Comments
This seems to be OK in version generated by the review system itself (I think I have something wrong in the YAML meta data for the the |
Further to our conversation, some thoughts on the improper prior issue: # get some data
library(mgcv)
set.seed(2)
dat <- gamSim(1,n=400,dist="normal",scale=2)
# tprs has a null space dimension of 1 -- cool
gam(y~s(x0), data=dat)$smooth[[1]]$null.space.dim
# ts has a null space dimension of 0 -- also cool
gam(y~s(x0, bs="ts"), data=dat)$smooth[[1]]$null.space.dim
# what about B-splines with a continuous penalty? hmmm
gam(y~s(x0, bs="bs"), data=dat)$smooth[[1]]$null.space.dim
# now using code from the paper ;)
library(gratia)
library(dplyr)
library(ggplot2)
# basis get
bfs <- basis(gam(y~s(x0, bs="bs"), data=dat), select = "s(x0)")
# plot basis functions and fitted spline
bfs |>
ggplot(aes(x = x0, y = .value)) +
geom_line(aes(colour = .bf)) +
guides(colour = "none") +
labs(x = "x", y = "y", title = "s(x)") Well that gives us this: which I don't see any "linear" terms in... ¯_(ツ)_/¯ Going back to the boring situation in which we "read the documentation", in
So I think this is before identifiability constraints are applied (since they need to be applied at the model, not term level. So I think the nullspace just consists of the intercept in that case. Doing a tiny bit more digging: jagam(y~s(x0, bs="bs"), data=dat, centred=FALSE, file="m1.jags")
jagam(y~s(x0, bs="bs")-1, data=dat, centred=FALSE, file="m0.jags") Produces:
So I think that makes sense? (Running to stand in front of my poster so I will think about this more in a wee bit!) |
But for that to be true, wouldn't the null space dimension for a TPRS (the default) be = 2 because it has both constant and linear basis functions, yet as you show it is 1. If we do library("mgcv")
library("gratia")
df <- data_sim("gwf2", seed = 2)
m <- gam(y ~ s(x, bs = "cr"), data = df)
# extract penalty
S <- m$smooth[[1]]$S[[1]]
# eigen decompoisition
eigen(S) producing
where 1 eigenvalue is numerically 0 and the matrix is not invertible by normal means. $> solve(S)
Error in solve.default(S) :
system is computationally singular: reciprocal condition number = 4.66992e-19 And this is what I would expect, because a linear function is still in the span of basis, even if there is no actual linear function. Now consider the same model but with m <- gam(y ~ s(x, bs = "cr"), data = df, select = TRUE) Now the null space of this smooth is 0, as it should be because of the extra penalty:
Some more discussion is needed here... |
Some further reading has helped, if we're just talking about the Null space of a penalty. The key point I think is that the null space is the span of functions representable by the basis for which the penalty has no effect (and hence it's size is the number of such representable functions that are unaffected by the penalty). The null space doesn't refer to specific basis functions but to the functions representable by that basis. Section 5.4.3 of Simon's GAM book has: which I think is clear. This explains the findings in my comment above. I have assumed that the rank deficiency of |
Here's some clearer evidence that the smooths have a null space. This needs the very latest version of gratia from github but equivalent results can be achieved with We can rotate the basis functions using an eigendecomposition to reparameterize the smooth such that it has a diagonal (identity) penalty. The null space is then highlighted because the diagonal elements are zero for the null space. remotes::install_github("gavinsimpson/gratia")
library("gratia")
library("patchwork")
df <- data_sim("gwf2", seed = 2)
p1 <- basis(s(x, bs = "bs"), data = df,
constraints = TRUE, diagonalize = TRUE) |>
draw()
p2 <- penalty(s(x, bs = "bs"), data = df,
constraints = TRUE, diagonalize = TRUE) |>
draw()
p1 + p2 + plot_layout(ncol = 2, nrow = 1) Here, we're using a B-spline basis and have absorbed the sum-to-zero identifiability constraint into the basis and then reparameterized to diagonalize the penalty. Absorbing the identifiability constraint means the constant / zero function is not in the span of the basis. But the linear functions remain in the span of the basis. This is clear from the reparametrisation shown above, where we see an actual linear basis function and a zero row/column in the penalty matrix. The reparameterisation doesn't change anything with respect to the original basis, it just highlights the null space of the basis more clearly. |
After some extensive chat here, online and offline (by nice coincidence at the same conference), we've got somewhere on this (the improper prior business)... I'll write this collectively for both our findings but if @gavinsimpson needs to correct me anywhere, I trust that will happen.
Does that cover everything @gavinsimpson? |
Then two additional comments because @gavinsimpson mistakenly asked me to look more closely at his equations...
(this list interrupted by my realising that github now renders
|
(Part of review here.)
This was fun to read and I learnt some things :)
Some minor statistical/editorial comments on the paper:
gratia
github. Not sure if this is normal.twlss()
model), there are more parameters to consider. You might not want to address this in the intro -- that's fine!gratia
.ctrl <- gam.control(nthreads = 10)
?\varphi
rather than\phi
for the scale parameter (as you did in the intro), not sure if this was intentional?fitted_samples()
to" misplaces comma.brms
" but "Brms" looks weird.The text was updated successfully, but these errors were encountered: