Are tweedie models more computationally heavy than alternatives? #301
Replies: 2 comments
-
Fascinating... I think it's understandable that the Tweedie is slower and more memory hungry given the computational challenges in approximating the likelihood, but in testing this I noticed that glmmTMB scales fitting speed (and, I think, memory) better with large datasets and the Tweedie. E.g.:

```r
library(sdmTMB)
library(tictoc)

# match glmmTMB:
ctl <- sdmTMBcontrol(newton_loops = 0, multiphase = FALSE)

N <- 3e4
set.seed(1)
dat <- data.frame(
  y = fishMod::rTweedie(N, 1, 1, 1.5)
)

tic()
m2 <- sdmTMB(
  y ~ 1,
  data = dat,
  spatial = "off",
  family = tweedie("log"),
  control = ctl
)
toc()
#> 6.429 sec elapsed

tic()
m1 <- glmmTMB::glmmTMB(
  y ~ 1,
  data = dat,
  family = glmmTMB::tweedie("log")
)
toc()
#> 1.433 sec elapsed

# Gaussian ----------------------------
N <- 3e4
set.seed(1)
dat <- data.frame(
  y = rnorm(N, 0, 1)
)

tic()
m2 <- sdmTMB(
  y ~ 1,
  data = dat,
  spatial = "off",
  control = sdmTMBcontrol(newton_loops = 1, multiphase = FALSE)
)
toc()
#> 0.116 sec elapsed

tic()
m1 <- glmmTMB::glmmTMB(
  y ~ 1,
  data = dat
)
toc()
#> 0.219 sec elapsed
```

Created on 2024-02-15 with reprex v2.1.0

I'm not sure what's going on. My first thought was that maybe a single set of epsilon/omega random effects that are left mapped off at zero might be responsible, but when I try shrinking those to have dimensions of 0 it doesn't change much.

I don't see a similar scaling problem with the Gaussian. Actually, maybe it's there, but not until much larger data sizes and not as badly:

```r
library(sdmTMB)
library(tictoc)

N <- 3e6
set.seed(2)
dat <- data.frame(
  y = rnorm(N, 0, 1)
)
ctl <- sdmTMBcontrol(newton_loops = 0, multiphase = FALSE)

tic()
m2 <- sdmTMB(
  y ~ 1,
  data = dat,
  spatial = "off",
  control = ctl
)
#> Warning: The model may not have converged. Maximum final gradient:
#> 0.0594763641440051.
toc()
#> 13.274 sec elapsed

tic()
m1 <- glmmTMB::glmmTMB(
  y ~ 1,
  data = dat
)
toc()
#> 7.489 sec elapsed
```

Created on 2024-02-15 with reprex v2.1.0

There are gradient issues with the Gaussian big-data example there. I wonder if the optimizer settings are different? Or the starting values are better? Or if an internal parameter transformation is different? Or if it's because of extra parameters that are mapped off? I'll move this over to an issue.
-
This is now fixed. The problem was that I was ADREPORTing the Tweedie power parameter within a loop over the data; that results in one ADREPORT per observation and blows up the memory. Thanks for reporting this! 616e8cd
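For anyone curious, here's a minimal RTMB sketch of that anti-pattern (this is an illustration, not sdmTMB's actual C++ source; the objective functions and parameter names are hypothetical). Each call to `ADREPORT()` appends to the set of quantities whose delta-method standard errors are tracked, so calling it inside a loop over the data makes that set grow with `N`:

```r
library(RTMB)

set.seed(1)
dat <- list(y = rnorm(100))

# Anti-pattern: ADREPORT() inside the data loop, one report per observation
nll_bad <- function(parms) {
  getAll(parms, dat)
  nll <- 0
  for (i in seq_along(y)) {
    nll <- nll - dnorm(y[i], mu, exp(log_sd), log = TRUE)
    ADREPORT(exp(log_sd)) # recorded once per observation: scales with N
  }
  nll
}

# Fix: ADREPORT() once, outside the loop
nll_good <- function(parms) {
  getAll(parms, dat)
  ADREPORT(exp(log_sd)) # recorded once: constant overhead
  -sum(dnorm(y, mu, exp(log_sd), log = TRUE))
}

pars <- list(mu = 0, log_sd = 0)
obj_bad <- MakeADFun(nll_bad, pars, silent = TRUE)
obj_good <- MakeADFun(nll_good, pars, silent = TRUE)
# The ADREPORTed vector in sdreport(obj_bad) has length N,
# versus length 1 for obj_good.
```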
-
Hi!
@VThunell and I have played around with Tweedie models fitted to stomach content data (a continuous response with about 20% zeros). The dataset is quite large: approximately 100,000 observations spanning 50 years or so. In doing this, we found that a Tweedie model would quite often lead to

```
Error: vector memory exhausted (limit reached?)
```

or other types of memory issues causing R to crash, even on smaller subsets of the data. First we explored general memory settings in R, then packages (e.g., CRAN/dev versions of TMB, Matrix, INLA, etc.), and even R versions, but didn't find anything there. We have also tried this across three macOS Sonoma 14.3 laptops: Intel, M2, and M3 chips, with 8, 24, and 8 GB of RAM. On my laptop (the one with 24 GB of RAM), I can get away with the biggest subset of the data.
Here are some examples that hopefully reproduce for you.
The data can be found here:
But we can also reproduce it by modifying the pcod example so that it is of similar size.
The code below illustrates the issue for me, but on different laptops you'll find different thresholds where the model crashes. The Tweedie model fitted to the big data crashes, the delta_lognormal works fine and fast, and the last model is the Tweedie fitted to half the data, which works.
I also note that when I run `saveRDS()` on the delta_lognormal and the last Tweedie model, the difference in size is huge: 1.1 MB vs. 76 MB! Even though the delta_lognormal has two models and double the data. Is this working as intended? Is it something about the Tweedie that makes the model very big/slow?
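One way to narrow down where that size difference comes from is to measure the components of the fitted objects directly with base R's `object.size()` (a quick diagnostic sketch; `fit_tweedie` and `fit_delta` are placeholder names for the fits described above):

```r
# Report each top-level component of a fitted model object in MB,
# largest first, to see which component dominates the saved size.
inspect_size <- function(fit) {
  sizes <- vapply(fit, function(x) as.numeric(object.size(x)), numeric(1))
  sort(sizes / 1024^2, decreasing = TRUE)
}

# Hypothetical usage on the fits discussed above:
# inspect_size(fit_tweedie)
# inspect_size(fit_delta)
```

Note that `object.size()` doesn't account for shared environments, so the per-component sums may not exactly match the `saveRDS()` file size, but it usually points at the offending component.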
Here's the session I ran this in (though, as mentioned, this was reproduced with other versions as well):
*edited for spelling