NA/NaN gradient evaluation error encountered when running sdmTMB function with spatial on
#288
Comments
This is likely due to a mismatch between your installed Matrix version and the Matrix version used to build the sdmTMB binary on CRAN. It affects all TMB-based packages on CRAN. Install from source for now. We'll push a minor update to trigger a rebuild of the binary shortly.
I wiped out all the installed packages and then ran this script on the Linux box:
Same error, but different warnings.
Have you restarted your R session to ensure the latest package installs are the ones loaded? If that doesn't fix it, does a basic glmmTMB example with random effects run? And if that works but sdmTMB doesn't, does the GitHub version of sdmTMB work?
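A basic glmmTMB random-effects check of the kind suggested here could look like the minimal sketch below. The Salamanders data and the `count ~ mined + (1 | site)` formula are an illustrative choice, not the example that was actually run in this thread, and `check_glmmtmb()` is a hypothetical helper name. Because glmmTMB also sits on TMB and Matrix, a clean fit here combined with a failing sdmTMB fit would point away from the core TMB/Matrix install.

```r
# Minimal random-effects sanity check with glmmTMB (illustrative model choice).
# Returns NA (invisibly) if glmmTMB is not installed, otherwise whether the
# fitted log-likelihood is finite.
check_glmmtmb <- function() {
  if (!requireNamespace("glmmTMB", quietly = TRUE)) {
    message("glmmTMB is not installed; skipping check.")
    return(invisible(NA))
  }
  # Load the example data shipped with glmmTMB into this function's environment.
  data("Salamanders", package = "glmmTMB", envir = environment())
  fit <- glmmTMB::glmmTMB(
    count ~ mined + (1 | site),  # random intercept per site
    family = poisson,
    data = Salamanders
  )
  # A finite log-likelihood means the Laplace approximation evaluated cleanly.
  ll <- as.numeric(stats::logLik(fit))
  invisible(is.finite(ll))
}
```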
We confirm that we have tried restarting the R session. Here is the basic glmmTMB example we ran without any issues:
Finally, we get the same result when installing the package directly from GitHub (after also restarting the R session):
I'm running out of ideas. I've always seen the fix of rebuilding from source with the latest Matrix version work. Other information on the Matrix issue:
One other option would be to install an archived version of Matrix, such as Matrix_1.6-1.1.tar.gz:
install.packages("/path/to/downloads/Matrix_1.6-1.1.tar.gz", type = "source", repos = NULL)
Restart the R session, then try the binary version of sdmTMB:
install.packages("sdmTMB")
I'll get a new version of sdmTMB on CRAN shortly, which should let the binary version work.
Otherwise, maybe it's something about your R linear algebra setup or C++ compiler Makevars? I don't see why glmmTMB would work and sdmTMB wouldn't, though, if both were built from source. The only thing I've seen cause this for models that should otherwise fit is this Matrix issue. Everything seems to be working across all systems tested with continuous integration, including that basic example. If you post the output of
Yeah, this is strange. It is surprising that the error was reproduced on our end across two separate installs (Windows and Ubuntu) while the unit tests run fine. I tried the above suggestion (i.e., installing Matrix 1.6-1.1 from the tarball) and this did not work either. Here is the output from
I'll also see if I can get some of my more R-savvy colleagues here at GFC to try to reproduce the issue.
It's possible this is related to the libopenblasp here, plus the more usual Matrix version issue on the Windows machine. I believe I would get the same error on continuous integration without line 87 of sdmTMB/.github/workflows/R-CMD-check.yaml (commit 59e4072).
Regardless, the best path forward is for me to bump the version on CRAN so that a new binary is built, which I will prioritize in the next day or so. If that doesn't solve things, I'll fire up a Docker image and see if I can debug with that BLAS/LAPACK setup.
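Since the OpenBLAS setup is one suspect, it helps to record exactly which BLAS/LAPACK libraries a session is using. The sketch below uses only base R (`sessionInfo()`, `La_version()`, `La_library()`). The helper name `blas_lapack_info()` is hypothetical, and on some builds the library paths may come back empty.

```r
# Report which BLAS/LAPACK libraries this R session is linked against.
blas_lapack_info <- function() {
  si <- utils::sessionInfo()
  list(
    blas_path      = if (is.null(si$BLAS)) NA_character_ else si$BLAS,
    lapack_path    = if (is.null(si$LAPACK)) NA_character_ else si$LAPACK,
    lapack_version = La_version(),  # version string of the LAPACK in use
    lapack_library = La_library()   # path to the LAPACK shared library
  )
}

str(blas_lapack_info())
```

Posting this output alongside `sessionInfo()` from the affected machine makes it easier to compare against a Docker image with the same BLAS/LAPACK setup.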
OK, great. Thanks for your help with troubleshooting this.
OK, version 0.4.2 is now on CRAN. The Mac binaries are built, and the Windows binaries will probably be built in the next day or so. It occurs to me now that I don't know how CRAN handles Linux. Maybe it doesn't build binaries for you?
Sorry, still not working. I tried it on a clean install and installed the packages as follows:
All of the packages are installed from source. I think you are correct that binaries are not built for Linux users, at least not with the way our machine is set up. Here is the session info:
I will try on my Windows computer once the binaries are available.
Fresh install on Windows, and I ran into the same error. I also had a colleague do this on their Windows PC and they got the same error. We are both running R 4.2.2.
I just confirmed that the following works on my DFO Windows laptop with several recent Matrix and TMB versions:
library(sdmTMB)
m <- sdmTMB(
data = pcod,
formula = present ~ depth_scaled + depth_scaled2,
mesh = make_mesh(pcod, c("X", "Y"), cutoff = 10),
family = binomial(link = "logit"),
spatial = "on"
)
However, the Matrix version above is very old (Matrix_1.5-1, 2022-09-13) and may not be compatible with TMB 1.9.10 (depending on whether it was built from source?). This breaking Matrix ABI change has been a big pain. Can you confirm the following still does not work for you with current Matrix and TMB packages?
install.packages("Matrix")
install.packages("TMB")
install.packages("sdmTMB")
# restart R / RStudio to be safe... then
library(sdmTMB)
m <- sdmTMB(
data = pcod,
formula = present ~ depth_scaled + depth_scaled2,
mesh = make_mesh(pcod, c("X", "Y"), cutoff = 10),
family = binomial(link = "logit"),
spatial = "on"
)
CRAN checks seem fine and all binaries (except 'patched' Linux) are built. Hopefully it's an issue with old Matrix...
When I do the above, it works on the Windows computer! Unfortunately, still no luck on the Linux computer. When my colleague first tried this on the DFO computer:
it worked because he had several dependencies already installed. However, after wiping out the
Hi, just chiming in to add my support for finding a way to use sdmTMB on a Linux computer.
@JoleneSutton can you provide more details? Installed from CRAN? From source or binary? From GitHub? Are Matrix and TMB up to date? Can you post the output of sessionInfo()? Anything in your R Makevars file? There's nothing inherent to Linux that should cause this. I regularly use the package on Linux systems, it's tested on 3 Linux systems with every push to GitHub, and the CRAN servers test it on many Linux systems. I'd like to get to the bottom of this! It's likely something about a specific setup, and maybe with multiple data points we can track it down.
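Several of the questions above (which versions, built how and when) can be answered together with a small base-R helper like the sketch below. `tmb_stack_info()` is a hypothetical name. It reads the `Version` and `Built` fields from each package's installed DESCRIPTION. `Built` records the R version, platform, and date of the build; it does not directly say whether the install came from a source or a binary package, but a build date that matches your own install time suggests the package was compiled locally.

```r
# Summarise version and build details for the packages involved in the
# Matrix/TMB ABI issue. Packages that are not installed get NA entries.
tmb_stack_info <- function(pkgs = c("Matrix", "TMB", "sdmTMB")) {
  rows <- lapply(pkgs, function(p) {
    installed <- nzchar(system.file(package = p))  # "" if not installed
    if (!installed) {
      return(data.frame(package = p, version = NA_character_,
                        built = NA_character_, stringsAsFactors = FALSE))
    }
    d <- utils::packageDescription(p)
    built <- if (is.null(d$Built)) NA_character_ else d$Built
    data.frame(package = p, version = d$Version, built = built,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tmb_stack_info()
```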
Hi @seananderson, yes, sorry, I should have been clearer. It is the same machine, and thus the same error messages, as described by @davjfish. I'm just hoping to be able to switch my scripts to that machine to free up my laptop. We still seem to be having issues with Linux, per the post from Jan. 22. Really appreciate all your help with this!
I just spent a while debugging this with someone (with raw TMB/RTMB code, nothing to do with sdmTMB) who also had R version 4.2.2 installed, and even installing Matrix and TMB from source in that order did not fix it (edit: it did fix it, but TMB had to be built from source and R had to be restarted).
It is still highly likely that the issue is an old Matrix package install. I see above that the installed version of Matrix is old; the current version is 1.6-5. Even for the person with R 4.2.2 I mentioned earlier today, once they installed the latest Matrix and then installed TMB from source from CRAN, the problem went away. In this case (with an older R), you likely also have to install sdmTMB from source. I can post some RTMB code that could be run to simplify testing by eliminating the sdmTMB layer.
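The install order described above (latest Matrix, then TMB from source, then sdmTMB from source, then restart R) can be written down as a small helper so that it is repeated the same way on each machine. This is a sketch: `reinstall_tmb_stack()` is a hypothetical name, and `dry_run` defaults to `TRUE` so nothing is installed until you opt in. On Windows, installing TMB and sdmTMB with `type = "source"` requires Rtools.

```r
# Reinstall the TMB stack from source in dependency order:
# Matrix first, then TMB (compiled against that Matrix), then sdmTMB.
# With dry_run = TRUE this only prints what it would do.
reinstall_tmb_stack <- function(pkgs = c("Matrix", "TMB", "sdmTMB"),
                                dry_run = TRUE) {
  for (p in pkgs) {
    if (dry_run) {
      message("Would install from source: ", p)
    } else {
      utils::install.packages(p, type = "source")
    }
  }
  invisible(pkgs)
}

# reinstall_tmb_stack(dry_run = FALSE)
# Then restart R before loading sdmTMB so the fresh builds are the ones used.
```

After restarting, `packageVersion("Matrix")` should report at least the current version mentioned above (1.6-5 at the time of this comment).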
We upgraded to R 4.3.2 on the Linux machine and installed the updated packages, but unfortunately we are still having the same issues. Here's the code:
Here's the error message:
And the session info:
Matrix products: default
locale:
time zone: America/Halifax
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
I'm running out of ideas. Can you confirm these were built from source and not installed from binaries?
install.packages("Matrix")
install.packages("TMB")
install.packages("sdmTMB")
I wondered if it could be the BLAS/LAPACK setup, but I just found someone with the same versions as you, and it works for them. Again, are you sure the above installed from source? As a troubleshooting exercise, does the following code run for you on this server down to the sdmTMB part, i.e., down to line 93 or so? Then we can isolate whether this is an sdmTMB install issue or a more fundamental TMB issue.
This issue persists on a fresh install of Ubuntu 22. I tried installing everything from source and ran into the same NA/NaN gradient and 'matrix not positive definite' errors. I also tried a clean install duplicating the steps in the passing GitHub Actions workflow, without any luck. I tried the troubleshooting exercise and it crashes out on line 90 with the same type of error:
> opt <- nlminb(obj$par, obj$fn, obj$gr)
Error in .local(A, ...) :
leading principal minor of order 405 is not positive
In addition: Warning message:
In .local(A, ...) :
CHOLMOD warning 'matrix not positive definite' at file 'Supernodal/t_cholmod_super_numeric_worker.c', line 1114
Error in .local(A, ...) :
leading principal minor of order 405 is not positive
In addition: Warning messages:
1: In nlminb(obj$par, obj$fn, obj$gr) : NA/NaN function evaluation
2: In .local(A, ...) :
CHOLMOD warning 'matrix not positive definite' at file 'Supernodal/t_cholmod_super_numeric_worker.c', line 1114
Error in ff(x, order = 1) :
inner newton optimization failed during gradient calculation
outer mgc: NaN
Error in nlminb(obj$par, obj$fn, obj$gr) : NA/NaN gradient evaluation
>
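The "leading principal minor of order 405 is not positive" message comes from a Cholesky factorisation failing. The trace ends with "inner newton optimization failed during gradient calculation", so the matrix being factorised is presumably the Hessian with respect to the random effects in TMB's inner (Laplace) optimisation. A Cholesky factorisation of a symmetric matrix proceeds column by column and breaks down at the first k whose leading k-by-k block has a non-positive determinant. The small base-R illustration below shows that rule. `first_nonpositive_minor()` is a hypothetical helper, and computing determinants this way is only sensible for tiny matrices.

```r
# Return the order k of the first leading principal minor of a symmetric
# matrix that is not positive, or NA if all leading minors are positive
# (i.e. the matrix is positive definite and Cholesky would succeed).
first_nonpositive_minor <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A))
  for (k in seq_len(nrow(A))) {
    if (det(A[seq_len(k), seq_len(k), drop = FALSE]) <= 0) return(k)
  }
  NA_integer_
}

# Leading minors are 2, 3, and -11, so factorisation fails at order 3.
A <- matrix(c(2, 1,  0,
              1, 2,  1,
              0, 1, -3), nrow = 3, byrow = TRUE)
first_nonpositive_minor(A)  # returns 3
```

`chol(A)` on the same matrix fails with an analogous "leading minor of order 3" error.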
@stoyelq is this on the same server as above or a different Ubuntu setup? If it's different, maybe we can figure out what the two have in common. This shouldn't be a general problem with Ubuntu 22 + sdmTMB or Ubuntu + OpenBLAS + sdmTMB. Both are regularly tested and used without issue (here on GitHub Actions, on CRAN, by me personally, and by many others). There must be something about this specific system setup. Probably the best hope of solving this is with Docker: if someone can reproduce the problem in Docker and point me to the Dockerfile, I can build it and troubleshoot. It's also worth confirming whether this is unique to sdmTMB or also happens with other TMB random effects models built locally, e.g., starting with a basic random effects model such as 'thetalog.R' and, if that works, also trying an SPDE spatial model as in 'spde.R'. Both are in this examples folder: https://github.com/kaskr/adcomp/blob/master/tmb_examples/
When working through this demo on a new computer and a fresh install of R (4.3.2), we are running into the following issue:
Produces this error:
When spatial is set to "off", we do not get this error. Originally, we suspected this was a problem with running the library on Linux, but we have since reproduced it on Windows. This error has also been reproduced on R version 4.2.2. The error message is the same on Linux, but we do receive a few extra warnings:
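Because the failure is reported only with spatial = "on", it can help to capture that contrast as a single reportable result. The sketch below reuses the pcod model posted earlier in this thread. `compare_spatial_fits()` is a hypothetical helper name, and wrapping each fit in `tryCatch()` is simply a convenience so that both outcomes are recorded.

```r
# Fit the pcod example model with spatial random fields "off" and "on",
# recording whether each fit completes without an error.
# Returns NULL if sdmTMB is not installed.
compare_spatial_fits <- function() {
  if (!requireNamespace("sdmTMB", quietly = TRUE)) return(NULL)
  # Load the pcod example data shipped with sdmTMB into this environment.
  data("pcod", package = "sdmTMB", envir = environment())
  mesh <- sdmTMB::make_mesh(pcod, c("X", "Y"), cutoff = 10)
  fits_ok <- function(spatial) {
    res <- tryCatch(
      sdmTMB::sdmTMB(
        present ~ depth_scaled + depth_scaled2,
        data = pcod,
        mesh = mesh,
        family = binomial(link = "logit"),
        spatial = spatial
      ),
      error = function(e) e
    )
    !inherits(res, "error")
  }
  c(off = fits_ok("off"), on = fits_ok("on"))
}
```

With spatial = "off", this model likely has no random effects, so TMB would not need the inner Laplace optimisation or the sparse Cholesky step that fails in the nlminb output above. A result of `off = TRUE, on = FALSE` would be consistent with that diagnosis.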