-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error in chol.default(iU) : the leading minor of order 2 is not positive definite #45
Comments
I cannot reproduce this. What is the error message you get? What does |
Thank you for you help and sorry for the incomplete posting. Here is the output from traceback(). I will drop in the data files so it should be reproducable.
Computing chain 1
|
I can confirm the occurrence of the error. This seems to happen because your data ( |
Ahh..yes I should have done log(x+1). Thank you. |
Let me also note that if your transform count data as log(x+1), then you
probably wish to use the normal model. If you don't transform it, then
you may use the Poisson or over-dispersed Poisson models that include
the log-link function.
Otso
…On 14.6.2020 19.00, Rhett Harrison wrote:
Ahh..yes I should have done log(x+1). Thank you.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#45 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEIYMZVJOKUUOZGTVX5TJLLRWTXZJANCNFSM4N46EPOQ>.
|
The real message is that you should inspect your data before going to fancy analyses. Some simple descriptive statistics and plots of data would have shown that there is something fishy in the data. No offence intended: we all have committed this sin. It is so hard to remember, but we really must check data. Not only in the beginning but also on the way. |
Again I am sorry. Actually I did check it using hist(log(x)), but the hist() function ignores -inf values. I guess you should always run an is.infinite() and is.na() check. |
My favorite check is sum(). It not only shows if there are NA:s or -Inf
values, but also if the values are accidentally e.g. strings, which
happens quite often. Almost half of "Hmsc returns an error message"
problems have been because e.g. sum(m$Y) does not return a number. Note
however that NA values are allowed for Y (but not for X nor Tr).
Otso
…On 15.6.2020 9.17, Rhett Harrison wrote:
Again I am sorry. Actually I did check it using hist(log(x)), but the
hist() function ignores -inf values. I guess you should always run an
is.infinite() and is.na() check.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#45 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEIYMZX4WQWHAWQBOEHR2N3RWW4GXANCNFSM4N46EPOQ>.
|
A funny thing with |
Hi I encountered a similar problem although a bit more obscure. I run a large analysis were I model p/a and count data separately. For the count data, I model counts conditional on presence, setting all 0s to NAs. I log-transform the counts and model them using the normal distribution. There are no Inf or -Inf values in the abundance matrix. The p/a model runs without problem. The count model ran (10 chains in parallel, 250 samples, thin = 10) but a longer sampling (thin = 100) runs for 6h or so and then stops with the error:
I tried changing the seed but the same thing happened again. Any idea on how to go about debugging this? One thing is that the species matrix contains also rare species so that not all species might be present for all fixed factors. Can that be a problem? any help appreciated! |
The rare species shouldn't cause a problem (you just cannot estimate those factor coefficients) but you could check by running the model on a smaller set of only common species. An abundance model run on rare species does not usually produce informative output anyway. You can just do Y[Y<10] <- NA .
Get Outlook for Android<https://aka.ms/ghei36>
…________________________________
From: FabianRoger <notifications@github.com>
Sent: Wednesday, February 10, 2021 5:36:44 PM
To: hmsc-r/HMSC <HMSC@noreply.github.com>
Cc: Harrison, Rhett (ICRAF) <R.Harrison@cgiar.org>; Author <author@noreply.github.com>
Subject: Re: [hmsc-r/HMSC] Error in chol.default(iU) : the leading minor of order 2 is not positive definite (#45) Newsletter / Marketing
Hi
I encountered a similar problem although a bit more obscure. I run a large analysis were I model p/a and count data separately. For the count data, I model counts conditional on presence, setting all 0s to NAs. I log-transform the counts and model them using the normal distribution.
There are no Inf or -Inf values in the abundance matrix. sum(m$Y, na.rm = TRUE) gives a positive value.
The p/a model runs without problem. The count model ran (10 chains in parallel, 250 samples, thin = 10) but a longer sampling (thin = 100) runs for 6h or so and then stops with the error:
Error in checkForRemoteErrors(val) : 8 nodes produced errors; first error: the leading minor of order 23 is not positive definite
I tried changing the seed but the same thing happened again.
Any idea on how to go about debugging this?
One thing is that the species matrix contains also rare species so that not all species might be present for all fixed factors. Can that be a problem?
any help appreciated!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#45 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKK6YHZUX5DF5UCMYIKKIZ3S6KRYZANCNFSM4N46EPOQ>.
|
Thanks for chiming in! Are you suggesting to set all observations of <10 at a given site to 0? I paste the histogram of the count data (presence only) below, counts of 1 (0 in ln) is by far the most common, and most species aren't observed in abundances > 10 at any site (it's line-counts (1km) of birds - you wouldn't expect high abundance for anything but a few flocking species...) note that the histogram below shows the natural log of the count data, across all sites, years and species. But given that p/a might actually be the way to go... |
Hello again,
Sorry I made a mistake. You should remove all species with <10 in total. Even that may be few perhaps increase to 20.
Y <- Y[ , apply(Y,2, sum, na.rm =T) > 19] will remove the species with too few individuals (remember you need to do this for your site dataset, trait and phylogeny too).
However, looking at your abundance distribution - you cannot specify an abundance model (in any framework) using abundances that run from 0-3. The Probit model is the appropriate one. However, you have a very large number of observations, so you could potentially lump sites / observations in some logical way and then try specifying a Poisson or Lognormal Poisson.
Rhett
Rhett D. Harrison
Senior Researcher, Landscape Ecology and Conservation
World Agroforestry Centre, East & Southern Africa Region,
13 Elm Road, Woodlands, Lusaka, ZAMBIA
E: r.harrison@cgiar.org | M: +260 972449448 | S: rhett_d_harrison |
The World Agroforestry Centre is a member of the CGIAR Consortium and a CarbonNeutral® organisation.
…________________________________
From: FabianRoger <notifications@github.com>
Sent: 10 February 2021 23:51
To: hmsc-r/HMSC <HMSC@noreply.github.com>
Cc: Harrison, Rhett (ICRAF) <R.Harrison@cgiar.org>; Author <author@noreply.github.com>
Subject: Re: [hmsc-r/HMSC] Error in chol.default(iU) : the leading minor of order 2 is not positive definite (#45) Newsletter / Marketing
Thanks for chiming in! Are you suggesting to set all observations of <10 at a given site to 0? I paste the histogram of the count data (presence only) below, counts of 1 (0 in ln) is by far the most common, and most species aren't observed in abundances > 10 at any site (it's line-counts (1km) of birds - you wouldn't expect high abundance for anything but a few flocking species...)
note that the histogram below shows the natural log of the count data, across all sites, years and species.
But given that p/a might actually be the way to go...
[image]<https://user-images.githubusercontent.com/5137092/107576028-e8622000-6bf0-11eb-927e-41f6c7ff53ea.png>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#45 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKK6YH3UHBZVOJYJMGOMATTS6L5U5ANCNFSM4N46EPOQ>.
|
If someone comes here to find a solution: it was provided to me by @oysteiop. The problem was in the spatial random factor and the solution is to limit the maximum number of latent variables that the model should try to fit. in my case, I set |
@FabianRoger this may have fixed your case, but it certainly did not solve the problem. |
Fair point. However I am not even sure it fixed my case, as, upon investigation, while the model converged all variance was now explained by the spatial random factor... So something seems to have gone wrong anyway. I will work with the p/a model for now and come back to this if I can. |
Hi,
Here the model: https://www.dropbox.com/s/vze2jo5ipo5rp16/mod_not_working.rds?dl=0 As you will see I don't have infinite values, my x and y sums to numbers... I am sure it is not due to any format issue. The same model without the variable VDeadSma is working fine (and correlations among original data are neither an issue: https://www.dropbox.com/s/0aq7c13iroc1skg/notworking.tiff?dl=0). What am I missing? |
Hi again, |
@stephanJG the message comes from |
Hey guys! When I fitted a presence-absence model with spatial random effects without setting the initPar argument to “fixed effects”, my model ran without any problem (See attached Screenshot_1). Yet, when I ran the same model with the initPar argument set to “fixed effects”, I am having the same issue as @stephanJG (see attached Screenshot_2). The traceback() function show the following (see attached Screenshot_3) There are no either NAs nor Inf / -Inf values in the presence-absence matrix. I attempted the quick fix suggested by @FabianRoger to a similar issue. But this produced the same error. Could the issue I am having be related to @jarioksa 's above explanation for the warning message gave by the glm.fit() function? What else should I check? Could you please help me figuring out this issue? |
@cmontalvo-mancheno This should have no connection to |
@jarioksa Thank you very much for your response and suggestion! I tried what you suggested with all my data (n = 688), and this is what I got after tracing back the error message (please see screenshot_A1) I also tried a couple of things: 1) removing at random one observation so my data have 687 rather than 688 observations, 2) replacing one observation by another so my data still have 688, and 3) selecting at random 200 observations. Interestingly, in all three cases, the Yet, for the last two cases there were the following warning messages (screenshot_A3 is for the 2nd case): I do not understand why sometimes I get an error, and sometimes I only get either one or both warning messages. Also, I wonder if these warning messages will have any implication in my models? |
Hi Everyone! Nice to see I'm not the only one having this problem...
|
Hi,
Errors that relate to matrices being not positive definite relate to numerical problems that can push covariance matrices that are by definition always positive (semi)definite “over the edge” so that they are not positive definite anymore in which case e.g. Cholesky decomposition gives an error. These are difficult to fix because the model can run fine 100000 iterations and then suddenly the error comes, the likelihood of the error depending on the model and data. I guess we should use more the “try” function with the Cholesky’s and add eps*Identity and/or omit the iteration round if it fails (Jari & Gleb may have better ideas of this).
One typical case is spatial modelling where some locations are very close to each other and some others very far from each other, so I would recommend the ration between the largest and smallest distance not to be greater than something like 100. In your case the problem is not with space but perhaps it relates to the fact that you have included a very large number of covariates both for traits and covariates, so that the model might be quite much overparameterized. Even without the error message, I would probably recommend simplifying the model quite a bit.
Otso
From: Phoebe-AA ***@***.***>
Sent: perjantai 1. lokakuuta 2021 18:15
To: hmsc-r/HMSC ***@***.***>
Cc: Ovaskainen, Otso T ***@***.***>; Comment ***@***.***>
Subject: Re: [hmsc-r/HMSC] Error in chol.default(iU) : the leading minor of order 2 is not positive definite (#45)
Hi Everyone! Nice to see I'm not the only one having this problem...
I have 2 datasets (Abundance and Biomass). I am able to run the abundance dataset using the code below however the biomass dataset has the error code:
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: the leading minor of order 9 is not positive definite
traceback()
5: stop(count, " nodes produced errors; first error: ", firstmsg,
domain = NA)
4: checkForRemoteErrors(val)
3: dynamicClusterApply(cl, fun, length(x), argfun)
2: clusterApplyLB(cl, 1:nChains, fun = sampleChain)
1: sampleMcmc(model, thin = thin, samples = samples, transient = transient,
nChains = nChains, nParallel = nChains, verbose = verbose,
alignPost = TRUE)
…________________________________
The abundance dataset has 17 stations and biomass has 8 stations. Both work when I add a spatial random level (XY coords) but 1) I'm not interested in spatial aspects as such and 2) with the spatial random effect it makes the model more difficult to converge and therefore more time consuming (minimum 1-2 days for running the model, not including cross-validation). Anyway, here is the code I am using:
studyDesign <- data.frame(sample= as.factor(rownames(Env)), habitat = as.factor(Env$Habitat))
rL1 <- HmscRandomLevel(units = studyDesign$sample)
rL2 <- HmscRandomLevel(units = studyDesign$habitat)
XFormula <- ~ poly(Depth, degree = 2, raw = TRUE) +
poly(Oxygen, degree = 2, raw = TRUE) +
poly(B.Temp, degree = 2, raw = TRUE) +
poly(Flourescence, degree = 2, raw = TRUE) +
poly(B.PSU, degree = 2, raw = TRUE) +
poly(Turbidity, degree = 2, raw = TRUE) + SIC
TrFormula<- ~S1+ S2+ S3+ S4+ S5+ BF1+ BF2+ BF3+ BF4+ BF5+SK1+ SK2 +SK3+SK4+SK5+
R1+R2+ R3+ R4+ LD1+ LD2+ LD3+ MV1 +MV2+ MV3+ MV4 +MO1 + MO2 + MO3 + MO4 +
FH1 + FH2 + FH3 + FH4 + FH5+FH6+ Z1+ Z2+ Z3+ Z4
model <- Hmsc(Y=logY,YScale = TRUE, XData = XData, XFormula = XFormula,
TrData = TrData, TrFormula = TrFormula,
phyloTree = my.tree,
studyDesign = studyDesign,
ranLevels = list(sample = rL1, habitat = rL2), distr = "normal")
set MCMC parameters
nChains = 4
thin = 15
samples = 200
transient = 1000
verbose = 100
model = sampleMcmc(model, thin = thin, samples = samples,
transient = transient,
nChains = nChains, nParallel = nChains,
verbose = verbose, alignPost = TRUE)
Habitat as a random factor works but it's specifically when I put sample = rL that I get the above error. Any ideas on how to solve this? I can share the code if needs be.
Many thanks,
Phoebe
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub<#45 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEIYMZRBYXFYD4MFP5A5NG3UEXF5RANCNFSM4N46EPOQ>.
Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
Otso gave out the main point. We have already added some code that check the input to avoid clear problem cases such as duplicated spatial locations. Probably we need to add some more checks, but it is hard to know what else should be checked and how. We call |
Thanks @ovaskain and @jarioksa for your messages. In my three cases, the presence-absence spatial model has no duplicated locations, and their distance range from ~0.85 km to ~385 km. Each of the locations corresponds to the centroid of a 1 km by 1 km grid cell, which I used to record the presence-absence of eight virtual species. The coordinates are in longitude and latitude, so when I defined the spatial random effect, I set Replacing one location allows me to run
|
@ovaskain and @jarioksa thanks for your messages. This is clear about the error message. Also, I had an idea that the models might be over parameterised as I was also getting very low predictive power (whether it was 2-fold or loo validation). Ok, I will start with losing some traits and covariates. Many thanks! Phoebe |
Hi @jarioksa! I've also been running into these chol()-related errors quite a bit lately, and at least in my case they happen inside updateBetaLambda(), when I include covariate-dependent latent variables: (with phylogeny:) (without phylogeny:) Best regards, Anna |
Hello again, Cross-validation, fold 1 out of 2
This happens as soon as I run the code, so I'm guessing it must be something in my model structure, or because there are negative values in my model? |
See issue #123 for some ideas to recover from errors – or to avoid errors. |
Ok will do. Thank you, Phoebe |
Hello,
I am getting the above error when running sampleMcmc (after "Computing the first chain"). My community is a set of 7 tree species measured with basal area and 13 beetle species whose distribution best fits a lognormal poisson sampled from 22 plots. When I replace with a pa dataset and distr = "probit", the problem disappears. Any idea how I might get round this problem for the abundance dataset? I was also previously running models on just the beetle dataset fine.
This is my code setting up the model:
The text was updated successfully, but these errors were encountered: