Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in chol.default(iU) : the leading minor of order 2 is not positive definite #45

Open
rhettdharrison opened this issue Jun 13, 2020 · 31 comments
Labels
help wanted Extra attention is needed

Comments

@rhettdharrison
Copy link

rhettdharrison commented Jun 13, 2020

Hello,
I am getting the above error when running sampleMcmc (after "Computing the first chain"). My community is a set of 7 tree species measured with basal area and 13 beetle species whose distribution best fits a lognormal poisson sampled from 22 plots. When I replace with a pa dataset and distr = "probit", the problem disappears. Any idea how I might get round this problem for the abundance dataset? I was also previously running models on just the beetle dataset fine.

This is my code setting up the model:

## set model parameters
set.seed(1)
sample.id <- rownames(com)
studyDesign = data.frame(sample = sample.id)
rL = HmscRandomLevel(units = studyDesign$sample)
xData <- plot.char
distr = c(rep("normal",7), rep("lognormal poisson",13))

#set model
m.comNULL <- Hmsc(Y = com, XData = xData, XFormula = ~1,
             distr = distr, studyDesign = studyDesign, ranLevels = list("sample" = rL))
@jarioksa
Copy link
Collaborator

I cannot reproduce this. What is the error message you get? What does traceback() show after the error (this will tell us where this error happened)? Can you provide a minimal test case that triggers this error?

@rhettdharrison
Copy link
Author

Thank you for you help and sorry for the incomplete posting.

Here is the output from traceback(). I will drop in the data files so it should be reproducable.

###Run mcmc sampling to estimate parameters
m.comNULL.ab = sampleMcmc(m.comNULL.ab, thin = thin, samples = samples, transient = transient,

  •                    nChains = nChains, nParallel = , verbose = verbose)
    

Computing chain 1
Error in chol.default(iU) :
the leading minor of order 2 is not positive definite

traceback()
5: chol.default(iU)
4: chol(iU)
3: updateBetaLambda(Y = Y, Z = Z, Gamma = Gamma, iV = iV, iSigma = iSigma,
Eta = Eta, Psi = Psi, Delta = Delta, iQ = iQg[, , rho], X = X,
Tr = Tr, Pi = Pi, dfPi = dfPi, C = C, rL = hM$rL)
2: sampleChain(chain)
1: sampleMcmc(m.comNULL.ab, thin = thin, samples = samples, transient = transient,
nChains = nChains, nParallel = , verbose = verbose)

plot_characteristics.txt
community.txt

@jarioksa
Copy link
Collaborator

I can confirm the occurrence of the error. This seems to happen because your data (com) has -Inf (minus infinite) values and we cannot do any meaningful mathematics with them (look, for instance, what you get with colMeans(com)). I guess you have log-transformed your data, and log(0) gives -Inf.

@rhettdharrison
Copy link
Author

Ahh..yes I should have done log(x+1). Thank you.

@ovaskain
Copy link
Collaborator

ovaskain commented Jun 14, 2020 via email

@jarioksa
Copy link
Collaborator

The real message is that you should inspect your data before going to fancy analyses. Some simple descriptive statistics and plots of data would have shown that there is something fishy in the data. No offence intended: we all have committed this sin. It is so hard to remember, but we really must check data. Not only in the beginning but also on the way.

@rhettdharrison
Copy link
Author

Again I am sorry. Actually I did check it using hist(log(x)), but the hist() function ignores -inf values. I guess you should always run an is.infinite() and is.na() check.

@ovaskain
Copy link
Collaborator

ovaskain commented Jun 15, 2020 via email

@jarioksa
Copy link
Collaborator

A funny thing with Inf is that it works in some calculations, but gives NaN (not a number) in some others. So exp(-Inf) == 0 and Inf+Inf == Inf, but Inf-Inf == NaN. In this case most numbers going to chol() were NaN and this was reported as "the leading minor of order 2 is not positive definite". Not a very user-friendly error message.

@jarioksa jarioksa added the help wanted Extra attention is needed label Jun 19, 2020
@FabianRoger
Copy link

Hi

I encountered a similar problem although a bit more obscure. I run a large analysis were I model p/a and count data separately. For the count data, I model counts conditional on presence, setting all 0s to NAs. I log-transform the counts and model them using the normal distribution.

There are no Inf or -Inf values in the abundance matrix. sum(m$Y, na.rm = TRUE) gives a positive value.

The p/a model runs without problem. The count model ran (10 chains in parallel, 250 samples, thin = 10) but a longer sampling (thin = 100) runs for 6h or so and then stops with the error:

Error in checkForRemoteErrors(val) : 8 nodes produced errors; first error: the leading minor of order 23 is not positive definite

I tried changing the seed but the same thing happened again.

Any idea on how to go about debugging this?

One thing is that the species matrix contains also rare species so that not all species might be present for all fixed factors. Can that be a problem?

any help appreciated!

@rhettdharrison
Copy link
Author

rhettdharrison commented Feb 10, 2021 via email

@FabianRoger
Copy link

Thanks for chiming in! Are you suggesting to set all observations of <10 at a given site to 0? I paste the histogram of the count data (presence only) below, counts of 1 (0 in ln) is by far the most common, and most species aren't observed in abundances > 10 at any site (it's line-counts (1km) of birds - you wouldn't expect high abundance for anything but a few flocking species...)

note that the histogram below shows the natural log of the count data, across all sites, years and species.

But given that p/a might actually be the way to go...

image

@rhettdharrison
Copy link
Author

rhettdharrison commented Feb 11, 2021 via email

@FabianRoger
Copy link

If someone comes here to find a solution: it was provided to me by @oysteiop. The problem was in the spatial random factor and the solution is to limit the maximum number of latent variables that the model should try to fit.

in my case, I set rL1$nfMax = 4 (rL1 is my spatial random factor) and the model ran without error.

@jarioksa
Copy link
Collaborator

jarioksa commented Mar 4, 2021

@FabianRoger this may have fixed your case, but it certainly did not solve the problem.

@FabianRoger
Copy link

Fair point. However I am not even sure it fixed my case, as, upon investigation, while the model converged all variance was now explained by the spatial random factor...

So something seems to have gone wrong anyway. I will work with the p/a model for now and come back to this if I can.

@stephanJG
Copy link

Hi,
I am having a similar issue.

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: the leading minor of order 2 is not positive definite

Here the model: https://www.dropbox.com/s/vze2jo5ipo5rp16/mod_not_working.rds?dl=0

As you will see I don't have infinite values, my x and y sums to numbers... I am sure it is not due to any format issue. The same model without the variable VDeadSma is working fine (and correlations among original data are neither an issue: https://www.dropbox.com/s/0aq7c13iroc1skg/notworking.tiff?dl=0).

What am I missing?
Many thanks in advance.
Jörg

@stephanJG
Copy link

Hi again,
Update: I now fitted the model without initPar = "fixed effects" after I saw the warnings() gave me glm.fit: fitted probabilities numerically 0 or 1 occurred. I test was working fine, lets see.

@jarioksa
Copy link
Collaborator

@stephanJG the message comes from stats::glm which is an external function not part of Hmsc. The message is usually harmless, and it occurs typically in Poisson and Binomial model when some factor levels have only zeros: the maximum likelihood estimate of coefficient for such a case is minus infinity which cannot be reached in finite computer, but instead you get low negative values.

@cmontalvo-mancheno
Copy link

Hey guys! When I fitted a presence-absence model with spatial random effects without setting the initPar argument to “fixed effects”, my model ran without any problem (See attached Screenshot_1).
Screenshot_1

Yet, when I ran the same model with the initPar argument set to “fixed effects”, I am having the same issue as @stephanJG (see attached Screenshot_2).
Screenshot_2

The traceback() function show the following (see attached Screenshot_3)
Screenshot_3

There are no either NAs nor Inf / -Inf values in the presence-absence matrix. I attempted the quick fix suggested by @FabianRoger to a similar issue. But this produced the same error. Could the issue I am having be related to @jarioksa 's above explanation for the warning message gave by the glm.fit() function? What else should I check? Could you please help me figuring out this issue?

@jarioksa
Copy link
Collaborator

jarioksa commented Sep 21, 2021

@cmontalvo-mancheno This should have no connection to initPar = "fixed effects". There are scattered records on problems with matrix inversions when the input data produces degenerate matrices. There are also several functions and among them several places where this error may crop out. Now we have no idea where this happens. It could help if you run sampleMcmc without parallel processing (that is, with nParallel = 1): then traceback() would show the failing function. However, usually these errors cannot be reproduced without offending data.

@cmontalvo-mancheno
Copy link

@jarioksa Thank you very much for your response and suggestion! I tried what you suggested with all my data (n = 688), and this is what I got after tracing back the error message (please see screenshot_A1)
Screenshot_A1

I also tried a couple of things: 1) removing at random one observation so my data have 687 rather than 688 observations, 2) replacing one observation by another so my data still have 688, and 3) selecting at random 200 observations. Interestingly, in all three cases, the sampleMcmc ran without any error message. For the first case there were no warning message either (please see screenshot_A2).
Screenshot_A2

Yet, for the last two cases there were the following warning messages (screenshot_A3 is for the 2nd case): glm.fit: fitted probabilities numerically 0 or 1 occurred and glm.fit: algorithm did not converge
Screenshot_A3

I do not understand why sometimes I get an error, and sometimes I only get either one or both warning messages. Also, I wonder if these warning messages will have any implication in my models?

@Phoebe-AA
Copy link

Hi Everyone! Nice to see I'm not the only one having this problem...
I have 2 datasets (Abundance and Biomass). I am able to run the abundance dataset using the code below however the biomass dataset has the error code:

Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: the leading minor of order 9 is not positive definite

traceback()
5: stop(count, " nodes produced errors; first error: ", firstmsg,
domain = NA)
4: checkForRemoteErrors(val)
3: dynamicClusterApply(cl, fun, length(x), argfun)
2: clusterApplyLB(cl, 1:nChains, fun = sampleChain)
1: sampleMcmc(model, thin = thin, samples = samples, transient = transient,
nChains = nChains, nParallel = nChains, verbose = verbose,
alignPost = TRUE)


The abundance dataset has 17 stations and biomass has 8 stations. Both work when I add a spatial random level (XY coords) but 1) I'm not interested in spatial aspects as such and 2) with the spatial random effect it makes the model more difficult to converge and therefore more time consuming (minimum 1-2 days for running the model, not including cross-validation). Anyway, here is the code I am using:

studyDesign <- data.frame(sample= as.factor(rownames(Env)), habitat = as.factor(Env$Habitat))
rL1 <- HmscRandomLevel(units = studyDesign$sample)
rL2 <- HmscRandomLevel(units = studyDesign$habitat)
XFormula <- ~ poly(Depth, degree = 2, raw = TRUE) +
poly(Oxygen, degree = 2, raw = TRUE) +
poly(B.Temp, degree = 2, raw = TRUE) +
poly(Flourescence, degree = 2, raw = TRUE) +
poly(B.PSU, degree = 2, raw = TRUE) +
poly(Turbidity, degree = 2, raw = TRUE) + SIC

TrFormula<- ~S1+ S2+ S3+ S4+ S5+ BF1+ BF2+ BF3+ BF4+ BF5+SK1+ SK2 +SK3+SK4+SK5+
R1+R2+ R3+ R4+ LD1+ LD2+ LD3+ MV1 +MV2+ MV3+ MV4 +MO1 + MO2 + MO3 + MO4 +
FH1 + FH2 + FH3 + FH4 + FH5+FH6+ Z1+ Z2+ Z3+ Z4

model <- Hmsc(Y=logY,YScale = TRUE, XData = XData, XFormula = XFormula,
TrData = TrData, TrFormula = TrFormula,
phyloTree = my.tree,
studyDesign = studyDesign,
ranLevels = list(sample = rL1, habitat = rL2), distr = "normal")

set MCMC parameters

nChains = 4
thin = 15
samples = 200
transient = 1000
verbose = 100
model = sampleMcmc(model, thin = thin, samples = samples,
transient = transient,
nChains = nChains, nParallel = nChains,
verbose = verbose, alignPost = TRUE)

Habitat as a random factor works but it's specifically when I put sample = rL that I get the above error. Any ideas on how to solve this? I can share the code if needs be.

Many thanks,

Phoebe

@ovaskain
Copy link
Collaborator

ovaskain commented Oct 2, 2021 via email

@jarioksa
Copy link
Collaborator

jarioksa commented Oct 2, 2021

Otso gave out the main point. We have already added some code that check the input to avoid clear problem cases such as duplicated spatial locations. Probably we need to add some more checks, but it is hard to know what else should be checked and how.

We call chol 59 times in Hmsc and this makes tracing errors even harder: first we should learn which of these chol commands (for Cholesky decomposition) triggers the error, and then analyse how we get to that error state. If you run sampleMcmc in parallel, we have no clue which chol it was. You should re-run the code with nParallel = 1 which allows traceback() to go to the actual chol command. For instance, with chol(iU) the primary suspect is updateBetaLambda and we know we should inspect how matrix iU is constructed there and see how it could go wrong. As Otso wrote, this may happen at step 100,000 which makes things harder for debugging. We'll see what we can do, though.

@cmontalvo-mancheno
Copy link

Thanks @ovaskain and @jarioksa for your messages. In my three cases, the presence-absence spatial model has no duplicated locations, and their distance range from ~0.85 km to ~385 km. Each of the locations corresponds to the centroid of a 1 km by 1 km grid cell, which I used to record the presence-absence of eight virtual species. The coordinates are in longitude and latitude, so when I defined the spatial random effect, I set longlat = TRUE.

Replacing one location allows me to run sampleMcmc with initPar = "fixed effects" without error message. Yet, there are two warning messages (i.e., glm.fit: fitted probabilities numerically 0 or 1 occurred, and glm.fit: algorithm did not converge). Based on jarioksa's message to stephanJG the first warning is usually harmless.

  1. How about the second warning message?
  2. Will the warning messages by itself or together have any implication in my model?

@Phoebe-AA
Copy link

@ovaskain and @jarioksa thanks for your messages. This is clear about the error message. Also, I had an idea that the models might be over parameterised as I was also getting very low predictive power (whether it was 2-fold or loo validation). Ok, I will start with losing some traits and covariates. Many thanks! Phoebe

@aminorberg
Copy link

Hi @jarioksa!

I've also been running into these chol()-related errors quite a bit lately, and at least in my case they happen inside updateBetaLambda(), when I include covariate-dependent latent variables:

(with phylogeny:)
5: chol.default(kronecker(XEtaTXEta, diag(iSigma)) + P)
4: chol(kronecker(XEtaTXEta, diag(iSigma)) + P)
3: updateBetaLambda(Y = Y, Z = Z, Gamma = Gamma, iV = iV, iSigma = iSigma,
Eta = Eta, Psi = Psi, Delta = Delta, iQ = iQg[, , rho], X = X,
Tr = Tr, Pi = Pi, dfPi = dfPi, C = C, rL = hM$rL)

(without phylogeny:)
5: chol.default(iU)
4: chol(iU)
3: updateBetaLambda(Y = Y, Z = Z, Gamma = Gamma, iV = iV, iSigma = iSigma,
Eta = Eta, Psi = Psi, Delta = Delta, iQ = iQg[, , rho], X = X,
Tr = Tr, Pi = Pi, dfPi = dfPi, C = C, rL = hM$rL)

Best regards,

Anna

@Phoebe-AA
Copy link

Hello again,
I managed to get the model running by adding 'levels' in the code.
rL <- HmscRandomLevel(units = levels(studyDesign$sample)).
I also removed some traits and covariates and the model is working ok.
But now I cannot use the cross-validation as it produces a similar error:

Cross-validation, fold 1 out of 2
setting updater$Gamma2=FALSE due to specified phylogeny matrix
Computing chain 1
Error in chol.default(XtX) :
the leading minor of order 11 is not positive definite

traceback()
8: chol.default(XtX)
7: chol(XtX)
6: chol2inv(chol(XtX))
5: kronecker(H, chol2inv(chol(XtX)))
4: updateGammaEta(Z = Z, Gamma = Gamma, V = chol2inv(chol(iV)),
iV = iV, id = iSigma, Eta = Eta, Lambda = Lambda, Alpha = Alpha,
X = X, Pi = Pi, dfPi = dfPi, Tr = Tr, rL = hM$rL, rLPar = rLPar,
Q = Qg[, , rho], iQ = iQg[, , rho], RQ = RQg[, , rho], mGamma = mGamma,
U = hM$UGamma, iU = iUGamma)
3: sampleChain(chain)
2: sampleMcmc(hM1, samples = hM$samples, thin = hM$thin, transient = hM$transient,
adaptNf = hM$adaptNf, initPar = initPar, nChains = nChains,
nParallel = nParallel, updater = updater, verbose = verbose,
alignPost = alignPost)
1: computePredictedValues(model, partition = partition)

This happens as soon as I run the code, so I'm guessing it must be something in my model structure, or because there are negative values in my model?
Many thanks again,
Phoebe

@jarioksa
Copy link
Collaborator

See issue #123 for some ideas to recover from errors – or to avoid errors.

@Phoebe-AA
Copy link

See issue #123 for some ideas to recover from errors – or to avoid errors.

Ok will do. Thank you, Phoebe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

8 participants