Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

Open
MichaelChirico opened this issue May 18, 2020 · 4 comments

Comments

@MichaelChirico
Copy link
Owner

Let us consider the Dalgaard's example given in utils::example( SSD). Clearly such a comparison:

anova( mlmfit, mlmfit0, idata=D.idata, X∼ deg * noise, M∼ deg * noise)

cannot be performed because X and M span the same space and, consequently, T is void. Nevertheless, although the writing is equivalent, this comparison

anova( mlmfit, mlmfit0, idata=D.idata, X∼ deg * noise)

gives the following wrong result:

Analysis of Variance Table

Model 1: reacttime∼ 1
Model 2: reacttime∼ 1 - 1

Contrasts orthogonal to
~deg * noise

Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
1 9 1.2185e-29
2 10 1 2.4231e-29 0.99141 76.902 6 4 0.0004381 ***
---
Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the first case, identical( proj( X), proj( M)) is TRUE whereas in the second analysis, this assertion is FALSE. Consequently, if in the first case, the rounding done by zapsmall leads to 0, zapsmall returns some not null (very low) values in the second case. Then Thin.row does not return a NULL matrix.


METADATA

  • Bug author - florent aubry
  • Creation time - 2015-01-22 11:10:15 UTC
  • Bugzilla link
  • Status - NEW
  • Alias - None
  • Component - Analyses
  • Version - R 3.1.2
  • Hardware - x86_64/x64/amd64 (64-bit) Windows 64-bit
  • Importance - P5 major
  • Assignee - R-core
  • URL -
  • Modification time - 2020-02-11 16:56 UTC
@MichaelChirico
Copy link
Owner Author

Please simplify your example. If this is a problem with zapsmall, you shouldn't need to involve examples, anova, etc. Just put together an input to zapsmall that gives the wrong answer.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2015-01-24 12:59:53 UTC

@MichaelChirico
Copy link
Owner Author

(In reply to Duncan Murdoch from comment #1)

Please simplify your example.  If this is a problem with zapsmall, you
shouldn't need to involve examples, anova, etc.  Just put together an input
to zapsmall that gives the wrong answer.

The purpose ot the example was to emphasize the inconsistent call of zapsmall in stats:::Thin.row using an example for which the maximum of the values to be rounded (i.e., X) is lower than the tolerance of the error (tol).


METADATA

  • Comment author - florent aubry
  • Timestamp - 2015-01-26 15:28:21 UTC

@MichaelChirico
Copy link
Owner Author

Is 'D.idata' a typo in the original example? Does it perhaps refer to the 'idata' created by:

example(anova.mlm)

METADATA

  • Comment author - Benjamin Tyner
  • Timestamp - 2020-02-08 16:51:17 UTC

@MichaelChirico
Copy link
Owner Author

I agree that this could be better described, but the crux of the matter seems to be that the logic of stats:::Thin.row() (and likely proj() and Rank() too) is somewhat brittle. This is alleviated by zapsmall() but that is relative to the largest value, so iv everything is near-zero, it doesn't help, i.e.

zapsmall(1:5+1e-8)

[1] 1 2 3 4 5

but

zapsmall((1:5+1e-8)*1e-9)

[1] 1e-09 2e-09 3e-09 4e-09 5e-09

In the anova.mlm example, we have

X <- model.matrix(~deg*noise,data=idata)
stats:::proj.matrix(X)
         1 2             3             4 5 6

1 1.000000e+00 0 -1.017704e-16 -1.387779e-16 0 0
2 0.000000e+00 1 -1.850372e-17 -2.465190e-32 0 0
3 0.000000e+00 0 1.000000e+00 1.387779e-16 0 0
4 0.000000e+00 0 1.295260e-16 1.000000e+00 0 0
5 0.000000e+00 0 -3.700743e-17 0.000000e+00 1 0
6 2.220446e-16 0 0.000000e+00 0.000000e+00 0 1

which is the identity matrix except for FP errors. However,

zapsmall(diag(6)-stats:::proj.matrix(X))
          1 2             3             4 5 6

1 0.000000e+00 0 1.017704e-16 1.387779e-16 0 0
2 0.000000e+00 0 1.850372e-17 0.000000e+00 0 0
3 0.000000e+00 0 -2.220446e-16 -1.387779e-16 0 0
4 0.000000e+00 0 -1.295260e-16 1.110223e-16 0 0
5 0.000000e+00 0 3.700743e-17 0.000000e+00 0 0
6 -2.220446e-16 0 0.000000e+00 0.000000e+00 0 0

only changes the value -2.46e-32 because the other non-zero values are not small compared to 2.2e-16.

I would conjecture that this might be fixed by extending the check

       if (Rank(cbind(M, X)) != Rank(M)) 
            stop("X does not define a subspace of M")

with a check that Rank(X) < Rank(M). That should remove the possibility that proj(M) - proj(X) is effectively all-zeros. Should also improve the error message from the rather opaque

Error in abs(sapply(deltassd, function(X) diag((T %% X %% t(T))))) :
non-numeric argument to mathematical function


METADATA

  • Comment author - Peter Dalgaard
  • Timestamp - 2020-02-11 16:56:52 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant