[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

MichaelChirico · 2020-05-18T12:13:43Z

Let us consider the Dalgaard's example given in utils::example( SSD). Clearly such a comparison:

anova( mlmfit, mlmfit0, idata=D.idata, X∼ deg * noise, M∼ deg * noise)

cannot be performed because X and M span the same space and, consequently, T is void. Nevertheless, although the writing is equivalent, this comparison

anova( mlmfit, mlmfit0, idata=D.idata, X∼ deg * noise)

gives the following wrong result:

Analysis of Variance Table

Model 1: reacttime∼ 1
Model 2: reacttime∼ 1 - 1

Contrasts orthogonal to
~deg * noise

Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
1 9 1.2185e-29
2 10 1 2.4231e-29 0.99141 76.902 6 4 0.0004381 ***
---
Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the first case, identical( proj( X), proj( M)) is TRUE whereas in the second analysis, this assertion is FALSE. Consequently, if in the first case, the rounding done by zapsmall leads to 0, zapsmall returns some not null (very low) values in the second case. Then Thin.row does not return a NULL matrix.

METADATA

Bug author - florent aubry
Creation time - 2015-01-22 11:10:15 UTC
Bugzilla link
Status - NEW
Alias - None
Component - Analyses
Version - R 3.1.2
Hardware - x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance - P5 major
Assignee - R-core
URL -
Modification time - 2020-02-11 16:56 UTC

MichaelChirico · 2020-05-18T12:13:44Z

Please simplify your example. If this is a problem with zapsmall, you shouldn't need to involve examples, anova, etc. Just put together an input to zapsmall that gives the wrong answer.

METADATA

Comment author - Duncan Murdoch
Timestamp - 2015-01-24 12:59:53 UTC

MichaelChirico · 2020-05-18T12:13:45Z

(In reply to Duncan Murdoch from comment #1)

Please simplify your example.  If this is a problem with zapsmall, you
shouldn't need to involve examples, anova, etc.  Just put together an input
to zapsmall that gives the wrong answer.

The purpose ot the example was to emphasize the inconsistent call of zapsmall in stats:::Thin.row using an example for which the maximum of the values to be rounded (i.e., X) is lower than the tolerance of the error (tol).

METADATA

Comment author - florent aubry
Timestamp - 2015-01-26 15:28:21 UTC

MichaelChirico · 2020-05-18T12:13:46Z

Is 'D.idata' a typo in the original example? Does it perhaps refer to the 'idata' created by:

example(anova.mlm)

METADATA

Comment author - Benjamin Tyner
Timestamp - 2020-02-08 16:51:17 UTC

MichaelChirico · 2020-05-18T12:13:47Z

I agree that this could be better described, but the crux of the matter seems to be that the logic of stats:::Thin.row() (and likely proj() and Rank() too) is somewhat brittle. This is alleviated by zapsmall() but that is relative to the largest value, so iv everything is near-zero, it doesn't help, i.e.

zapsmall(1:5+1e-8)

[1] 1 2 3 4 5

but

zapsmall((1:5+1e-8)*1e-9)

[1] 1e-09 2e-09 3e-09 4e-09 5e-09

In the anova.mlm example, we have

X <- model.matrix(~deg*noise,data=idata)
stats:::proj.matrix(X)

         1 2             3             4 5 6

1 1.000000e+00 0 -1.017704e-16 -1.387779e-16 0 0
2 0.000000e+00 1 -1.850372e-17 -2.465190e-32 0 0
3 0.000000e+00 0 1.000000e+00 1.387779e-16 0 0
4 0.000000e+00 0 1.295260e-16 1.000000e+00 0 0
5 0.000000e+00 0 -3.700743e-17 0.000000e+00 1 0
6 2.220446e-16 0 0.000000e+00 0.000000e+00 0 1

which is the identity matrix except for FP errors. However,

zapsmall(diag(6)-stats:::proj.matrix(X))

          1 2             3             4 5 6

1 0.000000e+00 0 1.017704e-16 1.387779e-16 0 0
2 0.000000e+00 0 1.850372e-17 0.000000e+00 0 0
3 0.000000e+00 0 -2.220446e-16 -1.387779e-16 0 0
4 0.000000e+00 0 -1.295260e-16 1.110223e-16 0 0
5 0.000000e+00 0 3.700743e-17 0.000000e+00 0 0
6 -2.220446e-16 0 0.000000e+00 0.000000e+00 0 0

only changes the value -2.46e-32 because the other non-zero values are not small compared to 2.2e-16.

I would conjecture that this might be fixed by extending the check

       if (Rank(cbind(M, X)) != Rank(M)) 
            stop("X does not define a subspace of M")

with a check that Rank(X) < Rank(M). That should remove the possibility that proj(M) - proj(X) is effectively all-zeros. Should also improve the error message from the rather opaque

Error in abs(sapply(deltassd, function(X) diag((T %% X %% t(T))))) :
non-numeric argument to mathematical function

METADATA

Comment author - Peter Dalgaard
Timestamp - 2020-02-11 16:56:52 UTC

MichaelChirico added Component - Analyses Hardware - x86_64/x64/amd64 (64-bit) Windows 64-bi Importance - P5 major Status - NEW Version - R 3.1.2 labels May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

MichaelChirico commented May 18, 2020

MichaelChirico commented May 18, 2020

MichaelChirico commented May 18, 2020

MichaelChirico commented May 18, 2020

MichaelChirico commented May 18, 2020

[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

[BUGZILLA #16166] inconsistent results of zapsmall called by Thin.row (especially in anova.mlm) #5610

Comments

MichaelChirico commented May 18, 2020

METADATA

MichaelChirico commented May 18, 2020

METADATA

MichaelChirico commented May 18, 2020

METADATA

MichaelChirico commented May 18, 2020

METADATA

MichaelChirico commented May 18, 2020

METADATA