[BUGZILLA #17475] lm.wfit issue with small weights #6649
Comments
So this is about lm.wfit, not lme.wfit? In any case, please run dput(df) and paste in the result, to make it easier for folks to reproduce. Bonus points if you do it under a more recent version of R.
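A minimal sketch of what the dput(df) request involves. The first two rows of the reported data stand in for df here. dput() prints R source that recreates the object, so anyone can paste its output back into R and get the same data frame.

```r
# Stand-in for df: the first two rows of the reported data.
df <- data.frame(
  y      = c(1.51156139, -0.63653132),
  x      = c(0.55209240, -0.12599316),
  weight = c(2.117337e-34, 2.117337e-34)
)

# dput() writes R source that recreates the object; capture it as text.
txt <- capture.output(dput(df))
cat(txt, sep = "\n")

# Pasting that text back into R (here: parsing and evaluating it)
# rebuilds the same data frame.
df_copy <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(df, df_copy)))
```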
|
For the df, please use the structure below: dput(df)
|
(In reply to Benjamin Tyner from comment #1)
I fixed the title, thanks.
|
Thank you. Arguably, a more serious concern is that the predict() and fitted() methods return divergent results:
[1] -10.02918
In any case, if you care about the accuracy of the residual for an observation with such a small weight, you'll need to compute it yourself, e.g. via:
1 1.5115614 0.5520924 2.117337e-34 -7.2436891 0.4868415
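A minimal sketch of computing the residual yourself. The code from the original comment did not survive, so this is a reconstruction, not Benjamin Tyner's code. It refits on a few rows of the reported data and recomputes residuals as y - X b from coef(). It also prints fitted() minus predict() so the two can be compared. How large the discrepancy is on this subset is not guaranteed to match the full data.

```r
# A few rows from the reported data, including three with tiny weights.
d <- data.frame(
  y      = c(1.51156139, -0.63653132, 0.37782776, 1.25005260,
             0.84360691, 0.10204789, 1.14292898, -3.02942081),
  x      = c(0.55209240, -0.12599316, 0.42095384, 0.76306225,
             0.45152356, -0.11649899, 0.37249631, -1.28966997),
  weight = c(2.117337e-34, 2.117337e-34, 4.934135e-31, 0.5,
             0.5, 1, 1, 1)
)

fit <- lm(y ~ x, data = d, weights = weight)

# Residuals recomputed directly from the coefficients: y - X %*% beta.
X <- model.matrix(fit)
res_manual <- d$y - drop(X %*% coef(fit))

# Side-by-side comparison with what lm() stores.
print(cbind(
  weight               = d$weight,
  resid_lm             = unname(resid(fit)),
  resid_manual         = unname(res_manual),
  fitted_minus_predict = unname(fitted(fit) - predict(fit, newdata = d))
))
```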
|
(In reply to Benjamin Tyner from comment #4)
Yes, your point is even more serious. There are ways of getting around this issue, but all of them require knowing about it. In other words, we know this bug exists and can work around it by alternative methods, as you pointed out; however, other R users may not even know that the issue exists. So I think the R core team should fix it in the next release.
|
I have recently realised that there is a major instability in lm.wfit for weights smaller than the machine epsilon. The following example illustrates the issue.
The input data is as follows. I should stress that the weights are intentionally designed to reflect some structure in the data:
y x weight
1.51156139 0.55209240 2.117337e-34
-0.63653132 -0.12599316 2.117337e-34
0.37782776 0.42095384 4.934135e-31
3.03792318 1.40315446 2.679495e-24
1.53646523 0.46076858 2.679495e-24
-2.37727874 -0.73963576 6.244160e-21
0.37183065 0.20407468 1.455107e-17
-1.53917553 -0.95519361 1.455107e-17
1.10926675 0.03897129 3.390908e-14
-0.37786333 -0.17523593 3.390908e-14
2.43973603 0.97970095 7.902000e-11
-0.35432394 -0.03742559 7.902000e-11
2.19296613 1.00355263 4.289362e-04
0.49845532 0.34816207 4.289362e-04
1.25005260 0.76306225 5.000000e-01
0.84360691 0.45152356 5.000000e-01
0.29565993 0.53880068 5.000000e-01
-0.54081334 -0.28104525 5.000000e-01
0.83612836 -0.12885659 9.995711e-01
-1.42526769 -0.87107631 9.999998e-01
0.10204789 -0.11649899 1.000000e+00
1.14292898 0.37249631 1.000000e+00
-3.02942081 -1.28966997 1.000000e+00
-1.37549764 -0.74676145 1.000000e+00
-2.00118016 -0.55182759 1.000000e+00
-4.24441674 -1.94603608 1.000000e+00
1.17168144 1.00868008 1.000000e+00
2.64007761 1.26333069 1.000000e+00
1.98550114 1.18509599 1.000000e+00
-0.58941683 -0.61972416 9.999998e-01
-4.57559611 -2.30914920 9.995711e-01
-0.82610544 -0.39347576 9.995711e-01
-0.02768220 0.20076910 9.995711e-01
0.78186399 0.25690215 9.995711e-01
-0.88314153 -0.20200148 5.000000e-01
-4.17076452 -2.03547588 5.000000e-01
0.93373070 0.54190626 4.289362e-04
-0.08517734 0.17692491 4.289362e-04
-4.47546619 -2.14876688 4.289362e-04
-1.65509103 -0.76898087 4.289362e-04
-0.39403030 -0.12689705 4.289362e-04
0.01203300 -0.18689898 1.841442e-07
-4.82762639 -2.31391121 1.841442e-07
-0.72658380 -0.39751171 3.397282e-14
-2.35886866 -1.01082109 0.000000e+00
-2.03762707 -0.96439902 0.000000e+00
0.90115123 0.60172286 0.000000e+00
1.55999194 0.83433953 0.000000e+00
3.07994058 1.30942776 0.000000e+00
1.78871462 1.10605530 0.000000e+00
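To make the example re-runnable (which is what the dput(df) request was after), here is a sketch that rebuilds the table above as an R data frame and fits both models. Assumptions: the column is named weight, following the table header, although the reported call uses df$weights, so the name in the original data frame may have differed; and the values are as printed, so they may differ from the original df in trailing digits. Residuals are also recomputed as y - X b for comparison with resid().

```r
# The reported data, column by column (values as printed in the table).
df <- data.frame(
  y = c( 1.51156139, -0.63653132,  0.37782776,  3.03792318,  1.53646523,
        -2.37727874,  0.37183065, -1.53917553,  1.10926675, -0.37786333,
         2.43973603, -0.35432394,  2.19296613,  0.49845532,  1.25005260,
         0.84360691,  0.29565993, -0.54081334,  0.83612836, -1.42526769,
         0.10204789,  1.14292898, -3.02942081, -1.37549764, -2.00118016,
        -4.24441674,  1.17168144,  2.64007761,  1.98550114, -0.58941683,
        -4.57559611, -0.82610544, -0.02768220,  0.78186399, -0.88314153,
        -4.17076452,  0.93373070, -0.08517734, -4.47546619, -1.65509103,
        -0.39403030,  0.01203300, -4.82762639, -0.72658380, -2.35886866,
        -2.03762707,  0.90115123,  1.55999194,  3.07994058,  1.78871462),
  x = c( 0.55209240, -0.12599316,  0.42095384,  1.40315446,  0.46076858,
        -0.73963576,  0.20407468, -0.95519361,  0.03897129, -0.17523593,
         0.97970095, -0.03742559,  1.00355263,  0.34816207,  0.76306225,
         0.45152356,  0.53880068, -0.28104525, -0.12885659, -0.87107631,
        -0.11649899,  0.37249631, -1.28966997, -0.74676145, -0.55182759,
        -1.94603608,  1.00868008,  1.26333069,  1.18509599, -0.61972416,
        -2.30914920, -0.39347576,  0.20076910,  0.25690215, -0.20200148,
        -2.03547588,  0.54190626,  0.17692491, -2.14876688, -0.76898087,
        -0.12689705, -0.18689898, -2.31391121, -0.39751171, -1.01082109,
        -0.96439902,  0.60172286,  0.83433953,  1.30942776,  1.10605530),
  weight = c(2.117337e-34, 2.117337e-34, 4.934135e-31, 2.679495e-24, 2.679495e-24,
             6.244160e-21, 1.455107e-17, 1.455107e-17, 3.390908e-14, 3.390908e-14,
             7.902000e-11, 7.902000e-11, 4.289362e-04, 4.289362e-04, 5.000000e-01,
             5.000000e-01, 5.000000e-01, 5.000000e-01, 9.995711e-01, 9.999998e-01,
             1, 1, 1, 1, 1,
             1, 1, 1, 1, 9.999998e-01,
             9.995711e-01, 9.995711e-01, 9.995711e-01, 9.995711e-01, 5.000000e-01,
             5.000000e-01, 4.289362e-04, 4.289362e-04, 4.289362e-04, 4.289362e-04,
             4.289362e-04, 1.841442e-07, 1.841442e-07, 3.397282e-14, 0,
             0, 0, 0, 0, 0)
)

# Unweighted and weighted fits, as in the report.
fit_ols <- lm(y ~ x, data = df)
fit_wls <- lm(y ~ x, data = df, weights = weight)
print(coef(fit_ols))
print(coef(fit_wls))

# Residuals recomputed from the coefficients (y - X b) for comparison.
res_manual <- df$y - drop(model.matrix(fit_wls) %*% coef(fit_wls))

# Rows whose weight is positive but below machine epsilon.
tiny <- df$weight > 0 & df$weight < .Machine$double.eps
print(cbind(weight = df$weight, resid_lm = resid(fit_wls),
            resid_manual = res_manual)[tiny, ])
```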
Running a simple (unweighted) linear model returns:
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
-0.04173 2.03790
and
[1] 1.14046
However, if I use the weighted model, then:
lm(formula = y ~ x, data = df, weights = df$weights)
Coefficients:
(Intercept) x
-0.05786 1.96087
and
[1] 60.91888
As you can see, the coefficient estimates are nearly the same, but the resid() function returns a giant residual.
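One plausible mechanism, stated as an assumption rather than a reading of lm.wfit's source: a weighted fit can be computed by scaling each row of X and y by sqrt(w), solving an ordinary least-squares problem, and then dividing the scaled residuals by sqrt(w) to return to the original scale. The coefficients are unaffected, but any absolute rounding error in a scaled residual (of order machine epsilon times the size of the scaled problem) is multiplied by 1/sqrt(w), about 1e17 for w = 1e-34. The sketch below mimics that route with base R's qr(); wls_via_scaling is a hypothetical helper, and the data are simulated, not the reported df.

```r
# Hypothetical helper: weighted least squares through the sqrt(w) scaling
# route, returning residuals computed two different ways.
wls_via_scaling <- function(X, y, w) {
  sw <- sqrt(w)
  qx <- qr(X * sw)                    # QR of the row-scaled design matrix
  beta <- qr.coef(qx, y * sw)         # coefficients of the scaled problem
  r_scaled <- qr.resid(qx, y * sw)    # residuals of the scaled problem
  list(
    beta = beta,
    # Back to the original scale: rounding error in r_scaled is
    # multiplied by 1 / sqrt(w), about 1e17 when w = 1e-34.
    resid_unscaled = r_scaled / sw,
    # Directly from the coefficients: no division by sqrt(w).
    resid_direct = drop(y - X %*% beta)
  )
}

set.seed(42)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- c(1e-34, rep(1, n - 1))          # one observation with a tiny weight
X <- cbind(1, x)

fit <- wls_via_scaling(X, y, w)
cat("tiny-weight row, r_scaled / sqrt(w):", fit$resid_unscaled[1], "\n")
cat("tiny-weight row, y - X b:           ", fit$resid_direct[1], "\n")
cat("lm.wfit() residual for that row:    ", lm.wfit(X, y, w)$residuals[1], "\n")
```

If the first and third lines agree with each other but not with the second, that is consistent with (though not proof of) this explanation.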
I am aware that the source of the instability is the small weights, and I understand that the issue can be resolved by zeroing them. However, the cutoff must be controlled by the algorithm in lm.wfit(), either by introducing a user-controlled parameter (such as tol) or by setting the cutoff internally in proportion to the machine epsilon.
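A user-side sketch of the zeroing workaround, until lm.wfit() handles it. zero_tiny_weights is a hypothetical helper, not part of R, and the cutoff tol * max(w) with tol = .Machine$double.eps is one arbitrary reading of "proportional to the machine epsilon". The data are simulated. The sketch relies on lm() reporting residuals of zero-weight observations directly from the coefficients; the checks compare against y - predict() rather than assuming it.

```r
# Hypothetical helper (not part of R): set weights below tol * max(w) to 0.
zero_tiny_weights <- function(w, tol = .Machine$double.eps) {
  w[w < tol * max(w)] <- 0
  w
}

set.seed(1)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$w <- c(1e-34, 1e-20, 1e-12, rep(1, n - 3))  # three small weights
d$w0 <- zero_tiny_weights(d$w)                 # 1e-34 and 1e-20 become 0

fit_raw  <- lm(y ~ x, data = d, weights = w)
fit_zero <- lm(y ~ x, data = d, weights = w0)

# Residuals recomputed from the coefficients, for comparison with resid().
manual <- d$y - predict(fit_zero, newdata = d)
print(data.frame(w = d$w, w0 = d$w0,
                 resid_raw = resid(fit_raw), resid_zeroed = resid(fit_zero),
                 manual = manual)[1:4, ])
```

Zeroing such weights leaves the coefficients essentially unchanged, since their contribution to the normal equations is far below rounding level, while the residuals of the zeroed rows no longer pass through a division by sqrt(w).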