[BUGZILLA #16665] dummy.coef fails when transformations are included in formula #6052

Open
MichaelChirico opened this issue May 18, 2020 · 2 comments

Comments

@MichaelChirico

Created attachment 1999 [details]
dummy.coef.fix.R

The function dummy.coef.lm fails in more complex cases, notably when terms
include variables that are transformed in the formula of the model.

r.lm <- lm(Fertility ~ cut(Agriculture, breaks=4) + Infant.Mortality,
           data=swiss)
dummy.coef(r.lm)

Error in model.frame.default(Terms, dummy, na.action = function(x) x, :
factor cut(Agriculture, breaks = 4) has new level (0.9995,1]

The problem is that ii works with all.vars, which returns untransformed
variables. This is fixed by using model.frame instead -- which is needed
later in the function anyway.
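
The difference between the two approaches can be seen directly on the model from this report. The following is only an illustrative sketch, not the code of dummy.coef.lm or of the attached fix; the names raw_vars, mf_vars and cut_col are introduced here for illustration.

```r
# Same model as in the report (swiss ships with R's datasets package).
r.lm <- lm(Fertility ~ cut(Agriculture, breaks = 4) + Infant.Mortality,
           data = swiss)

# all.vars() returns only the raw variable names that occur in the formula,
# so the cut() transformation is lost.
raw_vars <- all.vars(formula(r.lm))
print(raw_vars)

# The model frame has one column per evaluated variable of the formula,
# including the factor produced by cut().
mf_vars <- names(model.frame(r.lm))
print(mf_vars)

# The transformed column is a factor whose levels are the cut() intervals.
cut_col <- model.frame(r.lm)[["cut(Agriculture, breaks = 4)"]]
print(is.factor(cut_col))
print(levels(cut_col))
```

Working from the model frame keeps the factor levels that the fit actually used, which is presumably why the fix avoids the "new level" error shown above.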

The function dummy.coef.fix does this.

dummy.coef.fix(r.lm)

Thus, dummy.coef.lm should be replaced by dummy.coef.fix.

In the function, there is a warning
warning("some terms will have NAs due to the limits of the method")
I wonder why this is a "limit" (-> "limitation") of the method.
If some interaction coefficients are undetermined because the respective
combination of levels is not available, NA is the appropriate result.
Are there other cases?

I have extended the function to include confidence intervals and t-tests
and call the extended function allcoef.
The latter are what is shown by summary.lm, except for the (dummy)
variable that is eliminated by the contrasts. For treatment contrasts,
the added information is trivial (0 with 0 standard error), but for
sum (or weighted sum) contrasts, it is not, and for other contrasts, it may
still recover more useful information.
The function would need some polishing to work in general contexts.
Let me know if you are interested.
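
To illustrate the point about sum contrasts, here is a minimal sketch using the built-in warpbreaks data (an assumption made for illustration; this data set is not part of the report). It shows only the point estimate of the eliminated level as returned by the existing dummy.coef, not the confidence intervals or t-tests of the proposed allcoef.

```r
# Fit a one-factor model with sum-to-zero contrasts.
fit <- lm(breaks ~ tension, data = warpbreaks,
          contrasts = list(tension = "contr.sum"))

# coef() reports only two tension coefficients; the third level is
# eliminated by the contrasts.
est <- coef(fit)
print(est)

# dummy.coef() expands the fit to one coefficient per factor level.
dc <- dummy.coef(fit)
full <- dc[["tension"]]
print(full)

# Under sum contrasts the eliminated level equals minus the sum of the
# reported ones, so it is generally non-zero and carries real information.
implied_H <- -sum(est[c("tension1", "tension2")])
print(all.equal(unname(full["H"]), implied_H))
```

With treatment contrasts the same expansion would just add a zero for the reference level, which matches the remark that the added information is trivial in that case.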

Werner Stahel, Jan 4, 2016


METADATA

  • Bug author - Werner A. Stahel
  • Creation time - 2016-01-11 12:32:10 UTC
  • Bugzilla link
  • Status - ASSIGNED
  • Alias - None
  • Component - Analyses
  • Version - R 3.2.3
  • Hardware - Other Linux
  • Importance - P5 enhancement
  • Assignee - R-core
  • URL -
  • Modification time - 2020-02-08 19:44 UTC
@MichaelChirico

Thank you, Werner.

I can confirm that your version works for the example where the current stats package one fails.
Your version also fixes the similar problem reported to R-help
"bug in dummy.coef?"
https://stat.ethz.ch/pipermail/r-help/2013-October/362106.html

I've spent a bit of time because your version had quite a few changes that were not necessary (you renamed three of the internal variables) and your version must have come from simple "print()"ing of the function definition in an older version of R, so your code misses the comments from the source code and e.g., the newer anyNA() use.
Note that the most current source (of "R-devel") is always (for this function)
https://svn.r-project.org/R/trunk/src/library/base/R/dummy.coef.R
((but to find this file, it is easiest to get a source "tarball" from one of the places linked from https://www.r-project.org/sources.html -- note the daily versions provided by "SfS"!) or, if you prefer the web, you can use the 'site:svn.r-project.org/R' trick:
https://www.google.ch/search?q=site:svn.r-project.org/R++%27dummy.coef%27&ie=utf-8&oe=utf-8&gws_rd=cr&ei=1t2hVqqGDoXxUt_spugL))

Your question about the warning: I also find it a bit "strange".
One could replace "due to the limits of the method"
by "due to the design" (meaning the linear model design matrix),
but I think you are suggesting that no warning should be given there, right?

I did not easily find a case that triggers the warning. Do you have one?

Best regards,
Martin


METADATA

  • Comment author - Martin Maechler
  • Timestamp - 2016-01-22 07:45:43 UTC

@MichaelChirico

Should this be closed? With revision 70020, doc/NEWS.Rd has:

 \item \code{dummy.coef.lm()} now works in more cases, thanks to a
  proposal by Werner Stahel (\PR{16665}).

...or maybe it stays open until resolution of the question about warning("some terms will have NAs due to the limits of the method")?


METADATA

  • Comment author - Benjamin Tyner
  • Timestamp - 2020-02-08 19:44:43 UTC
