[BUGZILLA #16070] terms function mishandling case when intercept is excluded #5527
Comments
Do you know of any model fits where the wrong terms are used because of this? Maybe it's just a documentation error.
|
As far as I can tell, the documentation has the right intention. However, I don't think the attribute is ever used except in interaction terms; for main effects, one just looks at whether the model contains an intercept. I haven't actually checked the code, though. Assuming that I am right, it is probably possible to match the documentation to the code ("2 if it occurs in an interaction term and..."), but it might be clearer to fix the code.
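A quick way to check this claim is sketched below. The variable names `a` and `b` are hypothetical; `terms()` only needs the symbolic formula here, so no data are required. The second part shows the behaviour reported in this thread, where a main effect is still coded 1 after the intercept is removed.

```r
# Hypothetical factor names a and b; terms() works on the formula alone.
fac <- attr(terms(~ a + a:b), "factors")
fac
#   a a:b
# a 1   2
# b 0   1
# In a:b, the margin for a is b, which is not in the model, so a is coded 2.

# Main effect with the intercept removed: still coded 1 in "factors";
# the missing intercept is recorded separately in the "intercept" attribute.
tt <- terms(~ -1 + a)
attr(tt, "factors")["a", "a"]   # 1
attr(tt, "intercept")           # 0
```

So, at least for these cases, a 2 only shows up inside the interaction term, and the intercept has to be inspected separately for main effects.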
|
I've not seen this affect the fitting of models, but I intend to use the terms object as a source of metadata for a model matrix. If the factors attribute is correct, one can easily work from left to right through the factors matrix to determine what each column of the associated model matrix means; for example, column 1 being Age interacted with Gender = "Male". If the factors matrix is incorrect, this becomes challenging.
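A minimal sketch of that kind of metadata lookup, using the built-in `iris` data rather than the Age/Gender example. It relies on the `"assign"` attribute of the model matrix and the `"term.labels"` and `"factors"` attributes of the terms object; the `col_info` data frame and its column names are illustrative only.

```r
# Sketch: label each model-matrix column with the term that produced it.
f  <- ~ -1 + Species + Petal.Width
tt <- terms(f, data = iris)
mm <- model.matrix(f, data = iris)

term_labels <- attr(tt, "term.labels")
assign_idx  <- attr(mm, "assign")   # term index per column; 0 would mean intercept

col_info <- data.frame(
  column = colnames(mm),
  term   = ifelse(assign_idx == 0, "(Intercept)",
                  term_labels[pmax(assign_idx, 1)]),
  stringsAsFactors = FALSE
)

# Variables involved in each term, read off the "factors" matrix.
fac <- attr(tt, "factors")
col_info$variables <- vapply(
  col_info$term,
  function(tm) paste(rownames(fac)[fac[, tm] > 0], collapse = ", "),
  character(1),
  USE.NAMES = FALSE
)
col_info
```

This recovers which term and variables each column belongs to. Whether a factor was expanded by contrasts or by dummy variables still has to be read from the 1/2 codes in `fac`, which is where the issue reported here matters.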
|
The calculation for this is in the function TermCode in model.c in the stats package.
|
Now fixed in R-devel, soon in R-patched.
|
This change caused several packages to fail their tests, and has been backed out. We're discussing what to do next.
|
Duncan, I tried your fix and I think it is incorrect.
Since Species is the first factor, it needs to be coded with dummy variables. The modelmatrix function works because, although it uses a terms object to build the design matrix, it fixes the factors pattern matrix first. Line 451 in model.c:
|
Pat: What change did you try? Currently there are no changes to the code in R-devel or R-patched, as noted in comment 6. R-devel currently has updated documentation to explain the current behaviour, but the behaviour hasn't changed from earlier code.
|
Duncan, I found your fix on GitHub: Would it be a correct assessment to say that the no-intercept adjustment currently in the modelmatrix function should really be in either TermCode or termsform?
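To make the adjustment under discussion concrete, here is a rough R illustration. It is not the C code from model.c, and `adjusted_factors` is a hypothetical helper, not part of R. It assumes the adjustment amounts to this: when there is no intercept, the first factor variable encountered while scanning the terms from left to right is marked 2.

```r
# Hypothetical helper (not part of R): return the "factors" matrix with an
# assumed no-intercept adjustment applied, marking the first factor found as 2.
adjusted_factors <- function(formula, data) {
  tt  <- terms(formula, data = data)
  fac <- attr(tt, "factors")
  if (attr(tt, "intercept") == 1 || length(fac) == 0) return(fac)

  mf     <- model.frame(tt, data)
  is_fac <- vapply(mf[rownames(fac)], is.factor, logical(1))
  resp   <- attr(tt, "response")    # row index of the response, 0 if none

  for (j in seq_len(ncol(fac))) {     # terms, left to right
    for (i in seq_len(nrow(fac))) {   # variables within each term
      if (i != resp && is_fac[i] && fac[i, j] > 0) {
        fac[i, j] <- 2L
        return(fac)
      }
    }
  }
  fac
}

adjusted_factors(~ -1 + Species, iris)
#         Species
# Species       2
```

With the intercept present, the helper returns the matrix unchanged, so `adjusted_factors(~ Species, iris)` still reports 1 for Species.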
|
Did work on this ever get moved forward?
|
Hi,
R documentation describes the factors attribute of the terms.object as follows:
A matrix of variables by terms showing which variables appear in which terms.
The entries are 0 if the variable does not occur in the term, 1 if it does occur
and should be coded by contrasts, and 2 if it occurs and should be coded via
dummy variables for all levels (as when an intercept or lower-order term is
missing). If there are no terms other than an intercept and offsets, this is
numeric(0).
(http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.object.html)
In the example below, I would expect Species to have a value of 2 since the
intercept is omitted. Indeed, when using model.matrix it is clear that Species
has been coded with dummy variables for all three levels.
f <- ~ -1 + Species
attr(terms(f, data=iris), "factors")
#        Species
#Species       1
levels(iris$Species)
#[1] "setosa" "versicolor" "virginica"
colnames(model.matrix(f, iris))
#[1] "Speciessetosa" "Speciesversicolor" "Speciesvirginica"
I think this is a bug in the terms function.
Many thanks in advance,
Pat O'Reilly