Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGZILLA #16070] terms function mishandling case when intercept is excluded #5527

Open
MichaelChirico opened this issue May 18, 2020 · 10 comments

Comments

@MichaelChirico
Copy link
Owner

Hi,

R documentation describes the factors attribute of the terms.object as follows:

A matrix of variables by terms showing which variables appear in which terms.
The entries are 0 if the variable does not occur in the term, 1 if it does occur
and should be coded by contrasts, and 2 if it occurs and should be coded via
dummy variables for all levels (as when an intercept or lower-order term is
missing). If there are no terms other than an intercept and offsets, this is
numeric(0).
(http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.object.html)

In the example below, I would expect Species to have a value of 2 since the
intercept is omitted. Indeed, when using model.matrix it is clear that Species
has been coded with dummy variables for all three levels.

f <-∼ -1 + Species

attr(terms(f, data=iris), "factors")
# Species
#Species 1

levels(iris$Species)
#[1] "setosa" "versicolor" "virginica"

colnames(model.matrix(f, iris))
#[1] "Speciessetosa" "Speciesversicolor" "Speciesvirginica"

I think this is a bug in the terms function.

Many thanks in advance,

Pat O'Reilly


METADATA

  • Bug author - Pat O'Reilly
  • Creation time - 2014-11-11 16:46:50 UTC
  • Bugzilla link
  • Status - REOPENED
  • Alias - None
  • Component - Models
  • Version - R 3.1.1
  • Hardware - Other Other
  • Importance - P5 major
  • Assignee - R-core
  • URL -
  • Modification time - 2020-02-09 20:46 UTC
@MichaelChirico
Copy link
Owner Author

Do you know of any model fits where the wrong terms are used because of this? Maybe it's just a documentation error.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2014-11-13 15:07:41 UTC

@MichaelChirico
Copy link
Owner Author

As far as I can tell, the documentation has the right intention. However, I don't think the attribute is ever used except in interaction terms --- for main effects, one just looks at whether the model contains an intercept. I haven't actually checked the code, though.

Assuming that I am right, matching the documentation to the code is probably possible ("2 if it occurs in an interaction term and...") but it might be clearer to fix the code.


METADATA

  • Comment author - Peter Dalgaard
  • Timestamp - 2014-11-13 16:05:06 UTC

@MichaelChirico
Copy link
Owner Author

I've not seen this affect the fitting of models but I intend to use the terms object as a source of metadata for a model matrix. If the factors attribute is correct, one can easily work from left to right through the factors matrix to determine what each column of the associated model matrix means. For example column 1 being Age interacted with Gender = "Male". If the factor matrix is incorrect then this becomes challenging.


METADATA

  • Comment author - Pat O'Reilly
  • Timestamp - 2014-11-13 17:18:36 UTC

@MichaelChirico
Copy link
Owner Author

The calculation for this is in function TermCode in model.c in the stats package.
It says that the calculation follows the heuristic on p. 38 of the white book,
but on checking, I see that it doesn't -- the case of no intercept is described there and our code doesn't do what it says. I'll fix it.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2014-11-14 18:24:17 UTC

@MichaelChirico
Copy link
Owner Author

Now fixed in R-devel, soon in R-patched.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2014-11-14 20:32:39 UTC

@MichaelChirico
Copy link
Owner Author

This change caused several packages to fail their tests, and has been backed out. We're discussing what to do next.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2014-11-16 13:35:25 UTC

@MichaelChirico
Copy link
Owner Author

Duncan,

I tried your fix and I think it is incorrect.

f <-&sim; -1 + Petal.Length*Species

attr(terms(f, data=iris), "factors")
&num;             Petal.Length Species Petal.Length:Species
#Petal.Length            2       0                    1
#Species                 0       1                    1

Since Species is the first factor, this needs to be coded with dummy variables.

The modelmatrix function works because although it uses a terms object to build the design matrix, it fixes the factors pattern matrix first.

Line 451 in model.c:

/* If there is no intercept we look through the factor pattern */
/* matrix and adjust the code for the first factor found so that */
/* it will be coded by dummy variables rather than contrasts. */

if (!intrcept) {
    for (j = 0; j < nterms; j++) {
        for (i = risponse; i < nVar; i++) {
            if (INTEGER(nlevs)[i] > 1
            && INTEGER(factors)[i + j * nVar] > 0) {
                INTEGER(factors)[i + j * nVar] = 2;
                goto alldone;
            }
        }
    }
}

METADATA

  • Comment author - Pat O'Reilly
  • Timestamp - 2014-11-21 13:37:14 UTC

@MichaelChirico
Copy link
Owner Author

Pat: What change did you try? Currently there are no changes to the code in R-devel or R-patched, as noted in comment 6. R-devel currently has updated documentation to explain the current behaviour, but the behaviour hasn't changed from earlier code.


METADATA

  • Comment author - Duncan Murdoch
  • Timestamp - 2014-11-21 19:44:20 UTC

@MichaelChirico
Copy link
Owner Author

Duncan, I found your fix on github:

https://github.com/wch/r-source/blob/5722f120a941741a710ed965d90ddb41d03db9d2/src/library/stats/src/model.c

Would it be a correct assessment to say that the no-intercept adjustment currently in the modelmatrix function should really be in either termcode or termsform?


METADATA

  • Comment author - Pat O'Reilly
  • Timestamp - 2014-11-21 22:05:22 UTC

@MichaelChirico
Copy link
Owner Author

Did work on this ever get moved forward?


METADATA

  • Comment author - elin.waring
  • Timestamp - 2020-02-09 20:46:00 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant