Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy output for lmer is not very useful #96

Closed
hadley opened this issue Dec 1, 2015 · 18 comments
Closed

Tidy output for lmer is not very useful #96

hadley opened this issue Dec 1, 2015 · 18 comments

Comments

@hadley
Copy link
Contributor

hadley commented Dec 1, 2015

library(lme4)
library(broom)

names(sleepstudy) <- tolower(names(sleepstudy))

fm1 <- lmer(reaction ~ days + (days | subject), sleepstudy)

tidy(fm1)

I'd rather see something like coef(fm1) - i.e. the individual estimates of the random effects for each group

@tjmahr
Copy link

tjmahr commented Dec 1, 2015

The change in 4.0 looks like an API change that should be documented in the release notes because it will break old code.

Here are my notes on what's happened...

Old behavior:

> tidy(fm1)
#      group level        term    estimate
# 1  subject   308 (Intercept) 253.6636702
# 2  subject   309 (Intercept) 211.0065280
# 3  subject   310 (Intercept) 212.4448591
# 4  subject   330 (Intercept) 275.0956033
# 5  subject   331 (Intercept) 273.6653076
# ...

Where it was actually calling tidy(fm1, "random") to give random effects by default. tidy(fm1, "random") now raises an error.

The new default is random effects variances and fixed effects estimates:

tidy(fm1)
#                           term     estimate std.error statistic    group
# 1                  (Intercept) 251.40510485  6.824556 36.838311    fixed
# 2                         days  10.46728596  1.545789  6.771485    fixed
# 3       sd_(Intercept).subject  24.74044759        NA        NA  subject
# 4              sd_days.subject   5.92213327        NA        NA  subject
# 5 cor_(Intercept).days.subject   0.06555133        NA        NA  subject
# 6      sd_Observation.Residual  25.59181589        NA        NA Residual

To replicate the old default, we have to call

tidy(fm1, "ran_modes")
#    level   group        term    estimate std.error
# 1    308 subject (Intercept) 253.6636702 12.070837
# 2    308 subject        days  19.6662579  2.304837
# 3    309 subject (Intercept) 211.0065280 12.070837
# 4    309 subject        days   1.8475828  2.304837
# 5    310 subject (Intercept) 212.4448591 12.070837
# ...

So the release notes should say something like:

### Changed

The tidy function for lme4 models has been updated and its default behavior has changed. 

* Random effect variances and covariances can be now extracted. 
  `tidy(model, effects = "ran_pars")` returns the variances/covariances of 
  random effects. The output of random effects parameters is controlled by 
  the `scales` argument. By default, `scales = "sdcar"` is used to return 
  standard deviations and correlations. Use `scales = "varcov"` to get 
  variances and covariances.
* Random effects estimates are now extracted with 
  `tidy(model, effects = "ran_modes")`. Previously these effects were 
  extracted with `tidy(model, "random")`.
* Multiple effects can be selected with the `effects` argument. 
* The default behavior of `tidy(model)` is 
  `tidy(model, effects = c("ran_pars", "fixed")` which extracts random 
  effect variances/covariances and fixed effect estimates. Previously,
  random effect estimates were given.

@bbolker
Copy link
Contributor

bbolker commented Apr 17, 2016

I agree that this is an API break (it's my fault). Some comments:

  • There may be an inaccuracy in your notes. We need to distinguish between the "conditional modes" (predicted deviations of each group from the overall mean, returned by ranef.merMod()) and the predicted values for each group (=overall mean + predicted deviation, returned by coef.merMod() [we could also call these something like "level-1 predictions", although that's not quite right/general). I think the old version returned the latter, I don't know without double-checking whether there's a way to restore the old behaviour/get the (which there ought to be!)
  • the defaults could be changed around if people think that the old behaviour is a more useful default (I don't think so ... use cases anyone?)
  • I think this model should be extended to tidy.lme, which is still using the old format.

Hadley, can you give a use case? Your request seems like more the output of predict() ... which goes more with augment than tidy, as I see it ...

@skanskan
Copy link

skanskan commented May 8, 2016

tidy(mymodel, "ran_modes") doesn't work.

Error in $<-.data.frame(*tmp*, "level", value = c("4", "6", "8")) :
replacement has 3 rows, data has 1

@bbolker
Copy link
Contributor

bbolker commented May 8, 2016

@skanskan, I don't doubt this, but do you happen to have a reproducible example? Works in my fork, but I'm not 100% sure that it's not divergent from the current master branch in some way ... For me,

library(lme4)
library(broom)
example(lmer)
tidy(fm1,"ran_modes")

works fine.

(Are you using the CRAN version or the latest Github version?)

@bbolker
Copy link
Contributor

bbolker commented May 8, 2016

I'm going to continue this conversation here, I think there are few things that need pointing out:

  • the way things are implemented right now is (bluntly) a little bit of a mess. effects="ran_modes" doesn't in fact return the conditional modes; it returns the coefficients, as done e.g. by the lme tidiers, but it plugs in the standard errors of the conditional modes as the standard errors, which is incorrect (the uncertainty of the fixed effects should be incorporated as well; how this should be done is a currently unresolved can of worms. There is a long thread on the r-sig-mixed-models archives about the difficulties of doing this. I would favor distinguishing between effects="coef" (which would return coefficients, as in coef.merMod, with or without standard errors) and effects=ran_modes`".
  • getting a bit meta, there's a general question about what broom should offer: should it try to solve problems that are left unsolved by package authors/go beyond what package authors have implemented, for the sake of convenience at the possible price of approximation/incorrectness?
  • I'm really interested in @hadley's use case for tidy(merMod_object); what are you usually doing with the coefficients?
  • is there consensus on extending the results of this discussion (whatever it is) to the lme tidier as well?

@bbolker
Copy link
Contributor

bbolker commented May 19, 2016

Bump. Any further comments?

@skanskan
Copy link

It would be great if tidy(model, "fixed") also included the column "Pr(>|t|)" added by lmerTest.
Thanks

@bbolker
Copy link
Contributor

bbolker commented May 19, 2016

@skanskan, you already made that comment at #126 ...

@bbolker
Copy link
Contributor

bbolker commented Jun 25, 2016

bump. I think I'm talking to myself a bit here: I would really like to move this forward a bit, and I have moderately strong opinions about what the "right way" is (i.e. the default is to return the fixed-effect coefficients and the variance-covariance parameters, appropriately identified; the old behavior, which returns the output of coef(fitted_model)/gives the predicted values of the coefficients for each group, can be accessed. I'd really like to fix up tidy.lme at the same time. If it would be OK, I can create a version that does what I think it should and @dgrtwo can adjust it to behave as he sees fit ... ?

@bbolker
Copy link
Contributor

bbolker commented Jul 12, 2016

bump ... ?

@dgrtwo
Copy link
Collaborator

dgrtwo commented Jul 12, 2016

Hi Ben: thank you so much for your interest, involvement, and commitment.

The truth is that I know so little about linear mixed effects models that I will not be of any help here, so I'm very open to trusting your opinions.

Since you're the maintainer of lme4, have you considered implementing the tidy/augment/glance methods in lme4, after importing and re-exporting the generics from broom? This is a pattern I've recently started encouraging among other maintainers who are familiar with broom, and it would remove me as an obstacle.

If you would rather this live in broom, please do submit this as a pull request and I promise to be more punctual about taking a look and merging it in.

@bbolker
Copy link
Contributor

bbolker commented Jul 12, 2016

Hmm. Thanks for the ideas. (I was hoping to get feedback from someone else who uses mixed models - maybe I'll advertise this issue at r-sig-mixed-models@r-project.org ...) I certainly could put the merMod methods into lme4. Alternatively, I wonder whether it might make sense to break the mixed modeling stuff out into a separate package (as discussed at #146); this could include methods for lme4, nlme, MCMCglmm, brms, rstanarm, glmmADMB, glmmTMB ... packages ...)

@vathymut
Copy link

vathymut commented Jul 20, 2017

@bbolker @skanskan It seems lmer works but glmer breaks. See below for quick reproducible example. My cumbersome workaround so far is to reconstruct the tidy dataset via ranef( gm1, condVar = TRUE) and then inspect/extract from attr( ., "postVar")

library(lme4)
library(broom)
example(glmer)
tidy(gm1,"ran_modes")

Error in $<-.data.frame(*tmp*, "level", value = c("1", "2", "3", "4", :
replacement has 15 rows, data has 1

@bbolker
Copy link
Contributor

bbolker commented Jul 21, 2017

@vathymut, I've fixed this problem in my fork. However, see my comment above: "ran_modes" returns the coefficients (i.e. fixed + random), not what I would call the conditional modes (random only) themselves.

You may also be interested to know that the development version of lme4 now includes an as.data.frame.ranef.mer() method that does the work of extracting the standard deviations of the conditional modes ...

@vathymut
Copy link

@bbolker Thank you. as.data.frame.ranef.mer() does what I need. And lme4 is one of favorite packages.

@bbolker
Copy link
Contributor

bbolker commented Dec 31, 2017

I think this should be closed. broom.mixed is gradually getting its act together. I think the issues raised in this thread are on their way to being resolved; can we continue the discussion at bbolker/broom.mixed#7 please?

@hadley, I'm particularly interested in your opinions, as the originator of the issue ...

@alexpghayes
Copy link
Collaborator

Closing in favor of bbolker/broom.mixed#7.

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants