```{r chunk_setup-ran-int, include=FALSE, eval=TRUE}
knitr::opts_chunk$set(
# cache
cache.rebuild = FALSE,
cache = TRUE
)
```
# Mixed Models
<br>
```{r top-plot-ri, echo=FALSE, cache.rebuild=TRUE}
tags$div(
style = "width:50%; margin:auto auto; font-size:50%",
DiagrammeR::grViz(
'scripts/multilevel.gv',
width = '100%',
height = '33%'
)
)
```
<br>
Mixed models have been around a long time in the statistical realm. For example, standard ANOVA methods can be seen as special cases of a mixed model. Mixed models have since gained a wide variety of applications and extensions, allowing them to encompass a diverse range of data situations. They can be seen as a first step in expanding one's tool set beyond the generalized linear model.
## Terminology
For the uninitiated, the terminology surrounding mixed models, especially across disciplines, can be a bit confusing. Some terms you might come across regarding these types of models include:
- Variance components
- Random intercepts and slopes
- Random effects
- Random coefficients
- Varying coefficients
- Intercepts- and/or slopes-as-outcomes
- Hierarchical linear models
- Multilevel models (implies multiple levels of hierarchically clustered data)
- Growth curve models (possibly Latent GCM)
- Mixed effects models
All describe types of mixed models. Some terms might be more historical, others are more often seen in a specific discipline, others might refer to a certain data structure, and still others are special cases. *Mixed effects*, or simply mixed, models generally refer to a mixture of fixed and random effects. For the models in general, I prefer the terms 'mixed models' or 'random effects models' because they are simpler terms, no specific structure is implied, and the latter can also apply to extensions that many would not think of when other terms are used[^richlypar]. Regarding the mixed effects, *fixed effects* is perhaps a poor but nonetheless stubborn term for the typical main effects one would see in a linear regression model, i.e. the non-random part of a mixed model. In some contexts, they are referred to as the *population average* effect. Though you will hear many definitions, *random effects* are simply those specific to an observational unit, however defined. The approach outlined in this document largely pertains to the case where the observational unit is the level of some grouping factor, but this is only one of several possibilities.
## Kinds of Structure
In terms of structure, data might have one or multiple sources of clustering, and that clustering may be hierarchical, such that clusters are nested within other clusters. An example would be scholastic aptitude tests given multiple times to students (repeated observations nested within students, students nested within schools, schools nested within districts). In other cases, there is no nesting structure. An example would be a reaction time experiment where participants perform the same set of tasks. While observations are nested within individuals, observations are also clustered according to task type. Some use the terms *nested* and *crossed* to distinguish between these scenarios. In addition, clustering may be balanced or not. We might expect more balance in studies of an experimental nature, but definitely not in other cases, e.g. where the cluster is something like a geographical unit and the observations are people.
In what follows we'll see mixed effects models in all these data situations. In general, our approach will be the same, as such clustering is really more a property of the data than the model. However, it's important to get a sense of the flexibility of mixed models to handle a variety of data situations. It's also worth noting that other structure may arise, such as temporal or spatial. We'll note those in some places in this document, but they will not be a focus.
## Random Intercepts Model
For the following we'll demonstrate the simplest[^vcmodel] and most common case of a mixed model, that in which we have a single grouping/cluster structure for the random effect. For reasons that will hopefully become clear soon, this is commonly called a *random intercepts model*.
## Example: Student GPA
For our first example, we'll assess factors predicting college grade point average (GPA). Each of the 200 students is assessed on six occasions (each semester for the first three years), so we have observations clustered within students. We have other variables such as job status, sex, and high school GPA. Some will be in both labeled and numeric form. See the [appendix][Appendix] for more detail.
```{r gpa_setup, echo=FALSE, eval=FALSE}
# MC Note: either the job label is incorrect or this variable makes no sense.
# The label is 0,1:3, 4 or more hours (pt jobs for less than 4 hours? per day?).
# However only values of 1 (rare to non-existent some years) 2 or 3. How the
# hell do you 'simulate' a factor that only has 3 of 5 levels and one category
# that makes up 80% of the data? Avoid or change.
gpa0 = read_spss('data/raw_data/joop_hox_data2/5 Longitudinal/gpa2long.sav') %>%
mutate(highgpa=as.numeric(highgpa),
student = factor(student),
occas = as_factor(occas),
job = as_factor(job),
sex = as_factor(sex),
admitted = as_factor(admitted),
year = as.numeric(str_sub(occas, 6,6)),
semester = as_factor(str_sub(occas, -1,-1)),
occasion = as.numeric(occas)-1) # to get rid of stupid labels
glimpse(gpa0)
gpa = gpa0 %>%
map_if(is.numeric, function(x) {attributes(x) = NULL; x}) %>%
as.data.frame()
glimpse(gpa)
readr::write_csv(gpa, 'data/gpa.csv')
save(gpa, file='data/gpa.RData')
```
```{r show_gpa_data, echo=FALSE, cache.rebuild=F}
load('data/gpa.RData')
DT::datatable(
gpa,
options = list(
dom = 'tp',
scrollX = T,
autoWidth = T,
columnDefs = list(
list(width = '200px', targets = 1),
list(width = '100px', targets = 3),
list(width = '50px', targets = c(0, 2, 5, 7:9))
)
),
rownames = F,
class = 'nowrap|compact'
)
```
<br>
<br>
## The Standard Regression Model
Now for the underlying model. We can show it in a couple different ways. First we start with just a standard regression to get our bearings.
$$\mathrm{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$
We have coefficients ($b_*$) for the intercept and the effect of time. The error ($\epsilon$) is assumed to be normally distributed with mean 0 and some standard deviation $\sigma$.
$$\epsilon \sim \mathcal{N}(0, \sigma)$$
An alternate way to write the model which puts emphasis on the underlying data generating process for $\mathrm{gpa}$ can be shown as follows.
$$\mathrm{gpa} \sim \mathcal{N}(\mu, \sigma)$$
$$\mu = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathrm{occasion}$$
More technically, the GPA and $\mu$ variables have an implicit subscript to denote each observation, but you can also think of it as a model for a single individual at a single time point.
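To make this data generating view concrete, here is a minimal simulation sketch; the intercept, slope, and $\sigma$ values below are illustrative assumptions, not estimates from the data.
```{r dgp_sketch, eval=FALSE}
# simulate from the standard regression data generating process
# coefficient and sigma values are assumed purely for illustration
n = 1200                                  # e.g. 200 students x 6 occasions
occasion = rep(0:5, times = 200)
mu = 2.6 + .1 * occasion                  # linear predictor
gpa_sim = rnorm(n, mean = mu, sd = .3)    # gpa ~ N(mu, sigma)
```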
## The Mixed Model
### Initial depiction
Now we show one way of depicting a mixed model that includes a unique effect for each student. Consider the following model for a single student[^notation]. This shows that the student-specific effect, i.e. the deviation in GPA just for that student being who they are, can be seen as an additional source of variance.
$$\mathrm{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + (\mathrm{effect}_{\mathrm{student}} + \epsilon)$$
We would (usually) assume the following for the student effects.
$$\mathrm{effect}_{\mathrm{student}} \sim \mathcal{N}(0, \tau)$$
So the student effects are random, and specifically they are normally distributed with a mean of zero and some estimated standard deviation ($\tau$). In other words, conceptually, the only difference between this mixed model and a standard regression is the student effect, which on average is no effect, but varies from student to student with a typical size of $\tau$.
If we rearrange it, we can instead focus on model coefficients, rather than as an additional source of error.
$$\mathrm{gpa} = (b_{\mathrm{intercept}} + \mathrm{effect}_{\mathrm{student}}) + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$
Or more succinctly:
$$\mathrm{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$
In this way, we'll have student-specific intercepts, as each person will have their own unique effect added to the overall intercept, resulting in a different intercept for each person.
$$b_{\mathrm{int\_student}} \sim \mathcal{N}(b_{\mathrm{intercept}}, \tau)$$
Now we see the intercepts as normally distributed with a mean of the overall intercept and some standard deviation. As such, this is often called a *random intercepts* model.
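As a quick illustration of that distribution, the following sketch draws student-specific intercepts around an overall intercept; the values for $b_{\mathrm{intercept}}$ and $\tau$ are assumed for demonstration, not taken from any model.
```{r rand_int_sketch, eval=FALSE}
# student-specific intercepts: normal around the overall intercept
# b_intercept and tau values are assumed, not model estimates
b_intercept = 2.6
tau = .3
b_int_student = rnorm(200, mean = b_intercept, sd = tau)
hist(b_int_student, main = 'Student-specific intercepts')
```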
### As a multi-level model
Another way of showing the mixed model is commonly seen in the *multilevel modeling* literature. It is shown more explicitly as a two part regression model, one at the observation level and one at the student level.
$$\mathrm{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$
$$b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{effect}_{\mathrm{student}}$$
However, after 'plugging in' the second level part to the first, it is identical to the previous.
Note how we don't have a student-specific effect for occasion. In this context, occasion is said to be a *fixed effect* only, and there is no random component. This definitely does not have to be the case though, as we'll see later.
## Application
### Initial visualization
It always helps to look before we leap, so let's do so. Here we plot GPA vs. occasion (i.e. semester) to get a sense of the variability in starting points and trends.
```{r spaghetti, echo=FALSE}
set.seed(1234)
gpa_lm = lm(gpa ~ occasion, data=gpa)
# sample_students = gpa %>% filter(student %in% sample(1:200, 10))
# occasion_sample = gpa$occasion[gpa$student %in% sample_students$student]
# gpa_sample = gpa$gpa[gpa$student %in% sample_students$student]
init = gpa %>%
modelr::add_predictions(gpa_lm, var = 'all') %>%
mutate(select = factor(student %in% sample(1:200, 10)),
sz = c(.5, 1)[select]) %>%
group_by(student, select)
init %>%
plot_ly %>%
add_lines(
x = ~ occasion,
y = ~ gpa,
opacity = .35,
color = ~ select,
size = ~ sz,
colors = scico::scico(2, begin = .25),
showlegend = F
) %>%
add_lines(
x = ~ occasion,
y = ~ gpa,
opacity = .35,
color = ~ select,
size = I(2),
colors = scico::scico(2, begin = .25),
data = filter(init, select == TRUE),
showlegend = F
) %>%
add_lines(
x = ~ occasion,
y = ~ all,
color = I(palettes$stan_red$stan_red),
opacity = .70
) %>%
theme_plotly()
```
<br>
All student paths are shown faded, with a sample of 10 shown in bold. The overall trend, as estimated by the regression we'll do later, is shown in red. Two things stand out. One is that students have a lot of variability when starting out. The other is that, while the general trend in GPA is upward over time as we'd expect, individual students may vary in that trajectory.
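If you want to reproduce the gist of this plot yourself, a basic ggplot2 sketch might look like the following; the styling is simplified relative to the plot above.
```{r spaghetti_sketch, eval=FALSE}
library(ggplot2)

ggplot(gpa, aes(x = occasion, y = gpa)) +
  geom_line(aes(group = student), alpha = .2) +           # one faded path per student
  geom_smooth(method = 'lm', se = FALSE, color = 'red')   # overall linear trend
```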
### Standard regression
So let's get started. First, we'll look at a standard regression with only the time indicator as a covariate, which we'll treat as numeric for simplicity. Note that I present a cleaner version of the summarized objects for the purposes of this document.
```{r gpa_lm, echo=1:3, eval=-3}
load('data/gpa.RData')
gpa_lm = lm(gpa ~ occasion, data = gpa)
summary(gpa_lm)
pander::pander(summary(gpa_lm), round = 3)
gpa_lm_by_group = gpa %>%
split(.$student) %>%
map_df( ~ data.frame(t(coef(
lm(gpa ~ occasion, data = .x)
)))) %>%
rename(Intercept = X.Intercept.)
coef_lm = coef(gpa_lm)
```
The above tells us that starting out, i.e. when occasion is zero, the average GPA, denoted by the intercept, is `r round(coef_lm[1], 2)`. In addition, as we move from semester to semester, we can expect GPA to increase by about `r round(coef_lm[2], 2)` points. This would be fine except that we are ignoring the clustering. A side effect of doing so is that our standard errors are biased, and thus claims about statistical significance based on them would be off. More importantly, however, we simply don't get to explore the student effect, which would be of interest by itself.
### Regression by cluster
An alternative approach would be to run a separate regression for every student. However, there are many drawbacks to this: the results are not easily summarized when there are many groups, there is typically very little data within each cluster with which to do so (as in this case), and the models are over-contextualized, meaning they ignore what students have in common. A sketch of the approach is shown below; we'll compare it to the mixed model later.
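For reference, one way to run such per-cluster regressions is with the <span class="func">lmList</span> function from <span class="pack">lme4</span>; this is just a sketch of the approach being described, not something we'll rely on.
```{r lm_by_cluster_sketch, eval=FALSE}
library(lme4)

# a separate regression for each student; each gets its own intercept and slope
gpa_lm_list = lmList(gpa ~ occasion | student, data = gpa)

head(coef(gpa_lm_list))  # per-student coefficients
```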
### Running a mixed model
Next we run a mixed model that will allow for a student-specific effect. Such a model is easily fit in R, specifically with the package <span class="pack">lme4</span>. In the following, the code will look just like what you used for regression with <span class="func">lm</span>, but with an additional component specifying the group, i.e. student, effect. The `(1|student)` means that we are allowing the intercept, represented by `1`, to vary by student. With the mixed model, we get the same results as the regression, but as we'll see, we'll have more to talk about.
```{r gpa_mixed, eval=-3}
library(lme4)
gpa_mixed = lmer(gpa ~ occasion + (1 | student), data = gpa)
summary(gpa_mixed)
```
```{r gpa_mixed_pretty, echo=FALSE}
vcovs = extract_vc(gpa_mixed, ci_level = 0) %>%
select(variance) # for icc later
extract_fixed_effects(gpa_mixed) %>%
kable_df()
extract_vc(gpa_mixed, ci_level = 0) %>%
select(-var_prop) %>%
kable_df()
```
First we see that the coefficients, or in this context the *fixed* effects, for the intercept and time are the same as we saw with the standard regression[^lmlmercoef], as would be their interpretation. The standard errors, on the other hand, are different here, though in the end our conclusion would be the same as far as statistical significance goes. Note specifically that the standard error for the intercept has increased. Conceptually, you can think of it this way: allowing a random intercept per person lets us gain information about the individual, while recognizing the uncertainty with regard to the overall average that we were underestimating before[^sewithin].
While we have coefficients and standard errors, you might have noticed that <span class="pack">lme4</span> does not provide p-values! There are [several reasons](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#why-doesnt-lme4-display-denominator-degrees-of-freedomp-values-what-other-options-do-i-have) for this, namely that with mixed models we are essentially dealing with different sample sizes: the $N_c$ within clusters, which may vary from cluster to cluster (and even be a single observation!), and the $N$ total observations. This puts us in kind of a fuzzy situation with regard to reference distributions, denominator degrees of freedom, and how to approximate a 'best' solution. Other programs provide p-values automatically as if there is no issue, and without telling you which approach they use to calculate them (there are several). Furthermore, those approximations may be very poor in some scenarios, or make assumptions that may not be appropriate for the situation[^fuzzyp].
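If you do want approximate p-values, one commonly used option is the <span class="pack">lmerTest</span> package, which adds Satterthwaite (or optionally Kenward-Roger) degrees of freedom to the summary output; a quick sketch follows, subject to the caveats just noted.
```{r lmertest_sketch, eval=FALSE}
library(lmerTest)  # masks lme4::lmer in order to add p-value functionality

gpa_mixed_p = lmer(gpa ~ occasion + (1 | student), data = gpa)

summary(gpa_mixed_p)  # uses Satterthwaite degrees of freedom by default
```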
However, it's more straightforward to get confidence intervals, and we can do so with <span class="pack">lme4</span> as follows[^confint].
```{r gpa_mixed_ci, eval=FALSE}
confint(gpa_mixed)
```
```{r gpa_mixed_ci_pretty, echo=FALSE}
extract_vc(gpa_mixed) %>%
kable_df(align = 'lrr', digits = 3)
```
#### Variance components
One thing that's new compared to the standard regression output is the estimated standard deviation/variance of the student effect ($\tau$/$\tau^2$ in our formula depiction from before). This tells us how much, on average, GPA bounces around as we move from student to student. In other words, even after making a prediction based on time point, each student has their own unique deviation, and that value (in terms of the standard deviation) is the estimated average deviation across students. Note that GPA moves more than twice as much due to the student as it does from one semester to the next. This is an important interpretive aspect not available to us with a standard regression model.
Another way to interpret the variance output is to note percentage of the student variance out of the total, or `r round(vcovs[1,1], 3)` / `r round(sum(vcovs), 3)` = `r round(vcovs[1,1]/sum(vcovs), 2)*100`%. In this setting, this value is also called the *intraclass correlation*, because it is also an estimate of the within cluster correlation, as we'll see later.
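To compute this value directly from the model, we can pull the variance components with <span class="func">VarCorr</span> from <span class="pack">lme4</span>, e.g. as follows.
```{r icc_sketch, eval=FALSE}
# intraclass correlation: student variance / (student + residual variance)
vc = as.data.frame(VarCorr(gpa_mixed))

vc$vcov[1] / sum(vc$vcov)
```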
#### Estimates of the random effects
After running the model, we can actually get estimates of the student effects[^blup]. I show them two ways for the first five students, both as the random effect and as the random intercept (i.e. intercept + random effect).
```{r randeffs, eval=FALSE}
ranef(gpa_mixed)$student %>% head(5)
# showing mixedup::extract_random_effects(gpa_mixed)
```
```{r randeffs_pretty, echo=FALSE}
extract_random_effects(gpa_mixed) %>%
head(5) %>%
kable_df(align = 'r')
```
```{r randints, eval=FALSE}
coef(gpa_mixed)$student %>% head(5)
```
```{r randints_pretty, echo=FALSE}
extract_random_coefs(gpa_mixed) %>%
head(5) %>%
kable_df(align = 'rr')
```
Note that we did not allow occasion to vary, so it is a constant, i.e. *fixed*, effect for all students.
Often, we are keenly interested in these effects, and want some sense of uncertainty regarding them. With <span class="pack">lme4</span> this typically would be done via bootstrapping, specifically with the <span class="func">bootMer</span> function within <span class="pack">lme4</span>. However, for some users this may be a somewhat complex undertaking. The <span class="pack">merTools</span> package provides an easier way to get this with the <span class="func">predictInterval</span> function[^predinterval]. Or you can go straight to a plot of them. A minimal <span class="func">bootMer</span> sketch is also shown after the following code.
```{r ranef_interval, eval=FALSE}
library(merTools)
predictInterval(gpa_mixed) # for various model predictions, possibly with new data
REsim(gpa_mixed) # mean, median and sd of the random effect estimates
plotREsim(REsim(gpa_mixed)) # plot the interval estimates
```
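For comparison, here is a minimal <span class="func">bootMer</span> sketch that bootstraps the fixed effects; `nsim` is kept small for illustration and would be larger in practice.
```{r bootmer_sketch, eval=FALSE}
# parametric bootstrap of the fixed effects; increase nsim in practice
boot_fe = bootMer(gpa_mixed, FUN = fixef, nsim = 100)

boot_fe
```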
The following plot shows the estimated random effect for each student and its interval estimate (a modified version of the plot produced by that last line of code[^mertoolsplotlabels]). Recall that the random effects are normally distributed with a mean of zero, shown by the horizontal line. Intervals that do not include zero are in bold. In this case, such students start out relatively higher or lower compared to a typical student.
```{r ranef_interval_show, echo=FALSE}
# library(merTools) # use implicit or it will load bbmle which will load MASS
# also, it will confusingly predict N re rather than Ngroups, as it uses the original data.
# merTools::predictInterval(gpa_mixed,
# which = 'random',
# newdata = gpa %>% filter(occasion == 1)) %>%
# round(2) %>%
# mutate(student = 1:200) %>%
# select(student, fit, upr, lwr) %>%
# DT::datatable(rownames = F, options=list(dom='ltipr'))
# merTools::plotREsim(merTools::REsim(gpa_mixed)) +
# labs(x='Student', y='Value', title='Plot of Random Effects', subtitle='Interval estimates ') +
# geom_hline(aes(yintercept=0), color='orange', alpha=.5) +
# theme_clean() +
# theme(axis.text.x = element_blank(),
# axis.ticks.x = element_blank(),
# strip.text.x = element_blank(),
# strip.text.y = element_blank(),
# panel.background = element_rect(fill='transparent', color=NA), # apparently all ignored for reasons unknown
# plot.background = element_rect(fill='transparent', color=NA),
# strip.background = element_rect(fill='transparent', color=NA))
visibly::plot_coefficients(gpa_mixed, ranef = TRUE, which_ranef = 'student') +
ggtitle('Plot of Random Effects', subtitle = 'Interval Estimates') +
labs(x = 'Student') +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.text.x = element_blank(),
strip.text.y = element_blank(),
panel.background = element_rect(fill = 'transparent', color = NA),
# apparently all ignored for reasons unknown
plot.background = element_rect(fill = 'transparent', color = NA),
strip.background = element_rect(fill = 'transparent', color = NA)
)
```
#### Prediction
Let's now examine standard predictions vs. cluster-specific predictions. As with most R models, we can use the <span class="func" style = "">predict</span> function on the model object.
```{r predict_uncond}
predict(gpa_mixed, re.form=NA) %>% head()
```
In the above code we specified not to use the random effects via `re.form=NA`, and as such, our predictions for the observations are pretty much what we'd get from the standard linear model.
```{r predict_uncond_lm, echo=1:2}
predict_no_re = predict(gpa_mixed, re.form=NA)
predict_lm = predict(gpa_lm)
tibble(student = as.numeric(gpa$student),
lm = predict_lm,
`lmer no re`=predict_no_re) %>%
round(2) %>%
DT::datatable(rownames=F, width=500, options=list(dom='pt'))
```
But each person has their unique intercept, so let's see how the predictions differ when we incorporate that information.
```{r predict_cond_lm, echo=1}
predict_with_re = predict(gpa_mixed)
tibble(
student = as.numeric(gpa$student),
lm = predict_lm,
`lmer no re` = predict_no_re,
`lmer with re` = predict_with_re
) %>%
round(2) %>%
DT::datatable(rownames = F,
width = 500,
options = list(dom = 'pt'))
```
Depending on the estimated student effect, students will start above or below the estimated intercept for all students. The following visualizes the unconditional prediction vs. the conditional prediction that incorporates the random intercept for the first two students.
```{r predict_cond_lm_plot, echo=FALSE}
# note that plotly will warn because it's plotly
tibble(
student = as.character(gpa$student),
occasion = gpa$occasion,
gpa = gpa$gpa,
lm = predict_lm,
`lmer no re` = predict_no_re,
`lmer with re` = predict_with_re
) %>%
filter(student %in% 1:2) %>%
group_by(student) %>%
plot_ly() %>%
add_markers(
x = ~ occasion,
y = ~ gpa,
color = ~ student,
showlegend = F
) %>%
add_lines(
x = ~ occasion,
y = ~ lm,
color = I('#ff5500'),
showlegend = T,
name = 'lm'
) %>%
add_lines(
x = ~ occasion,
y = ~ `lmer with re`,
color = ~ student,
showlegend = T,
name = 'mixed'
) %>%
theme_plotly()
```
<br>
We can see that the predictions from the mixed model are shifted because of having a different intercept. For these students, the shift reflects their relatively poorer start.
## Cluster Level Covariates
Recall our depiction of a mixed model as a multilevel model.
$$\mathrm{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$
$$b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{effect}_{\mathrm{student}}$$
If we add a student level covariate, e.g. sex, to the model, we then have the following.
$$b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + b_{\mathrm{sex}}\cdot \mathrm{sex} + \mathrm{effect}_{\mathrm{student}}$$
After plugging in, we still have the same model as before, just with an additional predictor.
$$\mathrm{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathrm{occasion}+ b_{\mathrm{sex}}\cdot \mathrm{sex} + (\mathrm{effect}_{\mathrm{student}} + \epsilon)$$
So in the end, adding cluster level covariates doesn't have any unusual effect on how we think about the model[^mlevel]. We simply add them to our set of predictor variables. Note also that we can create cluster level covariates as group means or some other summary of the observation level variables. This is especially common when the clusters represent geographical units and observations are people. For example, we might have income as a person level covariate, and use the median to represent the overall wealth of the geographical region.
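As a sketch of creating such a group-level covariate, consider a hypothetical setting with people nested in regions; the data and variable names here are made up purely for illustration.
```{r group_mean_sketch, eval=FALSE}
library(dplyr)

# hypothetical data: person-level income within geographic regions
region_df = tibble(
  region = factor(rep(1:10, each = 20)),
  income = rexp(200, rate = 1 / 50000)
)

# median income per region becomes a cluster level covariate
region_df = region_df %>%
  group_by(region) %>%
  mutate(region_income = median(income)) %>%
  ungroup()
```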
## Summary of Mixed Model Basics
Mixed models allow us to take into account observed structure in the data. If this were all they were used for, we would have more accurate inference relative to what we would have if we ignored that structure. However, we get much more! We better understand the sources of variability in the target variable. We also get group-specific estimates of the parameters in the model, allowing us to understand exactly how the groups differ from one another. Furthermore, this in turn allows for group-specific prediction, and thus much more accurate prediction, assuming there is appreciable variance due to the clustering. In short, there is much to be gained by mixed models, even in the simplest of settings.
## Exercises for Starting Out
### Sleep
For this exercise, we'll use the sleep study data from the <span class="pack">lme4</span> package. The following describes it.
> The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.
After loading the package, the data can be loaded as follows. I show the first few observations.
```{r sleepstudy, echo=-3}
library(lme4)
data("sleepstudy")
head(sleepstudy) %>% kable_df()
```
1. Run a regression with Reaction as the target variable and Days as the predictor.
2. Run a mixed model with a random intercept for Subject.
3. Interpret the variance components and fixed effects.
### Adding the cluster-level covariate
Rerun the mixed model with the [GPA data][Mixed model] adding the cluster level covariate of `sex`, or high school GPA (`highgpa`), or both. Interpret all aspects of the results.
```{r gpa_cluster, echo=F, eval=FALSE}
gpa_mixed_cluster_level = lmer(gpa ~ occasion + sex + highgpa + (1|student), gpa)
summary(gpa_mixed_cluster_level)
```
What happened to the student variance after adding cluster level covariates to the model?
### Simulating a mixed model
The following represents a simple way to simulate a random intercepts model. Note what each object is, and make sure the code makes sense to you. Then run it.
```{r simMixed, eval=FALSE}
set.seed(1234) # this will allow you to exactly duplicate your result
Ngroups = 100
NperGroup = 3
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))
u = rnorm(Ngroups, sd = .5)
e = rnorm(N, sd = .25)
x = rnorm(N)
y = 2 + .5 * x + u[groups] + e
d = data.frame(x, y, groups)
```
Which of the above represent the fixed and random effects? Now run the following.
```{r simMixed2, eval=FALSE}
model = lmer(y ~ x + (1|groups), data=d)
summary(model)
confint(model)
library(ggplot2)
ggplot(aes(x, y), data=d) +
geom_point()
```
Do the results seem in keeping with what you expect?
In what follows we'll change various aspects of the data, then rerun the model after each change, then summarize and get confidence intervals as before. For each change, note specifically at least one thing that differs in the results.
0. First calculate or simply eyeball the intraclass correlation coefficient $\frac{\textrm{random effect variance}}{\textrm{residual + random effect variance}}$. In addition, create a density plot of the random effects as follows.
```{r simMixed3, eval=FALSE}
re = ranef(model)$groups[['(Intercept)']]  # extract the intercept column as a vector
qplot(x = re, geom = 'density', xlim = c(-3, 3))
```
1. Change the random effect variance/sd and/or the residual variance/sd and note your new estimate of the ICC, and plot the random effect as before.
2. Reset the values to the original. Change <span class="objclass">Ngroups</span> to 50. What differences do you see in the confidence interval estimates?
3. Set <span class="objclass">Ngroups</span> back to 100. Now change <span class="objclass">NperGroup</span> to 10, and note again how the CI differs from the base condition.
[^richlypar]: I actually like [Richly Parameterized Linear Models](https://www.crcpress.com/Richly-Parameterized-Linear-Models-Additive-Time-Series-and-Spatial/Hodges/p/book/9781439866832), or [Structured Additive Regression Models](https://www.springer.com/us/book/9783642343322). Both are a mouthful, but at least the latter reduces to [STARs](http://m-clark.github.io/workshops/stars/).
[^vcmodel]: Actually, the simplest model would have no covariates at all, just *variance components*, with no correlations among the random effects. Such a model can be interesting to look at while exploring your data, but would rarely suffice on its own to tell the story you desire to tell.
[^notation]: Note that I leave out the observation level subscript to keep things clean. I find that multilevel style notation quickly becomes unwieldy, and don't wish to reproduce the problem. It also tends to add confusion to a lot of applied researchers starting out with mixed models.
[^lmlmercoef]: This will not always be the case, e.g. with unbalanced data, but they should be fairly close.
[^sewithin]: The standard error for our time covariate went down due to our estimate of $\sigma$ being lower for this model, and there being no additional variance due to cluster membership.
[^fuzzyp]: Note that many common modeling situations involve a fuzzy p setting, especially penalized regression approaches such as mixed, additive, and ridge regression models. Rather than being a bad thing, this usually is a sign you're doing something interesting, or handling complexity in an appropriate way.
[^confint]: See `?confint.merMod` for details and options. The output you see is based on my wrapper `mixedup::extract_vc`.
[^blup]: These are sometimes referred to as BLUPs or EBLUPs, which stands for (empirical) best linear unbiased prediction. However, they are only BLUPs for *linear* mixed effects models, so you will also see them referred to as *conditional modes*. Furthermore, in the Bayesian context, the effects are actually estimated as additional model parameters, rather than estimated/predicted after the fact.
[^predinterval]: Note that while <span class="func">predictInterval</span> does not quite incorporate all sources of uncertainty as does <span class="func">bootMer</span>, it's actually feasible for larger data sets, and on par with the Bayesian results (e.g. with <span class="pack">rstanarm</span>).
[^mlevel]: This is why the multilevel depiction is sub-par, and leads many to confusion at times. You have a target variable and selected feature covariates to predict that target based on theory. Whether they are cluster level variables or if there are interactions doesn't have anything to do with the data structure as much as it does the theoretical motivations. However, if you choose to depict the model in multilevel fashion, the final model must adhere to the 'plugged in' result. So if, for example, you posit a cluster level variable for a random slope, you *must* include the implied interaction of the cluster level and observation level covariates.
[^mertoolsplotlabels]: Note that the default plot from <span class="pack">merTools</span> is confusingly labeled for a single random effect, because it unnecessarily adds a facet. You'll understand it better by looking at the plot in the discussion of [crossed random effects][Cross-classified models] later. However, the one displayed is from my own package, [visibly](https://m-clark.github.io/visibly).