---
title: "Marginal effects from categorical predictors"
output:
  html_document:
    toc: true
---
This document walks through generating average marginal effects on the response scale for models with categorical predictors, using `tidybayes::add_fitted_draws`. The example here uses `rstanarm`, but should also work on `brms` models with minimal modification.
Note: I am relatively new to average marginal effects, so this is partially my attempt to explain them and partially my attempt to better understand them myself. If there is anything here that looks off to you, please file an issue or a pull request.
## Setup
```{r setup, message = FALSE, warning = FALSE}
library(tidyverse)
library(magrittr)
library(ggplot2)
library(rstan)
library(rstanarm)
library(modelr)
library(tidybayes)
library(ggstance)
library(patchwork)
library(latex2exp)
theme_set(theme_light() + theme(
  panel.border = element_blank()
))
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```
## Data
```{r}
set.seed(12345)
a1_prob = .3
a2_prob = .7
k = 40
df = bind_rows(
  tibble(A = "a1", B = "b1", y = rbinom(k, 1, a1_prob) == 1),
  tibble(A = "a2", B = "b1", y = rbinom(k, 1, a2_prob) == 1),
  tibble(A = "a1", B = "b2", y = rbinom(k, 1, a1_prob + .1) == 1),
  tibble(A = "a2", B = "b2", y = rbinom(k, 1, a2_prob + .1) == 1)
)
```
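As a quick sanity check on the simulated data, we can look at the observed proportion of `TRUE` responses in each cell (these should be roughly in line with the probabilities used to simulate the data above):

```{r}
# observed proportion of TRUE responses in each A x B cell
df %>%
  group_by(A, B) %>%
  summarise(prop_true = mean(y))
```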
### Data plot
```{r}
df %>%
  ggplot(aes(y = B, fill = y)) +
  geom_barh() +
  facet_grid(A ~ .) +
  scale_fill_brewer()
```
## Model
```{r, cache = TRUE}
m = stan_glm(y ~ A*B, data = df, family = binomial)
```
```{r}
summary(m)
```
## Marginal effects at particular/"representative" values ("MER" marginal effects)
Let's say we want to know the mean of the response conditional on particular ("representative") values of our predictors. This is essentially what `add_fitted_draws` (and `posterior_linpred` underneath it) gives us.
For example, the model above is something like this:
$$
\begin{align}
y | A, B &\sim \textrm{Bernoulli}(q | A, B)\\
q | A, B &= g_p(A, B)
\end{align}
$$
Each observation is drawn from a Bernoulli distribution, and the probability of an observation being `TRUE` is equal to $q$, which is a function of $A$ and $B$. For simplicity, we've just let that be some function $g_p(A, B)$. If we want to get particular about it, that function for the above model is something like this:
$$
\begin{align}
q | A, B &= g_p(A, B)\\
&= \textrm{logit}^{-1}(\alpha + \beta_A[A = a_2] + \beta_B[B = b_2] + \beta_{AB}[A = a_2][B=b_2])
\end{align}
$$
But the actual function $g_p$ **doesn't really matter for our purposes**, so long as something can calculate it for us. What we want to know is the expected value of $y$, which for a Bernoulli response is the same thing as the probability $q$:
$$
\begin{align}
\textrm{E}[y | A, B] &= q | A, B
\end{align}
$$
Put another way, what proportion of observations in the population do we expect to be equal to `TRUE` for each combination of values of $A$ and $B$?
Fortunately, `add_fitted_draws` gives us the posterior distribution for exactly this quantity:
```{r}
AB_plot = df %>%
  data_grid(A, B) %>%
  # `value` just changes the column name in the output, not any of the results.
  # I've set it to "E[y|A,B]" to make the correspondence to the math clear, as
  # things get more complicated as we go along.
  add_fitted_draws(m, value = "E[y|A,B]") %>%
  ggplot(aes(y = paste0("E[y|A = ", A, ", B = ", B, "]"), x = `E[y|A,B]`, fill = A)) +
  geom_halfeyeh() +
  facet_grid(A ~ ., scales = "free") +
  xlim(0, 1) +
  ylab("B") +
  geom_vline(xintercept = c(0, 1))

AB_plot
```
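As a sanity check, we can reproduce one of these distributions by evaluating $g_p$ by hand from the posterior draws of the coefficients. This is a minimal sketch; the coefficient names (such as `Aa2`) assume `rstanarm`'s default dummy coding for this model:

```{r}
# compute E[y | A = a2, B = b2] by hand: apply the inverse logit (plogis)
# to the linear predictor, draw by draw. This should match the corresponding
# distribution from add_fitted_draws above.
draws = as.data.frame(m)
q_a2_b2 = plogis(
  draws[["(Intercept)"]] + draws[["Aa2"]] + draws[["Bb2"]] + draws[["Aa2:Bb2"]]
)
median_qi(q_a2_b2)
```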
## Average marginal effects ("AME")
But let's say we wanted to get the average effect for each value of $A$, marginalizing over $B$. So we want:
$$
\begin{align}
\textrm{E}[y | A]
\end{align}
$$
Our model can't give us that directly. But we can use the [law of total expectation](https://en.wikipedia.org/wiki/Law_of_total_expectation) to express this expectation in terms of things we *can* get our model to give us:
$$
\begin{align}
\textrm{E}[y | A] &= \sum_{b \in \mathcal{B}}\textrm{E}[y | A, B = b] \Pr[B = b]\\
&= \textrm{E}[y | A, B = b_1] \Pr[B = b_1] + \textrm{E}[y | A, B = b_2] \Pr[B = b_2]
\end{align}
$$
Two of these quantities are given to us by `add_fitted_draws`: $\textrm{E}[y | A, B = b_1]$ and $\textrm{E}[y | A, B = b_2]$.
However, we still need $\Pr[B = b_1]$ and $\Pr[B = b_2]$ to generate a marginal effect. These probabilities must come from somewhere else: maybe our experimental design, or maybe the population we're interested in. For now, we'll assume $\Pr[B = b_1] = \Pr[B = b_2] = 0.5$, i.e. $b_1$ and $b_2$ are equally likely in the population. Then we have:
$$
\begin{align}
\textrm{E}[y | A] &= \textrm{E}[y | A, B = b_1] \Pr[B = b_1] + \textrm{E}[y | A, B = b_2] \Pr[B = b_2]\\
&= \textrm{E}[y | A, B = b_1] \cdot 0.5 + \textrm{E}[y | A, B = b_2] \cdot 0.5
\end{align}
$$
Equivalently, when we assume that all levels of the categorical variable we want to marginalize out are equally likely, we can marginalize it out with an unweighted mean:
$$
\textrm{E}[y | A] = \frac{1}{|\mathcal{B}|} \sum_{b \in \mathcal{B}} \textrm{E}[y | A, B = b]
$$
**It is very important to stress** that this depends on a population made up of equal proportions of all possible values of $B$ being a meaningful thing to talk about.
Given the above, we can take the following steps to get the marginalized version:
1. Condition on all values of the categorical variables (using `modelr::data_grid`)
2. Get draws from the distribution of the expected value of the response conditional on those variables (using `tidybayes::add_fitted_draws`)
3. Group by every predictor we don't want to marginalize out, plus the `.draw` column (so we average within draws)
4. Marginalize out the categorical variables we don't care about by averaging over them (`dplyr::summarise` + `mean`, or `dplyr::summarise` + `weighted.mean` if your population has unequal proportions; see the sketch after the next code block)
```{r fig.height = 6, fig.width = 6}
A_plot = df %>%
  data_grid(A, B) %>%                          # condition on everything
  add_fitted_draws(m, value = "E[y|A,B]") %>%  # get conditional expectations
  group_by(A, .draw) %>%                       # group by predictors to keep
  summarise(`E[y|A]` = mean(`E[y|A,B]`)) %>%   # marginalize out other predictors
  ggplot(aes(y = paste0("E[y|A = ", A, "]"), x = `E[y|A]`, fill = A)) +
  geom_halfeyeh() +
  xlim(0, 1) +
  facet_grid(A ~ ., scales = "free") +
  ylab(NULL) +
  geom_vline(xintercept = c(0, 1))

A_plot / AB_plot +
  plot_layout(heights = c(1, 2))
```
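For the weighted case mentioned in step 4: if the population proportions of $B$ are not equal, we can replace `mean` with `weighted.mean`. Here is a minimal sketch, assuming (hypothetically) that $\Pr[B = b_1] = 0.8$ and $\Pr[B = b_2] = 0.2$:

```{r}
# hypothetical unequal population proportions: Pr[B = b1] = 0.8, Pr[B = b2] = 0.2
df %>%
  data_grid(A, B) %>%
  add_fitted_draws(m, value = "E[y|A,B]") %>%
  group_by(A, .draw) %>%
  summarise(`E[y|A]` = weighted.mean(`E[y|A,B]`, ifelse(B == "b1", 0.8, 0.2))) %>%
  group_by(A) %>%
  median_qi(`E[y|A]`)
```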
You should be able to see how the marginalized effects in the top plot are the averages of the corresponding distributions in the lower plot.
If all went well, these marginalized effects should line up well with the observed proportions in the data (since each condition has an equal number of observations and we haven't used any strong priors):
```{r}
marginal_data_plot = df %>%
  ggplot(aes(y = fct_rev(A), fill = y)) +
  geom_barh() +
  scale_fill_brewer() +
  ylab("A")

A_plot / marginal_data_plot
```
### Versus a model without B
Because every group in this dataset has an equal number of observations, these estimates line up with what we would get if we just omitted B from the model altogether:
```{r}
m2 = stan_glm(y ~ A, data = df, family = binomial)
```
```{r}
A_plot_2 = df %>%
  data_grid(A) %>%                           # condition on everything
  add_fitted_draws(m2, value = "E[y|A]") %>% # get conditional expectations
  ggplot(aes(y = paste0("E[y|A = ", A, "]"), x = `E[y|A]`, fill = A)) +
  geom_halfeyeh() +
  xlim(0, 1) +
  facet_grid(A ~ ., scales = "free") +
  ylab(NULL) +
  xlab("E[y|A], from model without B") +
  geom_vline(xintercept = c(0, 1))

A_plot / A_plot_2
```
This would not be the case if we had unequal cell sizes.
### Differences in AMEs
Given $\textrm{E}[y | A = a_1]$ and $\textrm{E}[y | A = a_2]$ as before:
```{r fig.height = 2, fig.width = 6}
A_plot
```
We might want to derive the posterior distribution for the difference in these means:
$$
\textrm{E}[y | A = a_2] - \textrm{E}[y | A = a_1]
$$
Since we can generate draws from the distributions of both quantities, we can use `tidybayes::compare_levels` to compute this difference (it subtracts draws between pairwise combinations of the levels of some factor; in this case, `A`):
```{r, fig.height = 1.25, fig.width = 5}
# this part is the same as before
marginal_effects = df %>%
  data_grid(A, B) %>%                          # condition on everything
  add_fitted_draws(m, value = "E[y|A,B]") %>%  # get conditional expectations
  group_by(A, .draw) %>%                       # group by predictors to keep
  summarise(`E[y|A]` = mean(`E[y|A,B]`))       # marginalize out other predictors

# we can use compare_levels to calculate the mean difference
mean_diffs = marginal_effects %>%
  compare_levels(`E[y|A]`, by = A) %>%  # pairwise differences in `E[y|A]`, by levels of A
  rename(`mean difference` = `E[y|A]`)  # give this column a more accurate name

mean_diffs %>%
  ggplot(aes(x = `mean difference`, y = A)) +
  geom_halfeyeh() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  xlim(-.4, .6)
```
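We can also summarise this difference numerically; here is a minimal sketch using `tidybayes::median_qi` to get the posterior median and 95% interval:

```{r}
# posterior median and 95% interval of the difference in marginal means
mean_diffs %>%
  group_by(A) %>%
  median_qi(`mean difference`)
```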
And here's a fancy version of this plot, aligned with the marginal estimates:
```{r, fig.height = 3, fig.width = 6}
# this is just so we can align the plot axes below
median_a1 = marginal_effects %>%
  filter(A == "a1") %$%
  median(`E[y|A]`)

A_plot_diff = mean_diffs %>%
  ggplot(aes(y = A, x = `mean difference`)) +
  geom_halfeyeh() +
  ylab(NULL) +
  geom_vline(xintercept = c(0, 1), linetype = "dashed") +
  coord_cartesian(xlim = c(-median_a1, 1 - median_a1))

(A_plot + geom_vline(xintercept = median_a1, linetype = "dashed")) /
  A_plot_diff +
  plot_layout(heights = c(2, 1))
```