-
-
Notifications
You must be signed in to change notification settings - Fork 39
/
Copy path_did_simple_demo.Rmd
110 lines (79 loc) · 3.72 KB
/
_did_simple_demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
```{r}
# Load required libraries
library(dplyr)
library(ggplot2)
set.seed(1)
# Simulated dataset for illustration
data <- data.frame(
time = rep(c(0, 1), each = 50), # Pre (0) and Post (1)
treated = rep(c(0, 1), times = 50), # Control (0) and Treated (1)
error = rnorm(100)
)
# Generate outcome variable
data$outcome <- 5 + 3 * data$treated + 2 * data$time + 4 * data$treated * data$time + data$error
# Compute averages for 2x2 table
table_means <- data %>%
group_by(treated, time) %>%
summarize(mean_outcome = mean(outcome), .groups = "drop") %>%
mutate(
group = paste0(ifelse(treated == 1, "Treated", "Control"), ", ",
ifelse(time == 1, "Post", "Pre"))
)
# Display the 2x2 table
table_2x2 <- table_means %>%
select(group, mean_outcome) %>%
tidyr::spread(key = group, value = mean_outcome)
print("2x2 Table of Mean Outcomes:")
print(table_2x2)
# Calculate Diff-in-Diff manually
Y11 <- table_means$mean_outcome[table_means$group == "Treated, Post"] # Treated, Post
Y10 <- table_means$mean_outcome[table_means$group == "Treated, Pre"] # Treated, Pre
Y01 <- table_means$mean_outcome[table_means$group == "Control, Post"] # Control, Post
Y00 <- table_means$mean_outcome[table_means$group == "Control, Pre"] # Control, Pre
diff_in_diff_formula <- (Y11 - Y10) - (Y01 - Y00)
# Estimate DID using OLS
model <- lm(outcome ~ treated * time, data = data)
ols_estimate <- coef(model)["treated:time"]
# Print results
results <- data.frame(
Method = c("Diff-in-Diff Formula", "OLS Estimate"),
Estimate = c(diff_in_diff_formula, ols_estimate)
)
print("Comparison of DID Estimates:")
print(results)
# Visualization
ggplot(data, aes(x = as.factor(time), y = outcome, color = as.factor(treated), group = treated)) +
stat_summary(fun = mean, geom = "point", size = 3) +
stat_summary(fun = mean, geom = "line", linetype = "dashed") +
labs(
title = "Difference-in-Differences Visualization",
x = "Time (0 = Pre, 1 = Post)",
y = "Outcome",
color = "Group"
) +
scale_color_manual(labels = c("Control", "Treated"), values = c("blue", "red")) +
theme_minimal()
```
| | Control (0) | Treated (1) |
|--------------|--------------------|---------------------|
| **Pre (0)** | $\bar{Y}_{00} = 5$ | $\bar{Y}_{10} = 8$ |
| **Post (1)** | $\bar{Y}_{01} = 7$ | $\bar{Y}_{11} = 14$ |
The table organizes the mean outcomes into four cells:
1. Control Group, Pre-period ($\bar{Y}_{00}$): Mean outcome for the control group before the intervention.
2. Control Group, Post-period ($\bar{Y}_{01}$): Mean outcome for the control group after the intervention.
3. Treated Group, Pre-period ($\bar{Y}_{10}$): Mean outcome for the treated group before the intervention.
4. Treated Group, Post-period ($\bar{Y}_{11}$): Mean outcome for the treated group after the intervention.
The treatment effect is calculated as:
$\text{DID} = (\bar{Y}_{11} - \bar{Y}_{10}) - (\bar{Y}_{01} - \bar{Y}_{00})$
Using the simulated table:
$\text{DID} = (14 - 8) - (7 - 5) = 6 - 2 = 4$
This matches the **interaction term coefficient** ($\beta_3 = 4$) from the OLS regression.
**Understanding Diff-in-Diff: The Formula and OLS**
Did you know the Difference-in-Differences (DID) treatment effect calculated from the simple formula of averages is identical to the estimate from an OLS regression with an interaction term?
Here's a detailed breakdown with R code and visuals to prove it.
Compute manually:
$(\bar{Y}_{11} - \bar{Y}_{10}) - (\bar{Y}_{01} - \bar{Y}_{00})$
Use OLS regression:
$Y_{it} = \beta_0 + \beta_1 \text{treated}_i + \beta_2 \text{time}_t + \beta_3 (\text{treated}_i \cdot \text{time}_t) + \epsilon_{it}$
The interaction term coefficient ($\beta_3$) is the treatment effect.
Both methods give the same result!