-
Notifications
You must be signed in to change notification settings - Fork 17
/
bayesian-linear-regression.Rmd
171 lines (122 loc) · 3.93 KB
/
bayesian-linear-regression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
# Bayesian Linear Regression
The following provides a simple working example of a standard regression model
using Stan via <span class="pack" style = "">rstan</span>. It will hopefully to
allow some to more easily jump in to using Stan if they are comfortable with R.
You would normally just use <span class="pack" style = "">rstanarm</span> or
<span class="pack" style = "">brms</span> for such a model however.
## Data Setup
Create a correlation matrix of one's choosing assuming response as last column/row. This approach allows for some collinearity in the predictors.
```{r bayes-linreg-setup}
library(tidyverse)
cormat = matrix(
c(
1, .2, -.1, .3,
.2, 1, .1, .2,
-.1, .1, 1, .1,
.3, .2, .1, 1
),
ncol = 4,
byrow = TRUE
)
cormat
cormat = Matrix::nearPD(cormat, corr = TRUE)$mat
n = 1000
means = rep(0, ncol(cormat))
d = MASS::mvrnorm(n, means, cormat, empirical = TRUE)
colnames(d) = c('X1', 'X2', 'X3', 'y')
d[,'y'] = d[,'y'] - .1 # unnecessary, just to model a non-zero intercept
str(d)
cor(d)
# Prepare for Stan
# create X (add intercept column) and y for vectorized version later
X = cbind(1, d[,1:3]); colnames(X) = c('Intercept', 'X1', 'X2', 'X3')
y = d[,4]
```
## Model Code
Initial preparation, create the data list object.
```{r bayes-linreg-standat}
dat = list(
N = n,
k = 4,
y = y,
X = X
)
```
Create the Stan model code.
```{stan stan-linreg-code, output.var='bayes_linreg'}
data { // Data block; declarations only
int<lower = 0> N; // Sample size
int<lower = 0> k; // Dimension of model matrix
matrix [N, k] X; // Model Matrix
vector[N] y; // Target
}
/* transformed data { // Transformed data block; declarations and statements. None needed here.
}
*/
parameters { // Parameters block; declarations only
vector[k] beta; // Coefficient vector
real<lower = 0> sigma; // Error scale
}
transformed parameters { // Transformed parameters block; declarations and statements.
}
model { // Model block; declarations and statements.
vector[N] mu;
mu = X * beta; // Linear predictor
// priors
beta ~ normal(0, 1);
sigma ~ cauchy(0, 1); // With sigma bounded at 0, this is half-cauchy
// likelihood
y ~ normal(mu, sigma);
}
generated quantities { // Generated quantities block; declarations and statements.
real rss;
real totalss;
real R2; // Calculate Rsq as a demonstration
vector[N] y_hat;
y_hat = X * beta;
rss = dot_self(y - y_hat);
totalss = dot_self(y - mean(y));
R2 = 1 - rss/totalss;
}
```
## Estimation
Run the model and examine results. The following assumes a character string or file (`bayes_linreg`) of the previous model code.
```{r bayes-linreg-est, results='hide'}
library(rstan)
fit = sampling(
bayes_linreg,
data = dat,
thin = 4,
verbose = FALSE
)
```
Note the `pars` argument in the following. You must specify desired parameters or it will print out everything, including the `y_hat`, i.e. expected values. Also note that by taking into account the additional uncertainty estimating sigma, you get a shrunken Rsq (see Gelman & Pardoe 2006 sec. 3).
```{r bayes-linreg-est-show}
print(
fit,
digits_summary = 3,
pars = c('beta', 'sigma', 'R2'),
probs = c(.025, .5, .975)
)
```
## Comparison
Compare to basic <span class="func" style = "">lm</span> result.
```{r bayes-linreg-compare}
modlm = lm(y ~ ., data.frame(d))
# Compare
summary(modlm)
```
## Visualize
Visualize the posterior predictive distribution.
```{r bayes-linreg-pp}
# shinystan::launch_shinystan(fit) # diagnostic plots
library(bayesplot)
pp_check(
dat$y,
rstan::extract(fit, par = 'y_hat')$y_hat[1:10, ],
fun = 'dens_overlay'
)
```
## Source
Original code available at:
https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/Bayesian/rstan_linregwithprior.R