-
Notifications
You must be signed in to change notification settings - Fork 1
/
LMM_Propensity.Rmd
217 lines (126 loc) · 7.28 KB
/
LMM_Propensity.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
---
title: "LMM"
author: "Christina De Cesaris"
date: "3/3/2021"
output: html_document
---
```{r include=FALSE}
library(lmerTest)
library(tidyverse)
require(survival)
library(nnet)
library(MatchIt)
library(kableExtra)
library(lubridate)
library(lmerTest)
library(scales)
final=read.csv("Data/final.csv")
which(is.na(final))
final=drop_na(final)
which(is.na(final)) #good
#sapply(final,class)
final$MetCode=as.factor(final$MetCode)
final$county=as.factor(final$county)
final$date=as.factor(final$date)
```
```{r include=FALSE}
#grab our potential predictors and response, MetCode is our treatment group
dat = final[,c("county" ,'new_casesrate','elder_ratio','HouseDensity.per.square.mile.of.land.area','date','Poverty_Estimate',"Poverty_Percent","Pop_Estimate","MetCode","totalcountconfirmed"),]
#look for any particular parterns. The clearist is household income and povery rates--unsurpisingly
#pairs(Filter(is.numeric, dat))
dat
```
Our proposed model seeks to determine a casual relationship between an area's metropolitan status and Covid-19 infection rates. As previously stated, we assume the independent fixed effects of elder ratio and poverty percent contribute underlying effects in determining both whether a county is labeled as metropolitan and its associated infection rates.
It is under our assumption that metropolitan locations tend to have lower numbers of elderly and higher levels of poverty compared to nonmetropolitan areas.
The lme4 package was used to fit our Linear Mixed-Effects Models. Since the data contained a multitude of observations for different counties over the course of several days, we assumed existing random, unmeasurable effects were present in our data. In this model, we define county and date effects as random effects we cannot control.
Our Linear Mixed-Effects Model takes the following form:
$$y = {X\beta} + {Zu} + {\varepsilon}$$
Where
- $y$ represents an $N \times 1$ vector of our fatality rates.
- $X$ is a $N \times p$ matrix of our predictor variables.
- $\beta$ is a $p \times 1$ column of our fixed effects coefficients.
- $Z$ is a $N \times q:groups$ matrix for the $q$ random effects for our county and state variables.
- Finally, $\epsilon$ captures the portion of $y$ not included by the rest of the model.
(McCullagh, P, & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman & Hall/CRC Press.)
The distribution of infection rates was highly right skewed, so a log(x+1) transformation was applied to the numeric variables in the data.
```{r echo=FALSE, message=FALSE, warning=FALSE}
par(mfrow=c(1,2))
#Transformation
#check the distribution of our response and predictors
hist(final$new_casesrate,col = rainbow(5), xlab="InfectionRate", main="Distribution before Transformation") #this is very right skewed
#normalize
#hist(log(final$new_casesrate),col = rainbow(5)) #lots better
#adjust for the 0 value rates
hist(log(final$new_casesrate+1),col = rainbow(5),xlab="log(InfectionRate+1)", main="After Transformation")
dat=dat[-c(which(final$new_casesrate<0)),]
log_dat=log((Filter(is.numeric,dat))+1)
transformed= cbind(log_dat, Filter(is.factor,dat))
#transformed
transformed=(drop_na(transformed))
```
A quick comparision of the average infection rates between our treatment and control indicates a higher average present in metropolitan coded counties.
```{r}
boxplot(transformed$new_casesrate~transformed$MetCode,main='Comparision of Averages Between Groups',
xlab='MetroCode',ylab='Infection Rate',col=rainbow(5))
```
A Mixed Effects Linear model was fit without the use of propensity score weighting.
In the case of this model, our treatment was MetroCode=1 (metropolitan county) and our control was MetroCode=0 (non-metropolitan county)
```{r echo=TRUE}
model1 = lmer(new_casesrate~(elder_ratio)+factor(MetCode)+I(Poverty_Percent/100)+(1|county)+(1|date),data = transformed)
summary(model1)
```
We found the treatment effect to be statistically significant at p=0.1. Poverty rates were not found to have a significant effect while elder ratios were statistically significant with p<0.001.
However, our previous analysis leads us to believe both poverty rates and elder ratios influence whether an area is determined as metropolitan or not.
To assess the extend of this bias during assignment, we will compare the imbalance present.
MetroCodes against Elder Ratio
```{r echo=FALSE}
transformed$MetCode=as.numeric(as.character(transformed$MetCode))
## Balance analysis
model2 = lm(elder_ratio~MetCode,data = transformed)
summary(model2)#clearly there is bias between
```
MetroCodes against Poverty Ratio
```{r}
## Balance analysis
model3 = lm((Poverty_Percent/100)~MetCode,data = transformed)
summary(model3)#clearly there is bias between
```
In both cases there appears to be strong imbalance between MetroCode levels and our two covariates as p<0.01. This suggests there may be selection bias present. We aim to use propensity score to reduce these effects.
Propensity scores were determined using logistic regression with our treatment as the response against our two covariates. The probabilities estimated by the logistic regression model were processed and inverted. The final scores were used as weights in the final Linear Mixed Model.
```{r echo=FALSE}
#transformed$MetCode = #relevel(transformed$MetCode,ref='0')
transformed$MetCode = relevel(dat$MetCode,ref="0")
models = glm(MetCode~(elder_ratio)+I(Poverty_Percent/100),family=binomial(logit), data = transformed)
prob = models$fitted.values
pscore = ifelse(transformed$MetCode=="1",prob,(1-prob))
transformed$pscore=pscore
quantile(pscore)
#pscore
weight = 1/pscore
summary(models)
col.alpha<-function(color, alpha){
code=col2rgb(color)/256
rgb(code[1],code[2],code[3],alpha)
}
#better hist
hist(unique(transformed$pscore[transformed$MetCode==1]), breaks=25, col=col.alpha("red",.6), freq=F, ylim=c(0,5), xlab="Propensity Score", ylab="", main="Propensity Score Distribution")
hist(unique(transformed$pscore[transformed$MetCode==0]), breaks=25, col=col.alpha("lightblue",.6), freq=F, ylim=c(0,5), xlab="Propensity Score", ylab="", main="",add=T)
legend(.4, 5, legend=c("Metro", "NonMetro"),
col=c("red", "blue"),fill = c("red", "lightblue") )
#likely won't be much change, the confunding effects are too large
```
The distribution of the propensity scores indicates there is not a significant difference in scoring between the treatment and control group despite the existence of confounding effects. The propensity scoring and matching method used may be improved upon with alternate methods.
# Weighted Model
```{r}
model_new7 = lmer(new_casesrate~(elder_ratio)+factor(MetCode)+I(Poverty_Percent/100)+(1|county)+(1|date),data = transformed,weights = weight)
summary(model_new7) #1|pred -> predbecomes random effect here
```
The final mixed effects model with propensity score weights is not significantly different than the model without. The difference in estimates is not marginally different from the unweighted model. Our treatment group has a positive relationship with the infection rate (p<0.01), indicating metropolitan counties may have high infection rates than nonmetropolitan counties.
# Sensitivity Analysis
```{r}
plot(model_new7,type=c('p','smooth'))
qqnorm(unique(resid(model_new7)))
qqline(resid(model_new7))
anova(model_new7)
```
The final Linear Mixed-Effects model