<!--head
meta:
  title: ANOVA Template
  author: Aleksandar Blagotić, Dániel Nagy
  description: An ANOVA report with table of descriptives, diagnostic tests and ANOVA-specific
    statistics.
  email: ~
  packages:
  - nortest
  example:
  - 'rapport("ANOVA", ius2008, resp = "leisure", fac = "gender") # one-way'
  - 'rapport("ANOVA", ius2008, resp = "leisure", fac = c("gender", "partner")) # two-way'
inputs:
- name: resp
  label: Response variable
  description: Dependent (response) variable
  class: numeric
  length:
    min: 1.0
    max: 1.0
  value: ~
  required: yes
  standalone: no
- name: fac
  label: Factor variables
  description: Independent variables (factors)
  class: factor
  length:
    min: 1.0
    max: 2.0
  value: ~
  required: yes
  standalone: no
- name: fac.intr
  label: Factor interaction
  description: Include factor interaction
  class: logical
  length:
    min: 1.0
    max: 1.0
  value: yes
  required: no
  standalone: yes
head-->
<%=
panderOptions('table.split.table', Inf)
d <- structure(data.frame(resp, fac), .Names = c(resp.iname, fac.name))
f.int    <- fml(resp.iname, fac.name, join.right = "*")
f.nonint <- fml(resp.iname, fac.name, join.right = "+")
fit <- lm(ifelse(isTRUE(fac.intr), f.int, f.nonint), data = d)
fac.plu <- switch(fac.ilen, '', 's')
%>
# Introduction
**Analysis of Variance**, or **ANOVA**, is a statistical procedure that tests the equality of means across several samples. It was first introduced in 1921 by the famous English statistician Sir Ronald Aylmer Fisher.
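As a minimal illustration of the idea outside of this template, the sketch below fits a one-way ANOVA directly in R. It assumes the `ius2008` example dataset referenced in the header is available (e.g. after loading the **rapport** package) and uses its `leisure` and `gender` variables from the example calls.

```r
# A minimal sketch of a one-way ANOVA in plain R, assuming the ius2008
# example dataset from the template header is available.
library(rapport)
data(ius2008)

# Fit the model: does the mean of leisure differ across gender groups?
fit1 <- aov(leisure ~ gender, data = ius2008)

# The F test for equality of group means
summary(fit1)
```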
# Model Overview
<%= switch(fac.ilen, 'One', 'Two') %>-Way ANOVA was carried out, with <%= p(fac.label) %> as independent variable<%= fac.plu %> and <%= p(resp.label) %> as the response variable. Factor interaction was<%= ifelse(fac.intr, "", "n't") %> taken into account.
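For reference, the two model forms this template chooses between differ only in the formula operator; a sketch with the two-way example variables from the header (`leisure`, `gender` and `partner` in `ius2008`):

```r
# Sketch: the additive versus the interaction model, written out for the
# two-way example from the header.
f.additive    <- leisure ~ gender + partner  # main effects only
f.interaction <- leisure ~ gender * partner  # main effects plus their interaction

# With the 'fac.intr' input set to 'yes', the interaction model is fitted
fit2 <- lm(f.interaction, data = ius2008)
anova(fit2)
```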
# Descriptives
In order to get more insight into the model data, a table of frequencies for the ANOVA factors is displayed below, as well as a table of descriptives.
## Frequency Table
Below is a frequency table for the factors in the ANOVA model. Note that missing values are removed from the summary.
<%=
(freq <- rp.freq(fac.name, rp.data))
%>
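Outside of the template, a comparable cross-tabulation can be produced with base R; the sketch below uses the two example factors from the header and, like the summary above, leaves missing values out.

```r
# Sketch: a base-R counterpart of the frequency table above.
with(ius2008, table(gender, partner, useNA = "no"))  # counts
prop.table(with(ius2008, table(gender, partner)))    # relative frequencies
```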
## Descriptive Statistics
The following table displays the descriptive statistics of the ANOVA model. Factor levels<%= ifelse(ncol(fac) > 1, " and their combinations", "") %> lie on the left-hand side, while the corresponding statistics for the response variable are given on the right-hand side.
<%=
(desc <- rp.desc(resp, fac, c(Min = min, Max = max, Mean = mean, Std.Dev. = sd, Median = median, IQR, Skewness = skewness, Kurtosis = kurtosis)))
%>
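A rough base-R counterpart of the table above can be built with `aggregate()`; the sketch below computes a subset of the same statistics per factor level (skewness and kurtosis are omitted, as they need an extra package).

```r
# Sketch: per-group descriptives of the response, comparable to the table
# above (rows with missing values are dropped by the formula interface).
aggregate(leisure ~ gender + partner, data = ius2008,
          FUN = function(x) c(Min = min(x), Max = max(x), Mean = mean(x),
                              Std.Dev. = sd(x), Median = median(x), IQR = IQR(x)))
```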
# Diagnostics
Before we carry out the ANOVA, we'd like to check some basic assumptions. For that purpose, normality and homoscedasticity tests are carried out, alongside several graphs that may help you decide on the model's main assumptions.
## Diagnostic Tests
### Univariate Normality
<% if (length(resp) < 5000) { %>
<%= ntest <- htest(resp, lillie.test, ad.test, shapiro.test)
p <- 0.05
n <- sum(ntest$p < p)  # number of tests rejecting normality at the 5% level
ntest
%>
The conclusions we can draw from the test statistics:
- based on the _Lilliefors test_, the distribution of _<%= resp.label %>_ is <%= ifelse(ntest[1, 3] < p, "not normal", "normal") %>
- the _Anderson-Darling test_ confirms<%= ifelse(ntest[2, 3] < p, " a violation of", "") %> the normality assumption
- according to the _Shapiro-Wilk test_, the distribution of _<%= resp.label %>_ is<%= ifelse(ntest[3, 3] < p, " not", "") %> normal
<% } else { %>
<%= ntest <- htest(resp, lillie.test, ad.test)
p <- 0.05
n <- sum(ntest$p < p)  # number of tests rejecting normality at the 5% level
ntest
%>
The conclusions we can draw from the test statistics:
- based on the _Lilliefors test_, the distribution of _<%= resp.label %>_ is <%= ifelse(ntest[1, 3] < p, "not normal", "normal") %>
- the _Anderson-Darling test_ confirms<%= ifelse(ntest[2, 3] < p, " a violation of", "") %> the normality assumption
<% } %>
<%= if (n > 0) {
    if (n > 1) {
        sprintf("As you can see, the applied tests confirm departures from normality of _%s_.", resp.label)
    } else {
        sprintf("As you can see, the applied tests yield different results on the hypothesis of normality of _%s_, so you may want to stick with the one you find most appropriate or trust the most.", resp.label)
    }
} else {
    sprintf("As you can see, none of the applied tests rejects the normality of _%s_.", resp.label)
}
%>
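For reference, the same normality tests can also be called directly: `lillie.test()` and `ad.test()` come from the **nortest** package listed in the header, while `shapiro.test()` is part of base R and only accepts up to 5000 observations, which is why the template switches tests above. A sketch on the example response:

```r
# Sketch: the normality tests applied directly to the example response.
library(nortest)

x <- na.omit(ius2008$leisure)

lillie.test(x)                          # Lilliefors (Kolmogorov-Smirnov) test
ad.test(x)                              # Anderson-Darling test
if (length(x) < 5000) shapiro.test(x)   # Shapiro-Wilk, limited to n < 5000
```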
### Homoscedasticity
In order to test homoscedasticity, the _Bartlett_ and _Fligner-Killeen_ tests are applied.
<%=
hsced <- with(d, htest(as.formula(f.nonint), fligner.test, bartlett.test))
hp <- hsced$p
hcons <- all(hp < .05) | all(hp > .05)
hp.all <- all(hp < .05)
hsced
%>
When it comes to equality of variances, the applied tests yield <%= ifelse(hcons, "consistent", "inconsistent") %> results. <%= if (hcons) { sprintf("The homoscedasticity assumption is %s.", ifelse(hp.all, "rejected", "confirmed")) } else { sprintf("While the _Fligner-Killeen test_ %s the hypothesis of homoscedasticity, _Bartlett's test_ %s it.", ifelse(hp[1] < .05, "rejected", "confirmed"), ifelse(hp[2] < .05, "rejected", "confirmed")) } %>
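The two tests can also be run directly on a one-factor version of the model; a sketch with the example variables (Bartlett's test assumes normal data, while Fligner-Killeen is rank-based and more robust):

```r
# Sketch: homoscedasticity tests run directly on the example data.
bartlett.test(leisure ~ gender, data = ius2008)  # sensitive to non-normality
fligner.test(leisure ~ gender, data = ius2008)   # rank-based, more robust
```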
## Diagnostic Plots
Here you can see several diagnostic plots for the ANOVA model:
- residuals against fitted values
- scale-location plot of square root of residuals against fitted values
- normal Q-Q plot
- residuals against leverages
<%=
par(mfrow = c(2, 2))
plot(fit)
%>
# ANOVA Summary
## ANOVA Table
<%=
a <- anova(fit)
a.f <- a$F
a.p <- a$Pr
a.fp <- a.p < .05
data.frame(a)
%>
The _F-test_ for <%= p(fac.label[1]) %> is <%= ifelse(a.fp[1], "", "not ") %>statistically significant, which implies that there is <%= ifelse(a.fp[1], "an", "no") %> effect of <%= fac.label[1] %> on the response variable. <%= if (fac.ilen == 2) { sprintf("The effect of %s on the response variable is %ssignificant. ", p(fac.label[2]), ifelse(a.fp[2], "", "not ")) } else { "" } %><%= if (fac.ilen == 2 & fac.intr) { sprintf("The interaction between the levels of %s %s found to be significant (p = %.3f).", p(fac.label), ifelse(a.fp[3], "was", "wasn't"), a.p[3]) } else { "" } %>
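The significance statements above are driven by the `Pr(>F)` column of the ANOVA table; a sketch of pulling those values out of a fit on the example data:

```r
# Sketch: extracting F statistics and p-values from an ANOVA table.
fit2 <- lm(leisure ~ gender * partner, data = ius2008)
a <- anova(fit2)
a[["F value"]]        # F statistic for each model term (NA for the residual row)
a[["Pr(>F)"]]         # corresponding p-values
a[["Pr(>F)"]] < 0.05  # which terms are significant at the 5% level
```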
## Post Hoc Test
### Results
After obtaining the ANOVA results, it is usually advisable to run a [post hoc test](http://en.wikipedia.org/wiki/Post-hoc_analysis) to explore patterns that were not specified a priori. Here we present [Tukey's HSD test](http://en.wikipedia.org/wiki/Tukey%27s_range_test).
<%=
aovfit <- aov(fit)
Tukey <- TukeyHSD(aovfit)
%>
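For reference, `TukeyHSD()` works on an `aov` fit and also accepts a confidence level; a direct call on the example model would look like this sketch:

```r
# Sketch: Tukey's HSD computed directly on an aov fit of the example model.
aovfit2 <- aov(leisure ~ gender * partner, data = ius2008)
tukey   <- TukeyHSD(aovfit2, conf.level = 0.95)
tukey        # pairwise differences with adjusted p-values
plot(tukey)  # confidence intervals for each pairwise difference
```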
<% for (v in names(Tukey)) { %>
#### <%= v %>
<%= posthoc <- round(Tukey[[v]], 3)
colnames(posthoc) <- c("Difference", "Lower Bound", "Upper Bound", "P value")
signif.idx <- which(posthoc[, 4] < 0.05)
is.signif  <- length(signif.idx) > 0
if (is.signif) {
    post.signif <- paste(pander.return(lapply(signif.idx, function(i) paste0(p(rownames(posthoc)[i]), ' (', posthoc[i, 4], ')'))), collapse = '\n')
} else {
    post.signif <- NULL
}
posthoc[, 4] <- add.significance.stars(posthoc[, 4])
posthoc
%>
<% if (is.signif) { %>
The following categories differ significantly (the p-value is shown in brackets):
<% } else { %>
There are no categories that differ significantly here.
<% } %>
<%=
post.signif
%>
<% } %>
### Plot
Below you can see the results of the post hoc test in a plot.
<%= plot(Tukey) %>