* Turning empty exercises into the multiple choice questions that were intended.
* Putting data needed for exercises into the setup chunk.
* Various bug fixes in tutorial 6 lesson 05. Most are related to data not being accessible to exercises or to multiple choice questions not being implemented properly.
# Get prop. cases where abs. permuted slope is greater than or equal to abs. observed slope
707
716
p_value = mean(abs_perm_slope >= abs_obs_slope)
708
717
)
718
+
709
719
```
710
720
711
721
@@ -714,23 +724,31 @@ perm_slope |>
714
724
### Inference on slope
715
725
716
726
717
-
#### What can we conclude based on the p-value associated with the twins data?
718
727
719
728
720
-
```{r ex27, exercise=TRUE}
721
729
730
+
731
+
```{r option-ex27, echo=FALSE}
732
+
question("What can we conclude based on the p-value associated with the twins data?",
733
+
answer("If there were no association between foster and biological twin IQ (no nature) in the population, we would be extremely unlikely to have collected a sample of data like we did. ", correct=TRUE, message="Right! Remember that here the p-value is measuring the probability of estimated slope being as large as what we observed if the population slope really is 0."),
734
+
answer("A biological twin's IQ being higher causes a foster twin's IQ to be higher.", message="No. Remember that correlation does not prove causation"),
735
+
answer("Biological twins' IQs are higher than foster twins' IQs, on average.", message="No. Remember that we are looking at whether the IQs tend to vary in the same direction."),
736
+
answer("Given the data, the probability of biological and foster twins' IQs being unrelated is close to zero.", message="No. Remember that we are looking at the probability of these data given the null hypothesis being true, not the probability of the hypothesis given the data!"),
737
+
allow_retry = TRUE
738
+
)
722
739
```
723
740
741
+
724
742
```{r ex27-hint}
725
743
Remember that a p-value is the probability of seeing data this (or more) extreme, given that the null hypothesis is true.
726
744
```
727
745
728
746
729
747
```{r ex27-solution}
730
-
- If there were no association between foster and biological twin IQ (no nature) in the population, we would be extremely unlikely to have collected a sample of data like we did.
731
-
- A biological twin's IQ being higher causes a foster twin's IQ to be higher.
732
-
- Biological twins' IQs are higher than foster twins' IQs, on average.
733
-
- Given the data, the probability of biological and foster twins' IQs being unrelated is close to zero.
748
+
-
749
+
-
750
+
-
751
+
-
734
752
```
735
753
736
754
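The p-value computation in this hunk can be illustrated with a minimal base R sketch. It assumes simulated data standing in for the twins data (the variable names `biological` and `foster` and all numeric values are hypothetical), and it uses `lm()` directly rather than the tutorial's own pipeline.

```r
# Simulated stand-in for the twins data (hypothetical values, not the real data set)
set.seed(1)
n <- 27
biological <- rnorm(n, mean = 95, sd = 15)
foster <- 10 + 0.9 * biological + rnorm(n, sd = 8)

# Observed slope from regressing foster on biological
obs_slope <- unname(coef(lm(foster ~ biological))[2])

# Permuting the response breaks any association, so the null (slope = 0) holds
perm_slopes <- replicate(1000, {
  unname(coef(lm(sample(foster) ~ biological))[2])
})

# Proportion of permuted slopes at least as extreme as the observed slope
p_value <- mean(abs(perm_slopes) >= abs(obs_slope))
p_value
```

A small `p_value` here says that slopes as large as the observed one are rare when the null hypothesis is true. It does not give the probability that the null hypothesis is true, which is the distinction the last distractor in the question targets.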
@@ -1025,26 +1043,20 @@ boot_slope |>
1025
1043
1026
1044
### Inference from randomization and bootstrapped distributions
1027
1045
1028
-
Throughout this lesson we have investigated the slope associated with the regression of `Foster` twins on `Biological` twins. The inference question was based on a randomization test assuming no relationship between the two types of twins (i.e., a slope of zero). The confidence intervals investigated a research question associated with a 100% nature relationship (i.e., a slope of one). What are the appropriate conclusions of this study?
1029
-
1030
-
1031
-
```{r ex211, exercise=TRUE}
1032
-
1033
-
```
1046
+
Throughout this lesson we have investigated the slope associated with the regression of `Foster` twins on `Biological` twins. The inference question was based on a randomization test assuming no relationship between the two types of twins (i.e., a slope of zero). The confidence intervals investigated a research question associated with a 100% nature relationship (i.e., a slope of one).
1034
1047
1035
1048
1036
-
```{r ex211-hint}
1037
-
Remember the difference between the population slope and the estimated slope. What do the inferential procedures you've performed say about each one?
1049
+
```{r option-ex211, echo=FALSE}
1050
+
question("What are the appropriate conclusions of this study?",
1051
+
answer("Zero is not a plausible value for the population slope, one is a plausible value for the population slope.", correct=TRUE, message="Right! Remember that any value in the confidence interval is a 'plausible value' for the population slope. Any value outside of the confidence interval is 'not plausible'."),
1052
+
answer("Zero is not a plausible value for the estimated slope, one is a plausible value for the estimated slope.", message="No. Remember that we know the exact value of the 'estimated slope'. The challenge lies in using this estimated slope to discover 'plausible values' for the population slope!"),
1053
+
answer("Zero is a plausible value for the population slope, one is not a plausible value for the population slope.", message="No. Remember that any value in the confidence interval is a 'plausible value' for the population slope. Any value outside of the confidence interval is 'not plausible'."),
1054
+
answer("Zero is a plausible value for the estimated slope, one is not a plausible value for the estimated slope.", message="No. Remember that we know the exact value of the 'estimated slope'. The challenge lies in using this estimated slope to discover 'plausible values' for the population slope!"),
1055
+
allow_retry = TRUE
1056
+
)
1038
1057
```
1039
1058
1040
1059
1041
-
```{r ex211-solution}
1042
-
- Zero is not a plausible value for the population slope, one is a plausible value for the population slope.
1043
-
- Zero is not a plausible value for the estimated slope, one is a plausible value for the estimated slope.
1044
-
- Zero is a plausible value for the population slope, one is not a plausible value for the population slope.
1045
-
- Zero is a plausible value for the estimated slope, one not is a plausible value for the estimated slope.
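The "plausible value" reasoning in this question can be sketched with a percentile bootstrap interval. This is only an illustration on simulated data: the data frame `twins_sim` and its values are hypothetical, and the tutorial's own bootstrap code may differ.

```r
# Simulated stand-in for the twins data (hypothetical values)
set.seed(2)
n <- 27
twins_sim <- data.frame(Biological = rnorm(n, mean = 95, sd = 15))
twins_sim$Foster <- 10 + 0.9 * twins_sim$Biological + rnorm(n, sd = 8)

# Resample rows with replacement and refit the model each time
boot_slopes <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  unname(coef(lm(Foster ~ Biological, data = twins_sim[idx, ]))[2])
})

# 95% percentile interval for the population slope
ci <- unname(quantile(boot_slopes, c(0.025, 0.975)))
ci

# A value is "plausible" for the population slope if it lies inside the interval
zero_plausible <- ci[1] <= 0 && 0 <= ci[2]
one_plausible  <- ci[1] <= 1 && 1 <= ci[2]
```

The interval describes the population slope. The estimated slope is a known number computed from the sample, so asking whether a value is plausible for it does not make sense.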
# Should ideally be loaded from the imstutorials package when it exists
87
145
is_server_context <- function(.envir) {
@@ -319,17 +377,6 @@ question("Why does it matter if the sampling distribution is accurate?",
319
377
Using the permuted datasets (recall, the randomization forces the null hypothesis to be true), investigate the distribution of the standardized slope statistics (the slope, which has been divided by the standard error). Note that the distribution of the standardized slope is well described by a t-distribution.
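As a rough illustration of the claim above, the following base R sketch permutes a simulated response, computes the standardized slope for each permutation, and compares a few of its quantiles with those of a t-distribution on n - 2 degrees of freedom. The data and variable names are hypothetical and are not taken from the tutorial.

```r
# Simulated data (hypothetical); any roughly linear relationship would do
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

# Standardized slope = estimate / standard error, computed on permuted responses
perm_t <- replicate(1000, {
  coefs <- summary(lm(sample(y) ~ x))$coefficients
  coefs["x", "Estimate"] / coefs["x", "Std. Error"]
})

# Compare empirical quantiles with the t-distribution on n - 2 degrees of freedom
probs <- c(0.025, 0.5, 0.975)
rbind(permuted = quantile(perm_t, probs),
      t_dist   = qt(probs, df = n - 2))
```

If the two rows are close, the t-distribution is a reasonable description of the standardized slope under the null hypothesis for data like these.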
06-model-infer/05-lesson/06-05-lesson.Rmd
+29 -29 Lines changed: 29 additions & 29 deletions
@@ -29,6 +29,12 @@ change <- change |>
29
29
mutate(Amount = Dollars) |>
30
30
dplyr::select(-Dollars)
31
31
32
+
33
+
LAhomes <- LAhomes |>
34
+
filter(bed > 0)
35
+
36
+
37
+
32
38
# Hash generation helpers
33
39
# Should ideally be loaded from the imstutorials package when it exists
34
40
is_server_context <- function(.envir) {
@@ -260,26 +266,22 @@ You will need to run the linear model before answering the question:
260
266
261
267
```
262
268
263
-
Consider data collected by Andrew Bray at Reed College on characteristics of LA Homes in 2010. The model is given below, and your task is to provide the appropriate interpretation of the coefficient on `log(sqft)`?
269
+
Consider data collected by Andrew Bray at Reed College on characteristics of LA Homes in 2010. The model is given below, and your task is to provide the appropriate interpretation of the coefficient on `log(sqft)`.
264
270
265
271
Note: you must be careful to avoid causative interpretations. Additional square footage does not necessarily cause the price of a specific house to go up. The interpretation of the coefficient describes the estimate of the average price of homes at a given square footage.
266
272
267
273
268
-
```{r ex, exercise=TRUE}
269
-
270
-
```
271
-
272
-
```{r ex-hint}
273
-
Remember that a log-log model has a special kind of interpretation.
274
+
```{r option-ex, echo=FALSE}
275
+
question("What is the appropriate interpretation of the coefficient on `log(sqft)`?",
276
+
answer("Each additional square foot of house size produces an estimate of the average price which is $1.44 more.", message="No, the interpretation of a log-log model is different than a linear model."),
277
+
answer("Each additional square foot of house size produces an estimate of the average price which is $1,442 more.", message="No. Try again."),
278
+
answer("Each additional square foot of house size produces an estimate of the average price which is 1.44% higher.", message="No, remember that both Y and X have been log transformed."),
279
+
answer("Each additional 1% of square footage produces an estimate of the average price which is $1.44 more.", message="No, remember that both Y and X have been log transformed."),
280
+
answer("Each additional 1% of square footage produces an estimate of the average price which is 1.44% higher.", correct = TRUE, message="Right! After the log-log transformation, our slope estimates the percent change in Y for each 1% change in X"),
281
+
allow_retry = TRUE
282
+
)
274
283
```
275
284
276
-
```{r ex-solution}
277
-
- Each additional square foot of house size produces an estimate of the average price which is $1.44 more.
278
-
- Each additional square foot of house size produces an estimate of the average price which is $1,442 more.
279
-
- Each additional square foot of house size produces an estimate of the average price which is 1.44% higher.
280
-
- Each additional 1% of square footage produces an estimate of the average price which is $1.44 more.
281
-
- Each additional 1% of square footage produces an estimate of the average price which is 1.44% higher.
282
-
```
283
285
284
286
285
287
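The log-log interpretation in the correct answer can be checked numerically. The sketch below assumes simulated data in which the true coefficient on `log(sqft)` is set to 1.44 to match the answer text. The intercept, noise level, and sample size are hypothetical, and the real `LAhomes` data is not used.

```r
# Simulated stand-in for the LA homes data (hypothetical values)
set.seed(4)
n <- 500
sqft <- exp(rnorm(n, mean = 7.5, sd = 0.5))
price <- exp(2.7 + 1.44 * log(sqft) + rnorm(n, sd = 0.3))

# Log-log model: both the response and the predictor are log transformed
fit <- lm(log(price) ~ log(sqft))
b <- unname(coef(fit)[2])

# A 1% increase in sqft multiplies the predicted price by 1.01^b,
# which is approximately a b% increase when b is of moderate size
pct_change <- (1.01^b - 1) * 100
c(slope = b, pct_change_per_1pct = pct_change)
```

The two numbers should be nearly identical, which is why the slope in a log-log model is read as the approximate percent change in Y per 1% change in X.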
@@ -630,27 +632,25 @@ lm(Price ~ Service + Food + Decor, data = restNYC) |> tidy()
630
632
### Interpreting coefficients
631
633
632
634
633
-
What is the correct interpretation of the coefficient on `Service` in the linear model which regresses `Price` on `Service`, `Food`, and `Decor`?
634
-
635
635
636
636
You will need to run the linear model before answering the question:
637
637
`lm(Price ~ Service + Food + Decor, data=restNYC) |> tidy()`
638
638
639
639
640
-
```{r ex56, exercise=TRUE}
641
-
642
-
```
643
640
644
-
```{r ex56-hint}
645
-
You have to consider the interpretation of a single variable while holding the other variables constant.
646
-
```
647
-
648
-
649
-
```{r ex56-solution}
650
-
- For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135.
651
-
- For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135, given fixed values of `Food` and `Decor`.
652
-
- For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135, for any possible value of `Food` and `Decor`.
653
-
- Given that `Food` and `Decor` are in the model, `Service` is not significant, and we cannot know whether it has effect on modeling `Price`.
641
+
```{r ex56, exercise=TRUE}
642
+
lm(Price ~ Service + Food + Decor, data=restNYC) |> tidy()
643
+
```
644
+
645
+
```{r option-ex56, echo=FALSE}
646
+
question("What is the correct interpretation of the coefficient on `Service` in the linear model which regresses `Price` on `Service`, `Food`, and `Decor`?",
648
+
answer("For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135.", message="No. You have to consider the interpretation of a single variable while holding the other variables constant."),
649
+
answer("For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135, given fixed values of `Food` and `Decor`.", message="This interpretation of the point estimate is correct, but consider the large amount of uncertainty in this estimate. There is a better answer."),
650
+
answer("For every one unit increase in `Service`, the predicted average `Price` is expected to increase by 0.135, for any possible value of `Food` and `Decor`.", message="No. You have to consider the interpretation of a single variable while holding the other variables constant. Remember that `Food`, `Decor`, and `Service` are all positively correlated: if you find a restaurant with better `Service`, it also likely has better `Food` and better `Decor`, so a fair comparison requires holding those other variables constant."),
651
+
answer("Given that `Food` and `Decor` are in the model, `Service` is not significant, and we cannot know whether it has an effect on modeling `Price`.", correct=TRUE, message="Right!"),