initial checkin + hops proportions example

eipi10 · Aug 17, 2018 · a32b100

Showing 8 changed files with 344 additions and 0 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+* text=auto
+
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+*~
+.Rhistory
+.RData
+*.Rproj*
diff --git a/hops_proportions.Rmd b/hops_proportions.Rmd
@@ -0,0 +1,116 @@
+---
+title: "Hypothetical Outcome Plots (HOPs) for proportions"
+output: github_document
+bibliography: references.bib
+link-citations: true
+---
+
+Here is a quick example of hypothetical outcome plots (HOPs) for proportions.
+
+## Setup
+
+The following libraries are needed:
+
+```{r setup, message = FALSE, warning = FALSE}
+library(tidyverse)
+library(modelr)
+library(rstanarm)
+library(tidybayes)
+library(gganimate)     # devtools::install_github("thomasp85/gganimate")
+library(ggstance)
+library(forcats)
+
+theme_set(
+  theme_light() +
+  theme(panel.grid = element_blank())
+)
+```
+
+## Data
+
+We'll use some count data to illustrate:
+
+```{r data}
+df = data_frame(
+  group = c("A","B","C"),
+  count = c(100, 50, 30),
+  proportion = count / sum(count)
+)
+df
+```
+
+## Model
+
+One *possible* model for these data is a Poisson regression of each group's count on the group (N.B. for the purposes of this example, it makes some strong assumptions about how the data were generated; your mileage may vary!):
+
+```{r model, results = "hide"}
+m = stan_glm(count ~ group, family = poisson, data = df)
+```
+
+## HOPs
+
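+Given that model, we could construct a HOPs [@Hullman2015; @Kale2018] bar chart illustrating the posterior distribution for the proportion in each group. Each frame shows one posterior draw: the model's expected count for each group, divided by the total across groups so that the proportions sum to 1: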
+
+```{r hops, fig.width = 7, fig.height = 3, cache = TRUE}
+n_hops = 100
+
+p = df %>%
+  data_grid(group) %>%
+  add_fitted_draws(m, n = n_hops) %>%
+  group_by(.draw) %>%
+  mutate(proportion = .value / sum(.value)) %>%
+  ggplot(aes(y = group, x = proportion)) +
+  geom_colh(fill = "gray75") +
+  geom_point(data = df, color = "red") +
+  annotate("text", y = "C", x = .5, label = "Observed proportion", hjust = 0, color = "red") +
+  annotate("segment", y = "C", yend = "C", x = .18, xend = .49, linetype = "dashed", color = "red") +
+  annotate("text", y = 2.7, x = .5, label = "Posterior distribution for proportion", hjust = 0,
+    color = "gray35") +
+  annotate("segment", y = 2.7, yend = 2.7, x = .1, xend = .49, linetype = "dashed", color = "gray75") +
+  xlim(0,1) +
+  transition_states(.draw, transition_length = 1, state_length = 1)
+
+animate(p, nframes = n_hops * 2, width = 600, height = 300)
+```
+
+## Quantile dotplots (static alternative)
+
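+If animation is not available (e.g., in a print medium), a quantile dotplot [@Kay2016; @Fernandes2018] is a static alternative: for each group it shows 100 quantiles of the posterior distribution of the proportion, so each dot represents an approximately equally likely outcome: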
+
+```{r quantile-dotplots, fig.width = 7, fig.height = 4}
+observed_label_data = data_frame(
+  group = "C",
+  label = "Observed proportion",
+  x = .18, xend = .44, y = .55
+)
+dotplot_label_data = data_frame(
+  group = "C",
+  label = "100 approximately equally likely proportions",
+  x = .18, xend = .44, y = .3
+)
+
+
+df %>%
+  data_grid(group) %>%
+  add_fitted_draws(m) %>%
+  group_by(.draw) %>%
+  mutate(proportion = .value / sum(.value)) %>%
+  group_by(group) %>%
+  do(data_frame(proportion = quantile(.$proportion, ppoints(100)))) %>%
+  ggplot(aes(x = proportion)) +
+  geom_dotplot(binwidth = .01, fill = "gray65", color = NA) +
+  facet_grid(fct_rev(group) ~ .) +
+  geom_text(aes(xend + .01, y, label = label), data = observed_label_data, hjust = 0, color = "red") +
+  geom_segment(aes(x, y, xend = xend, yend = y), data = observed_label_data,
+    linetype = "dashed", color = "red") +
+  geom_vline(aes(xintercept = proportion), data = df, color = "red") +
+  geom_text(aes(xend + .01, y, label = label), data = dotplot_label_data, hjust = 0, color = "gray35") +
+  geom_segment(aes(x, y, xend = xend, yend = y), data = dotplot_label_data,
+    linetype = "dashed", color = "gray65") +
+  xlim(0,1) +
+  ylab(NULL) +
+  scale_y_continuous(breaks = NULL)
+```
+
+
+## References
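A note on the `.value / sum(.value)` step used in both chunks of `hops_proportions.Rmd`: under the Poisson model's log link, each draw's expected count for a group is the exponential of its linear predictor, so dividing by the draw's total is a softmax over groups, and any term shared by all groups (such as the intercept) cancels. The base-R sketch below illustrates this with made-up log-scale draws; `fake_draws` is a hypothetical stand-in, not output from `stan_glm()` or `add_fitted_draws()`.

```r
set.seed(1)
n_draws <- 4

# Hypothetical stand-in for posterior draws of the linear predictor:
# one row per draw, one column per group (A, B, C), on the log scale.
fake_draws <- cbind(
  A = rnorm(n_draws, log(100), 0.1),
  B = rnorm(n_draws, log(50), 0.1),
  C = rnorm(n_draws, log(30), 0.1)
)

# Expected counts per draw (inverse of the Poisson log link).
expected_counts <- exp(fake_draws)

# Normalize each draw (row) so its proportions sum to 1, mirroring
# `group_by(.draw) %>% mutate(proportion = .value / sum(.value))`.
proportions <- expected_counts / rowSums(expected_counts)

# Adding the same constant to every group in a draw (e.g. a different
# intercept) leaves the proportions unchanged: exp(constant) cancels.
shifted <- exp(fake_draws + 3)
proportions_shifted <- shifted / rowSums(shifted)

print(round(proportions, 3))
```

Because only the differences between groups survive the normalization, the plotted proportions reflect the group effects rather than the overall count level.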
diff --git a/hops_proportions.md b/hops_proportions.md
@@ -0,0 +1,170 @@
+Hypothetical Outcome Plots (HOPs) for proportions
+================
+
+Here is a quick example of hypothetical outcome plots (HOPs) for
+proportions.
+
+## Setup
+
+The following libraries are needed:
+
+``` r
+library(tidyverse)
+library(modelr)
+library(rstanarm)
+library(tidybayes)
+library(gganimate)     # devtools::install_github("thomasp85/gganimate")
+library(ggstance)
+library(forcats)
+
+theme_set(
+  theme_light() +
+  theme(panel.grid = element_blank())
+)
+```
+
+## Data
+
+We’ll use some count data to illustrate:
+
+``` r
+df = data_frame(
+  group = c("A","B","C"),
+  count = c(100, 50, 30),
+  proportion = count / sum(count)
+)
+df
+```
+
+    ## # A tibble: 3 x 3
+    ##   group count proportion
+    ##   <chr> <dbl>      <dbl>
+    ## 1 A       100      0.556
+    ## 2 B        50      0.278
+    ## 3 C        30      0.167
+
+## Model
+
+One *possible* model for these data is a Poisson regression of each
+group’s count on the group (N.B. for the purposes of this example, it
+makes some strong assumptions about how the data were generated; your
+mileage may vary\!):
+
+``` r
+m = stan_glm(count ~ group, family = poisson, data = df)
+```
+
+## HOPs
+
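+Given that model, we could construct a HOPs (Hullman, Resnick, and Adar
+[2015](#ref-Hullman2015); Kale et al. [2019](#ref-Kale2018)) bar chart
+illustrating the posterior distribution for the proportion in each
+group. Each frame shows one posterior draw: the model’s expected count
+for each group, divided by the total across groups so that the
+proportions sum to 1: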
+
+``` r
+n_hops = 100
+
+p = df %>%
+  data_grid(group) %>%
+  add_fitted_draws(m, n = n_hops) %>%
+  group_by(.draw) %>%
+  mutate(proportion = .value / sum(.value)) %>%
+  ggplot(aes(y = group, x = proportion)) +
+  geom_colh(fill = "gray75") +
+  geom_point(data = df, color = "red") +
+  annotate("text", y = "C", x = .5, label = "Observed proportion", hjust = 0, color = "red") +
+  annotate("segment", y = "C", yend = "C", x = .18, xend = .49, linetype = "dashed", color = "red") +
+  annotate("text", y = 2.7, x = .5, label = "Posterior distribution for proportion", hjust = 0,
+    color = "gray35") +
+  annotate("segment", y = 2.7, yend = 2.7, x = .1, xend = .49, linetype = "dashed", color = "gray75") +
+  xlim(0,1) +
+  transition_states(.draw, transition_length = 1, state_length = 1)
+
+animate(p, nframes = n_hops * 2, width = 600, height = 300)
+```
+
+![](hops_proportions_files/figure-gfm/hops-1.gif)<!-- -->
+
+## Quantile dotplots (static alternative)
+
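+If animation is not available (e.g., in a print medium), a quantile
+dotplot (Kay et al. [2016](#ref-Kay2016); Fernandes et al.
+[2018](#ref-Fernandes2018)) is a static alternative: for each group it
+shows 100 quantiles of the posterior distribution of the proportion, so
+each dot represents an approximately equally likely outcome: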
+
+``` r
+observed_label_data = data_frame(
+  group = "C",
+  label = "Observed proportion",
+  x = .18, xend = .44, y = .55
+)
+dotplot_label_data = data_frame(
+  group = "C",
+  label = "100 approximately equally likely proportions",
+  x = .18, xend = .44, y = .3
+)
+
+
+df %>%
+  data_grid(group) %>%
+  add_fitted_draws(m) %>%
+  group_by(.draw) %>%
+  mutate(proportion = .value / sum(.value)) %>%
+  group_by(group) %>%
+  do(data_frame(proportion = quantile(.$proportion, ppoints(100)))) %>%
+  ggplot(aes(x = proportion)) +
+  geom_dotplot(binwidth = .01, fill = "gray65", color = NA) +
+  facet_grid(fct_rev(group) ~ .) +
+  geom_text(aes(xend + .01, y, label = label), data = observed_label_data, hjust = 0, color = "red") +
+  geom_segment(aes(x, y, xend = xend, yend = y), data = observed_label_data,
+    linetype = "dashed", color = "red") +
+  geom_vline(aes(xintercept = proportion), data = df, color = "red") +
+  geom_text(aes(xend + .01, y, label = label), data = dotplot_label_data, hjust = 0, color = "gray35") +
+  geom_segment(aes(x, y, xend = xend, yend = y), data = dotplot_label_data,
+    linetype = "dashed", color = "gray65") +
+  xlim(0,1) +
+  ylab(NULL) +
+  scale_y_continuous(breaks = NULL)
+```
+
+![](hops_proportions_files/figure-gfm/quantile-dotplots-1.png)<!-- -->
+
+## References
+
+<div id="refs" class="references">
+
+<div id="ref-Fernandes2018">
+
+Fernandes, Michael, Logan Walls, Sean Munson, Jessica Hullman, and
+Matthew Kay. 2018. “Uncertainty Displays Using Quantile Dotplots or CDFs
+Improve Transit Decision-Making.” *Conference on Human Factors in
+Computing Systems - CHI ’18*. <https://doi.org/10.1145/3173574.3173718>.
+
+</div>
+
+<div id="ref-Hullman2015">
+
+Hullman, Jessica, Paul Resnick, and Eytan Adar. 2015. “Hypothetical
+Outcome Plots Outperform Error Bars and Violin Plots for Inferences
+about Reliability of Variable Ordering.” *PLOS ONE* 10 (11). Public
+Library of Science. <https://doi.org/10.1371/journal.pone.0142444>.
+
+</div>
+
+<div id="ref-Kale2018">
+
+Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2019.
+“Hypothetical Outcome Plots Help Untrained Observers Judge Trends in
+Ambiguous Data.” *IEEE Transactions on Visualization and Computer Graphics*.
+
+</div>
+
+<div id="ref-Kay2016">
+
+Kay, Matthew, Tara Kola, Jessica R Hullman, and Sean A Munson. 2016.
+“When (ish) is My Bus? User-centered Visualizations of Uncertainty in
+Everyday, Mobile Predictive Systems.” *Proceedings of the 2016 CHI
+Conference on Human Factors in Computing Systems - CHI ’16*, 5092–5103.
+<https://doi.org/10.1145/2858036.2858558>.
+
+</div>
+
+</div>
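For the quantile dotplot, `quantile(.$proportion, ppoints(100))` reduces each group's posterior draws to 100 dots. For `n = 100`, `ppoints()` returns the probabilities (i - 0.5) / 100, so each dot sits in the middle of a 1% slice of posterior mass. A minimal base-R sketch, assuming hypothetical Beta-distributed draws in place of the model's posterior for one group:

```r
set.seed(2)

# Hypothetical stand-in for 4000 posterior draws of one group's proportion;
# rbeta() is used only for illustration, not taken from the model.
draws <- rbeta(4000, 30, 150)

# ppoints(100) = (1:100 - 0.5) / 100, i.e. 0.005, 0.015, ..., 0.995.
probs <- ppoints(100)

# One dot per quantile; each dot stands for 1% of the posterior mass.
dots <- quantile(draws, probs, names = FALSE)

# Share of dots below the posterior's 10th percentile (should be about 0.1).
mean(dots < quantile(draws, 0.1))
```

The dots are then binned and stacked (the chunk above does this with `geom_dotplot(binwidth = .01)`), so counting dots approximates probability.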
diff --git a/hops_proportions_files/figure-gfm/hops-1.gif b/hops_proportions_files/figure-gfm/hops-1.gif
diff --git a/hops_proportions_files/figure-gfm/quantile-dotplots-1.png b/hops_proportions_files/figure-gfm/quantile-dotplots-1.png
diff --git a/hops_proportions_files/figure-gfm/unnamed-chunk-4-1.png b/hops_proportions_files/figure-gfm/unnamed-chunk-4-1.png
diff --git a/references.bib b/references.bib
@@ -0,0 +1,52 @@
+@article{Hullman2015,
+abstract = {Many visual depictions of probability distributions, such as error bars, are difficult for users to accurately interpret. We present and study an alternative representation, Hypothetical Outcome Plots (HOPs), that animates a finite set of individual draws. In contrast to the statistical background required to interpret many static representations of distributions, HOPs require relatively little background knowledge to interpret. Instead, HOPs enables viewers to infer properties of the distribution using mental processes like counting and integration. We conducted an experiment comparing HOPs to error bars and violin plots. With HOPs, people made much more accurate judgments about plots of two and three quantities. Accuracy was similar with all three representations for most questions about distributions of a single quantity.},
+author = {Hullman, Jessica and Resnick, Paul and Adar, Eytan},
+doi = {10.1371/journal.pone.0142444},
+file = {::},
+issn = {1932-6203},
+journal = {PLOS ONE},
+mendeley-groups = {uncertainty/visualizing},
+month = {jan},
+number = {11},
+pmid = {26571487},
+publisher = {Public Library of Science},
+title = {{Hypothetical Outcome Plots Outperform Error Bars and Violin Plots for Inferences about Reliability of Variable Ordering.}},
+url = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0142444},
+volume = {10},
+year = {2015}
+}
+
+@article{Kale2018,
+  title={Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data},
+  author={Kale, Alex and Nguyen, Francis and Kay, Matthew and Hullman, Jessica},
+  journal={IEEE Transactions on Visualization and Computer Graphics},
+  year={2019}
+}
+
+@article{Kay2016,
+abstract = {Users often rely on realtime predictions in everyday contexts like riding the bus, but may not grasp that such predictions are subject to uncertainty. Existing uncertainty visualizations may not align with user needs or how they naturally reason about probability. We present a novel mobile interface design and visualization of uncertainty for transit predictions on mobile phones based on discrete outcomes. To develop it, we identified domain specific design requirements for visualizing uncertainty in transit prediction through: 1) a literature review, 2) a large survey of users of a popular realtime transit application, and 3) an iterative design process. We present several candidate visualizations of uncertainty for realtime transit predictions in a mobile context, and we propose a novel discrete representation of continuous outcomes designed for small screens, quantile dotplots. In a controlled experiment we find that quantile dotplots reduce the variance of probabilistic estimates by {\~{}}1.15 times compared to density plots and facilitate more confident estimation by end-users in the context of realtime transit prediction scenarios.},
+author = {Kay, Matthew and Kola, Tara and Hullman, Jessica R and Munson, Sean A},
+doi = {10.1145/2858036.2858558},
+file = {::},
+isbn = {9781450333627},
+journal = {Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI '16},
+keywords = {dotplots,end-user visualization,mobile interfaces,transit predictions,uncertainty visualization},
+mendeley-groups = {uncertainty/visualizing,uncertainty/transportation},
+pages = {5092--5103},
+title = {{When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems}},
+url = {http://www.mjskay.com/papers/chi{\_}2016{\_}uncertain{\_}bus.pdf},
+year = {2016}
+}
+
+@article{Fernandes2018,
+abstract = {Everyday predictive systems typically present point predictions, making it hard for people to account for uncertainty when making decisions. Evaluations of uncertainty displays for transit prediction have assessed people's ability to extract probabilities, but not the quality of their decisions. In a controlled, incentivized experiment, we had subjects decide when to catch a bus using displays with textual uncertainty, uncertainty visualizations, or no-uncertainty (control). Frequency-based visualizations previously shown to allow people to better extract probabilities (quantile dotplots) yielded better decisions. Decisions with quantile dotplots with 50 outcomes were (1) better on average, having expected payoffs 97{\%} of optimal (95{\%} CI: [95{\%},98{\%}]), 5 percentage points more than control (95{\%} CI: [2,8]); and (2) more consistent, having within-subject standard deviation of 3 percentage points (95{\%} CI: [2,4]), 4 percentage points less than control (95{\%} CI: [2,6]). Cumulative distribution function plots performed nearly as well, and both outperformed textual uncertainty, which was sensitive to the probability interval communicated. We discuss implications for realtime transit predictions and possible generalization to other domains.},
+author = {Fernandes, Michael and Walls, Logan and Munson, Sean and Hullman, Jessica and Kay, Matthew},
+doi = {10.1145/3173574.3173718},
+file = {:C$\backslash$:/cygwin/home/matth/docs/references/Fernandes et al. - 2018 - Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making.pdf:pdf},
+journal = {Conference on Human Factors in Computing Systems - CHI '18},
+mendeley-groups = {uncertainty/visualizing/visualizing error,uncertainty/transportation,uncertainty/decision-making,busses,uncertainty/visualizing/quantiles},
+title = {{Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making}},
+url = {http://www.mjskay.com/papers/chi2018-uncertain-bus-decisions.pdf},
+year = {2018}
+}