initial checkin + hops proportions example
mjskay committed Aug 17, 2018
0 parents, commit a32b100
Showing 8 changed files with 344 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
* text=auto

4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
*~
.Rhistory
.RData
*.Rproj*
116 changes: 116 additions & 0 deletions hops_proportions.Rmd
@@ -0,0 +1,116 @@
---
title: "Hypothetical Outcome Plots (HOPs) for proportions"
output: github_document
bibliography: references.bib
link-citations: true
---

Here is a quick example of hypothetical outcome plots (HOPs) for proportions.

## Setup

The following libraries are needed:

```{r setup, message = FALSE, warning = FALSE}
library(tidyverse)
library(modelr)
library(rstanarm)
library(tidybayes)
library(gganimate) # devtools::install_github("thomasp85/gganimate")
library(ggstance)
library(forcats)
theme_set(
  theme_light() +
    theme(panel.grid = element_blank())
)
```

## Data

We'll use some count data to illustrate:

```{r data}
df = data_frame(
  group = c("A","B","C"),
  count = c(100, 50, 30),
  proportion = count / sum(count)
)
df
```

## Model

One *possible* model for these data is a Poisson regression on group (N.B. for the purposes of this example, it makes some strong assumptions about how the data were generated; your mileage may vary!):

```{r model, results = "hide"}
m = stan_glm(count ~ group, family = poisson, data = df)
```
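Why dividing fitted counts by their sum yields proportions: under this model each group's count is Poisson with its own rate (log link, with group A as the reference level under R's default treatment coding), and a standard result is that independent Poisson counts, conditioned on their total, follow a multinomial distribution whose probabilities are the normalized rates. A sketch in our own notation (not taken from the original):

$$
\text{count}_g \sim \operatorname{Poisson}(\lambda_g), \qquad \log \lambda_g = \beta_0 + \beta_g, \qquad g \in \{A, B, C\},\ \beta_A = 0
$$

$$
(\text{count}_A, \text{count}_B, \text{count}_C) \mid N \sim \operatorname{Multinomial}\!\left(N,\ \left(\tfrac{\lambda_A}{\Lambda}, \tfrac{\lambda_B}{\Lambda}, \tfrac{\lambda_C}{\Lambda}\right)\right), \qquad N = \sum_g \text{count}_g,\quad \Lambda = \sum_g \lambda_g
$$

So each posterior draw of the three rates $\lambda_g$ maps to one draw of the three proportions $\lambda_g / \Lambda$.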

## HOPs

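Given that model, we could construct a HOPs [@Hullman2015; @Kale2018] bar chart illustrating the posterior distribution for the proportion in each group. Each frame shows one posterior draw, with the fitted counts for the three groups rescaled to sum to 1: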

```{r hops, fig.width = 7, fig.height = 3, cache = TRUE}
n_hops = 100
p = df %>%
  data_grid(group) %>%
  add_fitted_draws(m, n = n_hops) %>%
  group_by(.draw) %>%
  mutate(proportion = .value / sum(.value)) %>%
  ggplot(aes(y = group, x = proportion)) +
  geom_colh(fill = "gray75") +
  geom_point(data = df, color = "red") +
  annotate("text", y = "C", x = .5, label = "Observed proportion", hjust = 0, color = "red") +
  annotate("segment", y = "C", yend = "C", x = .18, xend = .49, linetype = "dashed", color = "red") +
  annotate("text", y = 2.7, x = .5, label = "Posterior distribution for proportion", hjust = 0,
    color = "gray35") +
  annotate("segment", y = 2.7, yend = 2.7, x = .1, xend = .49, linetype = "dashed", color = "gray75") +
  xlim(0,1) +
  transition_states(.draw, transition_length = 1, state_length = 1)
animate(p, nframes = n_hops * 2, width = 600, height = 300)
```
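The key step in the HOPs chunk is normalizing within each draw. A minimal base-R sketch of that step, with no model fitting: the matrix `rates` and the names `counts` and `props` are hypothetical, and the Gamma draws only stand in for the model's posterior (they are not rstanarm output):

```r
# Hypothetical stand-in for posterior draws of the three group rates; these
# are NOT rstanarm output. Gamma(count + 1, rate = 1) is the posterior of a
# Poisson rate under a flat prior, used here only to get plausible values.
set.seed(2018)
counts = c(A = 100, B = 50, C = 30)
n_hops = 100
rates = sapply(counts, function(k) rgamma(n_hops, shape = k + 1, rate = 1))
# rates is an n_hops x 3 matrix: one row per draw, one column per group

# Normalize within each draw (row) so the proportions sum to 1; this is the
# same step as group_by(.draw) %>% mutate(proportion = .value / sum(.value)).
# rowSums() gives one total per row, and R recycles it down each column,
# so element [i, j] is divided by the total of row i.
props = rates / rowSums(rates)

# Each row is one hypothetical outcome, i.e. one frame of the animation
head(round(props, 3), 3)
```

Because independent Gamma variables with a common rate are Dirichlet distributed once normalized, these stand-in proportions follow Dirichlet(101, 51, 31); the fitted model's posterior will differ somewhat because of rstanarm's default priors.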

## Quantile dotplots (static alternative)

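If animation were not available (e.g., in a print medium), an alternative might be a quantile dotplot [@Kay2016; @Fernandes2018], which shows 100 quantiles of each group's posterior distribution for the proportion as stacked dots: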

```{r quantile-dotplots, fig.width = 7, fig.height = 4}
observed_label_data = data_frame(
  group = "C",
  label = "Observed proportion",
  x = .18, xend = .44, y = .55
)
dotplot_label_data = data_frame(
  group = "C",
  label = "100 approximately equally likely proportions",
  x = .18, xend = .44, y = .3
)
df %>%
  data_grid(group) %>%
  add_fitted_draws(m) %>%
  group_by(.draw) %>%
  mutate(proportion = .value / sum(.value)) %>%
  group_by(group) %>%
  do(data_frame(proportion = quantile(.$proportion, ppoints(100)))) %>%
  ggplot(aes(x = proportion)) +
  geom_dotplot(binwidth = .01, fill = "gray65", color = NA) +
  facet_grid(fct_rev(group) ~ .) +
  geom_text(aes(xend + .01, y, label = label), data = observed_label_data, hjust = 0, color = "red") +
  geom_segment(aes(x, y, xend = xend, yend = y), data = observed_label_data,
    linetype = "dashed", color = "red") +
  geom_vline(aes(xintercept = proportion), data = df, color = "red") +
  geom_text(aes(xend + .01, y, label = label), data = dotplot_label_data, hjust = 0, color = "gray35") +
  geom_segment(aes(x, y, xend = xend, yend = y), data = dotplot_label_data,
    linetype = "dashed", color = "gray65") +
  xlim(0,1) +
  ylab(NULL) +
  scale_y_continuous(breaks = NULL)
```
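The quantile step in that chunk takes 100 evenly spaced probabilities from `ppoints(100)`, so each dot stands for roughly 1% of the posterior. A minimal base-R sketch of how such a layout could be computed, assuming simple fixed-width binning (which is not the "dotdensity" binning `geom_dotplot()` uses by default); `quantile_dots()` and the Beta draws for group C are hypothetical, for illustration only:

```r
# Hypothetical helper: lay out n_dots quantiles of `x` as stacked dots using
# fixed-width bins (simpler than geom_dotplot()'s default dot-density method).
quantile_dots = function(x, n_dots = 100, binwidth = 0.01) {
  # For n_dots > 10, ppoints() returns (1:n_dots - 0.5) / n_dots
  q = quantile(x, probs = ppoints(n_dots), names = FALSE)
  bin = floor(q / binwidth)
  data.frame(
    x = (bin + 0.5) * binwidth,                        # bin centre
    stack = ave(seq_along(bin), bin, FUN = seq_along)  # dot height within bin
  )
}

# Stand-in draws for group C's proportion (NOT model output), centred near
# the observed 0.167
set.seed(2018)
draws = rbeta(4000, 31, 152)
dots = quantile_dots(draws)
table(dots$x)  # number of dots stacked at each bin centre
```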


## References
170 changes: 170 additions & 0 deletions hops_proportions.md
@@ -0,0 +1,170 @@
Hypothetical Outcome Plots (HOPs) for proportions
================

Here is a quick example of hypothetical outcome plots (HOPs) for
proportions.

## Setup

The following libraries are needed:

``` r
library(tidyverse)
library(modelr)
library(rstanarm)
library(tidybayes)
library(gganimate) # devtools::install_github("thomasp85/gganimate")
library(ggstance)
library(forcats)

theme_set(
  theme_light() +
    theme(panel.grid = element_blank())
)
```

## Data

We’ll use some count data to illustrate:

``` r
df = data_frame(
  group = c("A","B","C"),
  count = c(100, 50, 30),
  proportion = count / sum(count)
)
df
```

```
## # A tibble: 3 x 3
##   group count proportion
##   <chr> <dbl>      <dbl>
## 1 A       100      0.556
## 2 B        50      0.278
## 3 C        30      0.167
```

## Model

One *possible* model for these data is a Poisson regression on group
(N.B. for the purposes of this example, it makes some strong assumptions
about how the data were generated; your mileage may vary\!):

``` r
m = stan_glm(count ~ group, family = poisson, data = df)
```

## HOPs

Given that model, we could construct a HOPs (Hullman, Resnick, and Adar
[2015](#ref-Hullman2015); Kale et al. [2019](#ref-Kale2018)) bar chart
illustrating the posterior distribution for the proportion in each
group. Each frame shows one posterior draw, with the fitted counts for
the three groups rescaled to sum to 1:

``` r
n_hops = 100

p = df %>%
  data_grid(group) %>%
  add_fitted_draws(m, n = n_hops) %>%
  group_by(.draw) %>%
  mutate(proportion = .value / sum(.value)) %>%
  ggplot(aes(y = group, x = proportion)) +
  geom_colh(fill = "gray75") +
  geom_point(data = df, color = "red") +
  annotate("text", y = "C", x = .5, label = "Observed proportion", hjust = 0, color = "red") +
  annotate("segment", y = "C", yend = "C", x = .18, xend = .49, linetype = "dashed", color = "red") +
  annotate("text", y = 2.7, x = .5, label = "Posterior distribution for proportion", hjust = 0,
    color = "gray35") +
  annotate("segment", y = 2.7, yend = 2.7, x = .1, xend = .49, linetype = "dashed", color = "gray75") +
  xlim(0,1) +
  transition_states(.draw, transition_length = 1, state_length = 1)

animate(p, nframes = n_hops * 2, width = 600, height = 300)
```

![](hops_proportions_files/figure-gfm/hops-1.gif)<!-- -->

## Quantile dotplots (static alternative)

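If animation were not available (e.g., in a print medium), an alternative
might be a quantile dotplot (Kay et al. [2016](#ref-Kay2016); Fernandes
et al. [2018](#ref-Fernandes2018)), which shows 100 quantiles of each
group’s posterior distribution for the proportion as stacked dots: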

``` r
observed_label_data = data_frame(
  group = "C",
  label = "Observed proportion",
  x = .18, xend = .44, y = .55
)
dotplot_label_data = data_frame(
  group = "C",
  label = "100 approximately equally likely proportions",
  x = .18, xend = .44, y = .3
)

df %>%
  data_grid(group) %>%
  add_fitted_draws(m) %>%
  group_by(.draw) %>%
  mutate(proportion = .value / sum(.value)) %>%
  group_by(group) %>%
  do(data_frame(proportion = quantile(.$proportion, ppoints(100)))) %>%
  ggplot(aes(x = proportion)) +
  geom_dotplot(binwidth = .01, fill = "gray65", color = NA) +
  facet_grid(fct_rev(group) ~ .) +
  geom_text(aes(xend + .01, y, label = label), data = observed_label_data, hjust = 0, color = "red") +
  geom_segment(aes(x, y, xend = xend, yend = y), data = observed_label_data,
    linetype = "dashed", color = "red") +
  geom_vline(aes(xintercept = proportion), data = df, color = "red") +
  geom_text(aes(xend + .01, y, label = label), data = dotplot_label_data, hjust = 0, color = "gray35") +
  geom_segment(aes(x, y, xend = xend, yend = y), data = dotplot_label_data,
    linetype = "dashed", color = "gray65") +
  xlim(0,1) +
  ylab(NULL) +
  scale_y_continuous(breaks = NULL)
```

![](hops_proportions_files/figure-gfm/quantile-dotplots-1.png)<!-- -->

## References

<div id="refs" class="references">

<div id="ref-Fernandes2018">

Fernandes, Michael, Logan Walls, Sean Munson, Jessica Hullman, and
Matthew Kay. 2018. “Uncertainty Displays Using Quantile Dotplots or CDFs
Improve Transit Decision-Making.” *Conference on Human Factors in
Computing Systems - CHI ’18*. <https://doi.org/10.1145/3173574.3173718>.

</div>

<div id="ref-Hullman2015">

Hullman, Jessica, Paul Resnick, and Eytan Adar. 2015. “Hypothetical
Outcome Plots Outperform Error Bars and Violin Plots for Inferences
about Reliability of Variable Ordering.” *PloS One* 10 (11). Public
Library of Science. <https://doi.org/10.1371/journal.pone.0142444>.

</div>

<div id="ref-Kale2018">

Kale, Alex, Francis Nguyen, Matthew Kay, and Jessica Hullman. 2019.
“Hypothetical Outcome Plots Help Untrained Observers Judge Trends in
Ambiguous Data.” *Transactions on Visualization and Computer Graphics*.

</div>

<div id="ref-Kay2016">

Kay, Matthew, Tara Kola, Jessica R Hullman, and Sean A Munson. 2016.
“When (ish) is My Bus? User-centered Visualizations of Uncertainty in
Everyday, Mobile Predictive Systems.” *Proceedings of the 2016 CHI
Conference on Human Factors in Computing Systems - CHI ’16*, 5092–5103.
<https://doi.org/10.1145/2858036.2858558>.

</div>

</div>
Binary file added hops_proportions_files/figure-gfm/hops-1.gif
52 changes: 52 additions & 0 deletions references.bib
@@ -0,0 +1,52 @@
@article{Hullman2015,
abstract = {Many visual depictions of probability distributions, such as error bars, are difficult for users to accurately interpret. We present and study an alternative representation, Hypothetical Outcome Plots (HOPs), that animates a finite set of individual draws. In contrast to the statistical background required to interpret many static representations of distributions, HOPs require relatively little background knowledge to interpret. Instead, HOPs enables viewers to infer properties of the distribution using mental processes like counting and integration. We conducted an experiment comparing HOPs to error bars and violin plots. With HOPs, people made much more accurate judgments about plots of two and three quantities. Accuracy was similar with all three representations for most questions about distributions of a single quantity.},
author = {Hullman, Jessica and Resnick, Paul and Adar, Eytan},
doi = {10.1371/journal.pone.0142444},
file = {::},
issn = {1932-6203},
journal = {PloS one},
mendeley-groups = {uncertainty/visualizing},
month = {jan},
number = {11},
pmid = {26571487},
publisher = {Public Library of Science},
title = {{Hypothetical Outcome Plots Outperform Error Bars and Violin Plots for Inferences about Reliability of Variable Ordering.}},
url = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0142444},
volume = {10},
year = {2015}
}

@article{Kale2018,
title={Hypothetical Outcome Plots Help Untrained Observers Judge Trends in Ambiguous Data},
author={Kale, Alex and Nguyen, Francis and Kay, Matthew and Hullman, Jessica},
journal={Transactions on Visualization and Computer Graphics},
year={2019}
}

@article{Kay2016,
abstract = {Users often rely on realtime predictions in everyday contexts like riding the bus, but may not grasp that such predictions are subject to uncertainty. Existing uncertainty visualizations may not align with user needs or how they naturally reason about probability. We present a novel mobile interface design and visualization of uncertainty for transit predictions on mobile phones based on discrete outcomes. To develop it, we identified domain specific design requirements for visualizing uncertainty in transit prediction through: 1) a literature review, 2) a large survey of users of a popular realtime transit application, and 3) an iterative design process. We present several candidate visualizations of uncertainty for realtime transit predictions in a mobile context, and we propose a novel discrete representation of continuous outcomes designed for small screens, quantile dotplots. In a controlled experiment we find that quantile dotplots reduce the variance of probabilistic estimates by {\~{}}1.15 times compared to density plots and facilitate more confident estimation by end-users in the context of realtime transit prediction scenarios.},
author = {Kay, Matthew and Kola, Tara and Hullman, Jessica R and Munson, Sean A},
doi = {10.1145/2858036.2858558},
file = {::},
isbn = {9781450333627},
journal = {Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI '16},
keywords = {dotplots,end-user visualization,mobile interfaces,transit predictions,uncertainty visualization},
mendeley-groups = {uncertainty/visualizing,uncertainty/transportation},
pages = {5092--5103},
title = {{When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems}},
url = {http://www.mjskay.com/papers/chi{\_}2016{\_}uncertain{\_}bus.pdf},
year = {2016}
}

@article{Fernandes2018,
abstract = {Everyday predictive systems typically present point predictions, making it hard for people to account for uncertainty when making decisions. Evaluations of uncertainty displays for transit prediction have assessed people's ability to extract probabilities, but not the quality of their decisions. In a controlled, incentivized experiment, we had subjects decide when to catch a bus using displays with textual uncertainty, uncertainty visualizations, or no-uncertainty (control). Frequency-based visualizations previously shown to allow people to better extract probabilities (quantile dotplots) yielded better decisions. Decisions with quantile dotplots with 50 outcomes were (1) better on average, having expected payoffs 97{\%} of optimal (95{\%} CI: [95{\%},98{\%}]), 5 percentage points more than control (95{\%} CI: [2,8]); and (2) more consistent, having within-subject standard deviation of 3 percentage points (95{\%} CI: [2,4]), 4 percentage points less than control (95{\%} CI: [2,6]). Cumulative distribution function plots performed nearly as well, and both outperformed textual uncertainty, which was sensitive to the probability interval communicated. We discuss implications for realtime transit predictions and possible generalization to other domains.},
author = {Fernandes, Michael and Walls, Logan and Munson, Sean and Hullman, Jessica and Kay, Matthew},
doi = {10.1145/3173574.3173718},
file = {:C$\backslash$:/cygwin/home/matth/docs/references/Fernandes et al. - 2018 - Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making.pdf:pdf},
journal = {Conference on Human Factors in Computing Systems - CHI '18},
mendeley-groups = {uncertainty/visualizing/visualizing error,uncertainty/transportation,uncertainty/decision-making,busses,uncertainty/visualizing/quantiles},
title = {{Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making}},
url = {http://www.mjskay.com/papers/chi2018-uncertain-bus-decisions.pdf},
year = {2018}
}
