add fun_avg to ppc_avg functions#349
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #349 +/- ##
==========================================
- Coverage 98.66% 98.60% -0.06%
==========================================
Files 35 35
Lines 5600 5650 +50
==========================================
+ Hits 5525 5571 +46
- Misses 75 79 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
TeemuSailynoja
left a comment
There was a problem hiding this comment.
Thank you for the fast reaction. This looks like a clean implementation to address the listed issue. My only comment is about the documentation, which could mention the expected format of fun_avg.
R/ppc-errors.R
Outdated
| #' @param fun_avg Function to apply to compute the posterior average. | ||
| #' Defaults to `"mean"`. |
There was a problem hiding this comment.
In the ppc_stat functions, we have a similar argument, by the name stat, which does a very similar job:
#' @param stat A single function or a string naming a function, except for the
#' 2D plot which requires a vector of exactly two names or functions. In all
#' cases the function(s) should take a vector input and return a scalar
#' statistic. If specified as a string (or strings) then the legend will
#' display the function name(s). If specified as a function (or functions)
#' then generic naming is used in the legend.
We could align this doc to read, for example:
#' @param fun_avg A function or a string naming a function for computing the
#' posterior average. In both cases, the function should take a vector input and
#' return a scalar statistic. If specified as a string, then the legend will
#' display the function name. If specified as a function
#' then generic naming is used in the legend.
#' Defaults to"mean".
There was a problem hiding this comment.
The Average y - y_rep axis label is not affected. I didn't want to make yrep_avg_label() and error_avg_label() depend on fun_avg or change the default "Average y - y_rep" labels.
It does affect the $rep_label in ppc_scatter_avg_data(y, yrep) when fun_avg is a string.
There was a problem hiding this comment.
wait, i'm just noticing Aki's comment. I'll switch to stat.
There was a problem hiding this comment.
Thanks @tjmahr. The code looks good. And I agree with changing to stat
R/ppc-scatterplots.R
Outdated
| #' @param fun_avg Function to apply to compute the posterior average. | ||
| #' Defaults to `"mean"`. |
|
I was also thinking It would be good to change the y-axis label to show the |
|
Following up on @avehtari , also would be good to change the x-axis label to show the |
Because the function name is used for both labeling and invoking a function name, I added library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
y <- l$y
yrep <- l$yrep
x <- l$xstat is inlined into the axis label ppc_scatter_avg(y, yrep, stat = "mean")stat can be a function ppc_scatter_avg(y, yrep, stat = median)or a symbol ppc_scatter_avg(y, yrep, stat = quote(median))stat can be a primitive function ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)stat can be an anonymous function, but it gets a generic label ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1))Created on 2025-05-14 with reprex v2.1.1 |
If I'm not wrong, this is more challenging, as when passing |
No, you can capture that. ( library(rlang)
library(ggplot2)
f <- function(x, y) {
qx <- enquo(x)
qy <- enquo(y)
data <- data.frame(x = x, y = y)
ggplot(data) +
aes(x, y) +
geom_point() +
labs(
x = as_label(quo_get_expr(qx)),
y = as_label(quo_get_expr(qy))
)
}
data <- data.frame(
y = bayesplot::example_y_data(),
x = bayesplot::example_x_data()
)
f(data$x, data$y)Created on 2025-05-15 with reprex v2.1.1 The hard part for us is that when you call My update yesterday kept track of the expression for Edit: I figured out how to handle the inner function problem by using quosures/tunnelling. |
library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
y <- l$y
yrep <- l$yrep
x <- l$xFunction name ppc_scatter_avg(y, yrep, stat = "mean")ppc_error_scatter_avg(y, yrep, stat = "mean")Function object ppc_scatter_avg(y, yrep, stat = median)ppc_error_scatter_avg(y, yrep, stat = median)Primitive function ppc_scatter_avg(y, yrep, x, stat = min)ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)Anonymous function ppc_scatter_avg(y, yrep, x, stat = function(x) quantile(x, .1))ppc_error_scatter_avg_vs_x(y, yrep, x, stat = function(x) quantile(x, .1))Anonymous function (formulas) ppc_scatter_avg(y, yrep, x, stat = ~ quantile(.x, .1))ppc_error_scatter_avg_vs_x(y, yrep, x, stat = ~ quantile(.x, .1))Created on 2025-05-15 with reprex v2.1.1 |
- avoid global for y, italic
|
This patch should be ready for review. The expression for library(bayesplot)
#> This is bayesplot version 1.12.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
ppc_error_scatter_avg_vs_x(l$y, l$yrep, l$x, stat = "sd")Created on 2025-05-16 with reprex v2.1.1 |
stat(y - y_rep) is what is computed internally. |
|
@tjmahr sorry for the delay in reviewing this. Been a busy few days. Will try to get to it soon! |
jgabry
left a comment
There was a problem hiding this comment.
This looks good! I made a few minor comments/questions. Aside from those my only other question is does it makes sense to also use this new approach with the stat argument to ppc_stat()? You could do something similar to what you've done with the axis labels here but for the legend for ppc_stat(). What do you think?
| SystemRequirements: pandoc (>= 1.12.3), pandoc-citeproc | ||
| Depends: | ||
| R (>= 3.1.0) | ||
| R (>= 4.1.0) |
There was a problem hiding this comment.
I think at this point it's been out long enough that bumping the required R version is fine.
|
@tjmahr Also see @avehtari's comment here #350 (comment), which supports @kruschke's suggestion. When we were always using the mean it didn't matter if we were computing Then there's the separate question of what to plot on x and y axes for |
My thinking was stat(y - y_rep) is an average error, and y - stat(y_rep) is the error of the average prediction, so the former better matches the function names and how I think about ppc functions. I don't oppose the change, and updating the axis label will make it unambiguous what's being computed. |
|
Yeah I have mixed feelings about this. Both feel intuitive to me but represent slightly different things. A couple of options (there are probably others):
Do you have a preference? Or a 3rd option? |
|
I think it would be better to make a new function, to not break anyone's current plots. I was also thinking of name something like |
Okay, that seems to be the plan now in #348. |
Merge remote-tracking branch 'origin/master' into fun-avg # Conflicts: # NEWS.md
|
Here's the hopefully final set of test plots for the library(bayesplot)
#> This is bayesplot version 1.13.0.9000
#> - Online documentation and vignettes at mc-stan.org/bayesplot
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
l <- jsonlite::read_json(
"https://gist.githubusercontent.com/tjmahr/3eba4f51e4b122b19417718b56423a6c/raw/31711cacef59eff69ed95eb51e2ae844eaad01b7/draws.json",
simplifyVector = TRUE
)
y <- l$y
yrep <- l$yrep
x <- l$xstat is inlined into the axis label ppc_scatter_avg(y, yrep, stat = "mean")stat can be a function ppc_scatter_avg(y, yrep, stat = median)stat can be a primitive function ppc_error_scatter_avg_vs_x(y, yrep, x, stat = min)stat can be an anonymous function, but it gets a generic label ppc_scatter_avg(y, yrep, stat = function(x) quantile(x, .1))ppc_scatter_avg(y, yrep, stat = ~quantile(.x, .1))Created on 2025-07-02 with reprex v2.1.1 |
|
This should be ready to merge. |
TeemuSailynoja
left a comment
There was a problem hiding this comment.
This looks very good! Thank you, @tjmahr !






















Following #348, allow user to set the averaging function so that, e.g., user can choose median for heavy-tailed distributions.
Created on 2025-05-13 with reprex v2.1.1