-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Change density default bw = "nrd0"
to bw = "sj"
#5854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'm not sure I think this is worth the breaking change to be honest... I probably miss a compelling example of when it would improve a visualisation |
I don't know enough about these bandwidth estimators to make a qualified judgement. However, this post shares some thoughts. library(ggplot2)
set.seed(0)
df <- data.frame(x = c(
runif(20, 9, 11),
runif(10, 19, 21),
runif(20, 29, 31),
runif(30, 39, 41)
))
ggplot(df, aes(x)) +
geom_density(bw = "nrd0") +
geom_rug() +
ggtitle("bw = 'nrd0'") Whereas ggplot(df, aes(x)) +
geom_density(bw = "sj") +
geom_rug() +
ggtitle("bw = 'sj'") Counter-argument might be that if you have some exponential data, df <- data.frame(x = rexp(1000))
ggplot(df, aes(x)) +
geom_density(bw = "nrd0", bounds = c(0, Inf)) +
geom_rug() +
ggtitle("bw = 'nrd0'") Whereas ggplot(df, aes(x)) +
geom_density(bw = "sj", bounds = c(0, Inf)) +
geom_rug() +
ggtitle("bw = 'sj'") Created on 2024-05-20 with reprex v2.1.0 It is probably a trade-off that I cannot motivate very well. |
I could perhaps dig up some examples if there is interest. I changed the ggdist default to the equivalent of |
For example set.seed(20240522)
data.frame(x = rlnorm(10000)) |>
ggplot() +
geom_density(aes(x, color = "sj"), bw = "sj", bounds = c(0, Inf), n = 1000) +
geom_density(aes(x, color = "nrd0"), bw = "nrd0", bounds = c(0, Inf), n = 1000) +
geom_function(fun = dlnorm, n = 1000, linetype = "22") +
coord_cartesian(xlim = c(0,5)) It is worth pointing out that density(rep(1, 10), bw = "SJ")
## Error in bw.SJ(x, method = "ste") : sample is too sparse to find TD ggdist handles this by falling back to |
Thanks @mjskay, I appreciate your thoughts on the matter. I tend to agree with you and the authors of |
This PR aims to fix #3825.
Briefly,
?density
recommends the"sj"
method over the"nrd0"
default. This change propagates that recommendation to ggplot2's density calculations.Some details:
calc_bw()
andprecompute_bw()
were doing the exact same task, so I merged these.var(x) == 0
, most of thestats::bw.*()
functions throw an error, so we keepbw.nrd0()
for these degenerate cases because it will not throw an error.