Description
I found the error thrown by stat_density_2d uninformative. Behind the scenes it computes an illegal (zero) bandwidth from my data, which causes an internal error that the messages do not explain. Specifying the bandwidth explicitly fixes the problem. However, I would expect stat_density_2d either to handle this edge case or to report that the default bandwidth derived from the data is illegal and that manual input is required.
The following example works fine:
library(ggplot2)
df <- data.frame(x=sample(0:10, 100, replace=T), y=sample(0:10, 100, replace=T))
ggplot(df) + stat_density_2d(geom='density_2d', mapping=aes(x,y))
but the next one will throw an error:
df <- data.frame(x=sample(0:10, 100, replace=T), y=c(rep(5, 80), sample(0:10, 20, replace=T)))
ggplot(df) + stat_density_2d(geom='density_2d', mapping=aes(x,y))
Error in `stat_density_2d()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `seq_len()`:
! argument must be coercible to non-negative integer
The error message is quite confusing. By digging into the warnings, I found the root cause of the problem:
1: Computation failed in `stat_density2d()`
Caused by error in `MASS::kde2d()`:
! bandwidths must be strictly positive
In stat_density_2d, if h is not given, it is computed automatically before kde2d is called:
if (is.null(h)) {
  h <- c(MASS::bandwidth.nrd(data$x), MASS::bandwidth.nrd(data$y))
  h <- h * adjust
}
# calculate density
dens <- MASS::kde2d(
  data$x, data$y, h = h, n = n,
  lims = c(scales$x$dimension(), scales$y$dimension())
)
and bandwidth.nrd uses the following formula by default:
function(x)
{
    r <- quantile(x, c(0.25, 0.75))
    h <- (r[2] - r[1])/1.34
    4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}
So if the first and third quartiles of data$x or data$y coincide, which happens when more than 75% of the values are identical, the default bandwidth becomes 0 without any warning, and kde2d immediately rejects it as illegal.
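To make the failure mode concrete, here is a minimal sketch that reproduces the zero bandwidth directly and shows one possible workaround. The helper `safe_bandwidth()` is hypothetical (it is not part of ggplot2 or MASS), and the fallback to the standard-deviation term is only my assumption about what a reasonable default might be, not how ggplot2 behaves or should necessarily behave. The seed is arbitrary.

```r
library(MASS)

set.seed(42)  # arbitrary seed, only for reproducibility
y <- c(rep(5, 80), sample(0:10, 20, replace = TRUE))

# Both quartiles land on the repeated value, so the IQR term is zero
# and the default rule returns a bandwidth of 0.
quantile(y, c(0.25, 0.75))
bandwidth.nrd(y)

# Hypothetical helper (an assumption, not an existing API): use the default
# rule, but fall back to the standard-deviation term when the result is not
# strictly positive, and fail with a clear message if that is also unusable.
safe_bandwidth <- function(x) {
  h <- MASS::bandwidth.nrd(x)
  if (!is.finite(h) || h <= 0) {
    h <- 4 * 1.06 * sqrt(var(x)) * length(x)^(-1/5)
  }
  if (!is.finite(h) || h <= 0) {
    stop("Cannot derive a positive bandwidth; please supply `h` explicitly.")
  }
  h
}

x <- sample(0:10, 100, replace = TRUE)
h <- c(safe_bandwidth(x), safe_bandwidth(y))

# Passing `h` explicitly bypasses the automatic computation in the stat.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(x = x, y = y)
  p <- ggplot(df) +
    stat_density_2d(geom = "density_2d", mapping = aes(x, y), h = h)
}
```

Printing `p` should then draw the contours instead of failing. A check along the lines of the second `if` in `safe_bandwidth()` is roughly the kind of message I would have expected from stat_density_2d itself.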