Skip to content

geom_density() for bounded data #3387

Closed
@echasnovski

Description

@echasnovski

Currently geom_density() directly uses density() to compute kernel density estimation. Due to its nature, it has "extending property": output density can imply that values outside the range of input data are possible (with default "gaussian" kernel). This is a realistic practical setup, but there are cases when this is not true and data is bounded (for example, when only positive values are possible). It can be a good idea to support kernel density estimation on this type of bounded data with new bounds argument of geom_density().

In my opinion, one of the methods most easiest to implement, understand, and teach, is "reflection" method. There is one possible description on this page. Basically, boundary correction is done by doing "standard" kernel density estimation first, and then "reflecting" tails outside of desired interval to be inside. Densities inside and outside of desired interval are added together in "symmetric fashion": d(x) = d_f(x) + d_f(l - (x-l)) + d_f(r + (r-x)), where d_f is density of input, d is density of output, l and r are left and right edges of desired interval.

I made some quick and dirty changes to ggplot2 for demonstration. stat_density() gets bounds argument with default value of c(-Inf, Inf). Here are some examples of proposed functionality:

library(tibble)

set.seed(101)

ggplot(tibble(x = runif(100)), aes(x)) +
  geom_density() +
  geom_density(bounds = c(0, 1), color = "blue", ) +
  stat_function(data = tibble(x = c(0, 1)), fun = dunif, color = "red")

ggplot(tibble(x = rexp(100)), aes(x)) +
  geom_density() +
  geom_density(bounds = c(0, Inf), color = "blue") +
  stat_function(data = tibble(x = c(0, 5)), fun = dexp, color = "red")

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions