geom_boxplot() with scale_x_log10() / scale_y_log10() produces incorrect whiskers/outliers. #6706

@ja-ortiz-uniandes

Description

When using geom_boxplot() to plot data distributions, the whiskers appear to be computed as a linear graphical multiple of the IQR, regardless of the axis scale. As a result, applying scale_x_log10() or scale_y_log10() produces incorrect whiskers and incorrectly flagged outliers.

Reproducible example

R and ggplot version

> R.version.string
[1] "R version 4.5.1 (2025-06-13 ucrt)"
> 
> # Print ggplot2 version
> packageVersion("ggplot2")
[1] ‘4.0.0’

Demonstration of the issue

library(ggplot2)

set.seed(123)

# Generate exponential data
n <- 2000
rate <- 1
x <- (rexp(n, rate = rate) + 1) * 1e4

# Compute percentiles and IQR-based bounds
qs <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)
q1  <- qs[1]
q2  <- qs[2]
q3  <- qs[3]
iqr <- IQR(x)
lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr

# Inspect computed values
stats <- data.frame(
  p25 = q1,
  p50 = q2,
  p75 = q3,
  IQR = iqr,
  lower_1p5_IQR = lower_bound,
  upper_1p5_IQR = upper_bound
)
stats
       p25     p50      p75      IQR lower_1p5_IQR upper_1p5_IQR
1 12847.19 17139.3 24405.04 11557.84      -4489.57       41741.8

# Plotting
base_plot <- ggplot(data.frame(x = x), aes(x = x)) +
  geom_boxplot() + labs(title = "Exponential sample (linear scale)")

p_log <- base_plot + scale_x_log10() + labs(title = "Exponential sample (log10 scale)")

base_plot
[Image: boxplot of the exponential sample, linear scale]
p_log
[Image: boxplot of the exponential sample, log10 scale]

In the example above, the upper whisker should extend to the most extreme point within the range [p75, p75 + 1.5 × IQR]. Here p75 + 1.5 × IQR = 41741.8, so the whisker should end at the largest observation less than or equal to that value. This behaviour appears correct in the linear-scale plot.
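As a minimal sketch of that expectation, the whisker ends can be computed directly from the sample in base R. The helper name `expected_whiskers` is hypothetical (it is not a ggplot2 function), and it assumes the default `quantile()` settings used in the example above.

```r
set.seed(123)
x <- (rexp(2000, rate = 1) + 1) * 1e4

# Hypothetical helper (not part of ggplot2): whisker ends under the
# 1.5 x IQR rule, computed on the untransformed values.
expected_whiskers <- function(v, coef = 1.5) {
  q   <- quantile(v, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  c(
    lower = min(v[v >= lower_fence]),  # smallest observation inside the fence
    upper = max(v[v <= upper_fence])   # largest observation inside the fence
  )
}

w <- expected_whiskers(x)
w
```

Because the lower fence is negative for this sample, the lower whisker should simply end at the smallest observation.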

However, on the logarithmic scale the whisker extends beyond 50,000, which is outside the valid range. It appears that the whisker’s position is determined by a linear pixel distance corresponding to 1.5 × IQR, rather than by the correct scale transformation.
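One way to probe this without any plotting is to compare the whisker end computed on the raw values with the one computed on log10(x) and transformed back. This is only a diagnostic sketch: it assumes that the boxplot statistics may be computed on the transformed values (the ggplot2 documentation notes that scale transformations are applied before statistics), which this report has not verified. `upper_whisker` is a hypothetical helper, not a ggplot2 function.

```r
set.seed(123)
x <- (rexp(2000, rate = 1) + 1) * 1e4

# Hypothetical helper: largest observation within q3 + coef * IQR.
upper_whisker <- function(v, coef = 1.5) {
  q <- quantile(v, probs = c(0.25, 0.75), names = FALSE)
  fence <- q[2] + coef * (q[2] - q[1])
  max(v[v <= fence])
}

w_linear <- upper_whisker(x)            # rule applied to the raw values
w_log    <- 10^upper_whisker(log10(x))  # rule applied to log10(x), back-transformed

c(linear = w_linear, log10 = w_log)
```

If the value labelled `log10` matches the whisker end drawn in the log-scale plot, the discrepancy would come from where the statistics are computed rather than from a pixel distance.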

Expected behaviour

  • The upper whisker should extend to the most extreme data point within [p75, p75 + 1.5 × IQR].
  • The lower whisker should extend to the most extreme data point within [p25 - 1.5 × IQR, p25].
  • “Extreme point” refers to the observation farthest from the median within that range.
  • The whisker range should respect the scale of the axis on which it is plotted (e.g., logarithmic or linear), not a linear pixel distance.
