
sqrt_trans, scale limit expansion, and missing breaks #980

Closed
@BrianDiggs

Prompted by a posting on the mailing list (https://groups.google.com/d/topic/ggplot2/IUje5H0jwm4).

Summary

Specific problem: Breaks near 0 are not displayed when the square root transformation is applied to a scale.

General problem: Scale expansion in transformed (coordinate) space can produce values that cannot be meaningfully (or correctly) inverted back to data space, which causes breaks to be improperly excluded.

Reproducible example:

library("ggplot2")
library("scales")

DF <- data.frame(x = seq(0,1,by=0.1),
                 y = seq(0,1,by=0.1))

ggplot(DF, aes(x=x, y=y)) + 
  geom_point() + 
  scale_x_sqrt() +
  scale_y_continuous()

Expected result

A plot with x-axis breaks labeled at 0, 0.25, 0.50, 0.75, and 1.00.

Actual results

Note that there is no 0 break on the x-axis.

Discussion

The error occurs because, when the limits are expanded in coordinate space, the lower limit becomes negative. Transforming the expanded limits back to data space gives incorrect limits, and those limits are what the breaks are determined from (or at least filtered against). Stepping through the equivalent operations shows:

st <- sqrt_trans()
(x<-st$transform(c(0,1)))
## [1] 0 1
(x<-expand_range(x, 0.05, 0))
## [1] -0.05  1.05
(limits<-st$inverse(x))
## [1] 0.0025 1.1025
(breaks<-st$breaks(limits))
## [1] 0.00 0.25 0.50 0.75 1.00
st$transform(breaks)
## [1] 0.0000 0.5000 0.7071 0.8660 1.0000
st$transform(limits)
## [1] 0.05 1.05
censor(st$transform(breaks), st$transform(limits))
## [1]     NA 0.5000 0.7071 0.8660 1.0000

The real problem is that the result of the expand_range call lies outside the range of the transformation (the image of its domain, here [0, Inf)), so it is not a valid input to the inverse. How should such out-of-range values be treated?
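A minimal base-R sketch (it does not call scales) of why the expanded limit cannot be inverted meaningfully: x^2, the inverse used for the square-root transformation, is not one-to-one once negative inputs are allowed, so the out-of-range value -0.05 folds onto the same data value as +0.05, which lies above the 0 break. The name sqrt_inverse is just an illustrative local function.

```r
# Inverse of the square-root transformation: x^2
sqrt_inverse <- function(x) x^2

lower <- -0.05  # expanded lower limit in coordinate space, outside [0, Inf)

# -0.05 and +0.05 map to the same data value, so the inverse cannot
# distinguish "slightly below 0" from "slightly above 0"
folded <- sqrt_inverse(c(lower, -lower))
print(folded)

# Transforming the inverted value again lands at +0.05, not at -0.05
round_trip <- sqrt(sqrt_inverse(lower))
print(round_trip)
```

Because the round trip returns a value above 0, censoring against it drops the 0 break, which matches the NA in the step-through above.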

Workarounds

Don't square negative values

One solution is an alternative transformation whose inverse does not square negative values. A transformation should be one-to-one within its domain, and sqrt_trans is, but it will happily apply the inverse to negative values, which could not occur if everything were constrained to the domain. A simple approach is to map all negative values to 0:

mysqrt_trans <- function() {
  trans_new("mysqrt", 
            transform = base::sqrt,
            # map negative coordinate values to 0 instead of squaring them
            inverse = function(x) ifelse(x<0, 0, x^2),
            domain = c(0, Inf))
}
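To check the effect of this inverse without building a ggplot2 plot, the sketch below repeats the limit and censoring steps in base R. censor_to is a hypothetical, simplified stand-in for scales::censor, and the breaks are copied from the step-through rather than recomputed.

```r
# Inverse from the workaround above: negative coordinates map to 0
clamped_inverse <- function(x) ifelse(x < 0, 0, x^2)

# Hypothetical stand-in for scales::censor(): values outside range become NA
censor_to <- function(x, range) ifelse(x < range[1] | x > range[2], NA, x)

coord_limits <- c(-0.05, 1.05)         # expanded limits in coordinate space
breaks <- c(0, 0.25, 0.5, 0.75, 1)     # breaks from the step-through above

naive_limits <- coord_limits^2                 # plain inverse: about 0.0025 and 1.1025
fixed_limits <- clamped_inverse(coord_limits)  # clamped inverse: 0 and about 1.1025

naive_kept <- censor_to(sqrt(breaks), sqrt(naive_limits))
fixed_kept <- censor_to(sqrt(breaks), sqrt(fixed_limits))
print(naive_kept)  # first break censored
print(fixed_kept)  # all breaks kept
```

With the clamped inverse, the data-space lower limit is exactly 0, so the 0 break is no longer censored in this simplified pipeline.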

Squish range before inverting

If we assume that all transformations are monotonic (I'm not sure whether ggplot2/scales assume that transformations are monotonic or merely one-to-one; I cannot come up with a useful transformation that is not monotonic, though I can construct a pathological one), then it is reasonable to squish any values outside the range (not the domain) of the transformation. Bringing them back to the nearest extreme should be sufficient. A more general inverse would therefore be:

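mysqrt_trans <- function() {
  domain <- c(0, Inf)
  transform <- base::sqrt
  # range of the transformation: the image of its domain
  range <- transform(domain)
  trans_new("mysqrt", 
            transform = transform,
            # clamp coordinate values into range before inverting
            inverse = function(x) squish(x, range=range)^2,
            domain = domain)
}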
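The same idea can be generalized to any monotonic transformation. Below is a sketch of a hypothetical helper, squished_inverse, that derives the range from transform(domain) and clamps before inverting. It uses pmin/pmax in place of scales::squish so it runs in base R; the sort() call is an assumption added so that decreasing transformations also yield a valid lower/upper pair.

```r
# Hypothetical helper: wrap an inverse so that its input is first clamped
# to range = transform(domain). Assumes transform is monotonic.
squished_inverse <- function(transform, inverse, domain) {
  rng <- sort(transform(domain))  # sort() so decreasing transforms also work
  function(x) inverse(pmin(pmax(x, rng[1]), rng[2]))
}

# Square-root transformation: domain [0, Inf), so range [0, Inf)
inv <- squished_inverse(sqrt, function(x) x^2, c(0, Inf))
print(inv(c(-0.05, 1.05)))
```

The lower expanded limit is squished to 0 before squaring, so it inverts to 0 rather than 0.0025.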

Squish to range whenever values are extended

This approach makes the code that manipulates transformed (coordinate-space) values responsible for squishing them into the appropriate range whenever that range could be violated. If monotonicity is assumed, I think interpolation should always be safe, but any operation that can produce a value more extreme than the existing extremes would need to be squished. If this approach is taken, it would be worth adding a range component to the trans object, which is just the result of transform(domain).
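A sketch of what squishing at the point of expansion could look like, in base R. expand_and_squish is a hypothetical helper: it reproduces only the multiplicative part of expand_range (limits + c(-1, 1) * mul * width) and then clamps the result to the transformation's range, here sqrt(c(0, Inf)).

```r
# Hypothetical helper: expand coordinate-space limits multiplicatively,
# then squish the result back into the transformation's range.
expand_and_squish <- function(limits, mul, range) {
  width <- diff(limits)
  expanded <- limits + c(-1, 1) * mul * width
  pmin(pmax(expanded, range[1]), range[2])
}

sqrt_range <- sqrt(c(0, Inf))  # range = transform(domain)
coord <- expand_and_squish(sqrt(c(0, 1)), 0.05, sqrt_range)
print(coord)    # coordinate-space limits, now inside the range
print(coord^2)  # safe to invert with the plain x^2 inverse
```

Here the inverse itself stays the plain x^2; the responsibility for staying within the range sits with the expansion code, as described above.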

The inverse could then either assume that its input is within the range or check this before proceeding (just as transform currently may or may not check the domain). Ideally, the transformation should throw an error whenever transform is called with values outside the domain or inverse is called with values outside the range; this would help pinpoint places where the calling code is not behaving appropriately.
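A minimal sketch of such a strict transformation, assuming a plain list stands in for a scales trans object (it is not built with trans_new and has no "trans" class). strict_sqrt_trans and its internal check function are hypothetical; both directions stop with an error instead of returning a misleading value.

```r
# Hypothetical strict square-root transformation: transform rejects values
# outside the domain, inverse rejects values outside the range.
strict_sqrt_trans <- function() {
  domain <- c(0, Inf)
  coord_range <- sqrt(domain)  # range = transform(domain)
  check <- function(x, bounds, what) {
    bad <- !is.na(x) & (x < bounds[1] | x > bounds[2])
    if (any(bad)) {
      stop(what, " called with values outside [", bounds[1], ", ",
           bounds[2], "]", call. = FALSE)
    }
    x
  }
  list(
    name = "strict_sqrt",
    transform = function(x) sqrt(check(x, domain, "transform")),
    inverse = function(x) check(x, coord_range, "inverse")^2,
    domain = domain,
    range = coord_range
  )
}

st <- strict_sqrt_trans()
ok <- st$inverse(c(0, 1.05))
print(ok)

# The expanded limits from the step-through now raise an error
res <- tryCatch(st$inverse(c(-0.05, 1.05)),
                error = function(e) conditionMessage(e))
print(res)
```

An error at this point would surface the faulty expansion step directly, rather than showing up later as a silently missing break.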
