Description
Prompted by a posting on the mailing list (https://groups.google.com/d/topic/ggplot2/IUje5H0jwm4).
Summary
Specific problem: Breaks near 0 are not displayed when the square root transformation is applied to a scale.
General problem: Expanding a scale in transformed (coordinate) space can produce values that cannot be meaningfully (or correctly) inverted back to data space, which causes breaks to be excluded improperly.
Reproducible example:
library("ggplot2")
library("scales")
DF <- data.frame(x = seq(0, 1, by = 0.1),
                 y = seq(0, 1, by = 0.1))
ggplot(DF, aes(x = x, y = y)) +
  geom_point() +
  scale_x_sqrt() +
  scale_y_continuous()
Expected result
A plot with breaks labeled at 0, 0.25, 0.50, 0.75, and 1.00
Actual results
Note that there is no 0 break on the x-axis.
Discussion
The error occurs because expanding the limits in coordinate space produces a negative lower limit. Squaring that value to return to data space yields a small positive lower limit (0.0025) instead of one at or below 0, and the breaks are then determined from (or at least censored against) these incorrect limits. Stepping through the effective steps for computing the breaks shows:
st <- sqrt_trans()
(x<-st$transform(c(0,1)))
## [1] 0 1
(x<-expand_range(x, 0.05, 0))
## [1] -0.05 1.05
(limits<-st$inverse(x))
## [1] 0.0025 1.1025
(breaks<-st$breaks(limits))
## [1] 0.00 0.25 0.50 0.75 1.00
st$transform(breaks)
## [1] 0.0000 0.5000 0.7071 0.8660 1.0000
st$transform(limits)
## [1] 0.05 1.05
censor(st$transform(breaks), st$transform(limits))
## [1] NA 0.5000 0.7071 0.8660 1.0000
The real problem is that the result of the expand_range call lies outside the range of the transformation (that is, outside the domain of its inverse). How should such out-of-range values be treated?
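The failure can also be checked mechanically. The sketch below assumes a hypothetical helper, round_trips (not part of scales), that reports which coordinate-space values survive an inverse followed by a transform. Apart from that helper it uses only the scales calls from the walkthrough above.

```r
library("scales")

# Hypothetical helper (not part of scales): TRUE where a coordinate-space
# value is recovered after applying inverse and then transform.
round_trips <- function(trans, x, tol = 1e-8) {
  back <- trans$transform(trans$inverse(x))
  abs(back - x) < tol
}

st <- sqrt_trans()
expanded <- expand_range(st$transform(c(0, 1)), mul = 0.05, add = 0)

# The expanded lower limit (-0.05) comes back as 0.05, so it fails the
# round trip; the upper limit (1.05) survives.
round_trips(st, expanded)
```

Any value that fails this check is one for which the limits derived in data space no longer describe the limits used in coordinate space.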
Workarounds
Don't square negative values
One solution to this problem is an alternative transformation whose inverse does not square negative values. A transformation should be one-to-one within its domain, and sqrt_trans is, but its inverse will happily accept negative values, which cannot occur if everything is constrained to the domain. A simple approach is to map all negative values to 0:
mysqrt_trans <- function() {
  trans_new("mysqrt",
            transform = base::sqrt,
            inverse = function(x) ifelse(x < 0, 0, x^2),
            domain = c(0, Inf))
}
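To see the effect, the walkthrough from the Discussion can be repeated with this transformation. This is a minimal sketch: the breaks are hard-coded to the values shown earlier rather than recomputed, which is an assumption made to keep the comparison direct.

```r
library("scales")

mysqrt_trans <- function() {
  trans_new("mysqrt",
            transform = base::sqrt,
            inverse = function(x) ifelse(x < 0, 0, x^2),
            domain = c(0, Inf))
}

mt <- mysqrt_trans()
x <- expand_range(mt$transform(c(0, 1)), mul = 0.05, add = 0)
limits <- mt$inverse(x)              # negative lower limit is mapped to 0

# Assumption: same breaks as in the walkthrough, not recomputed here.
breaks <- c(0, 0.25, 0.5, 0.75, 1)
kept <- censor(mt$transform(breaks), mt$transform(limits))
anyNA(kept)                          # FALSE: the break at 0 is no longer censored
```

The lower limit in data space is now 0 instead of 0.0025, so the break at 0 falls inside the limits. How the expansion below 0 is then drawn is a separate question that this sketch does not address.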
Squish range before inverting
If we assume that all transformations are monotonic, then it is reasonable to squish any values that fall outside the range (not the domain) of the transformation; bringing them back to the nearest extreme should be sufficient. (I am not sure whether ggplot2/scales assume that transformations are monotonic or merely one-to-one. I cannot come up with a useful transformation that is not monotonic, though I can construct a pathological one.) A more general approach for an inverse would therefore be:
mysqrt_trans <- function() {
  domain <- c(0, Inf)
  transform <- base::sqrt
  range <- transform(domain)
  trans_new("mysqrt",
            transform = transform,
            inverse = function(x) squish(x, range = range)^2,
            domain = domain)
}
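The same idea can be applied to an existing transformation instead of being rewritten for each one. The following is a sketch only: squish_inverse is a hypothetical wrapper, not a scales function, and it relies on the monotonicity assumption discussed above so that transform(domain) gives the two ends of the range.

```r
library("scales")

# Hypothetical wrapper (not part of scales): returns a copy of `trans` whose
# inverse first squishes its input into transform(domain).
# Assumes `trans` is monotonic on its domain.
squish_inverse <- function(trans) {
  rng <- sort(trans$transform(trans$domain))
  trans_new(paste0("squished-", trans$name),
            transform = trans$transform,
            inverse = function(x) trans$inverse(squish(x, range = rng)),
            breaks = trans$breaks,
            format = trans$format,
            domain = trans$domain)
}

sq <- squish_inverse(sqrt_trans())
sq$inverse(c(-0.05, 0.5, 1.05))   # -0.05 is squished to 0 before squaring
```

For a transformation that is not monotonic on its declared domain, transform(domain) does not describe the range and the wrapper would squish to the wrong bounds.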
Squish to range whenever values are extended
This approach makes it the responsibility of the code that manipulates transformed (coordinate space) values to squish them to the appropriate range whenever there is a chance that the range is violated. If monotonicity is assumed, I think any interpolation should be safe, but the result of any operation that can produce a value more extreme than the existing extremes would need to be squished. If this approach is taken, it would be worth adding a range component to the trans, which is just the result of transform(domain).
The inverse of a transformation could then simply assume that its input is in the range, or it could check before proceeding (just as transform currently may or may not check the domain before proceeding). Ideally, the transformation would throw an error if transform is called with values outside domain or inverse is called with values outside range; this would help identify places where the calling code is not behaving appropriately.
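A rough sketch of what such a checking transformation might look like follows. strict_trans is a hypothetical wrapper, not part of scales; the range component is attached by hand after construction, and monotonicity is again assumed so that transform(domain) gives the ends of the range.

```r
library("scales")

# Hypothetical wrapper (not part of scales). Assumes `trans` is monotonic on
# its domain, so that transform(domain) gives the two ends of its range.
strict_trans <- function(trans) {
  domain <- trans$domain
  rng <- sort(trans$transform(domain))
  check <- function(x, bounds, what) {
    bad <- !is.na(x) & (x < bounds[1] | x > bounds[2])
    if (any(bad)) {
      stop("values outside the ", what, " of '", trans$name, "'", call. = FALSE)
    }
    x
  }
  out <- trans_new(paste0("strict-", trans$name),
                   transform = function(x) trans$transform(check(x, domain, "domain")),
                   inverse = function(x) trans$inverse(check(x, rng, "range")),
                   breaks = trans$breaks,
                   format = trans$format,
                   domain = domain)
  out$range <- rng   # the proposed extra component
  out
}

strict <- strict_trans(sqrt_trans())
strict$inverse(c(0, 1.05))                        # within the range: inverted as usual
res <- try(strict$inverse(c(-0.05, 1.05)), silent = TRUE)
inherits(res, "try-error")                        # the negative value raises an error
```

With a transformation like this in place, the expand_range step from the Discussion would fail loudly at the call to inverse instead of silently producing limits that exclude the break at 0.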