Skip to content

No retransform in mapping statistics #4835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

teunbrand
Copy link
Collaborator

This is the PR following-up on the comment left under #4194 intended to fix #4155.

As mentioned before: I'm not familiar with all the design decisions at stake, nor do I know the history of map_statistic very well, so feel free to ignore or correct me if I suggest something too brazenly.

The idea of this PR is pretty simple. If #4194 has thought us that we may need to back-transform data before applying the stage() family of functions, why not just delete the forward transform?

It should (1) resolve the examples posted by Hiroaki and myself in #4155 and (2) have a tiny performance boost. I can't imagine what case(s) this might break on top of my head, so it would be good for someone with a keener eye and/or foresight to have a second look.

@thomasp85
Copy link
Member

I need to sit down with this some more, but wouldn't that mean values mapped from the stat exist in a different untransformed scale than those mapped from the data?

@teunbrand
Copy link
Collaborator Author

I'm not sure I can come up with a code example that would demonstrate this. The way I see it, is that this PR prevents the double-transform problem. It doesn't touch the mismatch between user-space and data-space that Claus mentions #4194 (comment). Towards that mismatch, it might worthwhile to simply document in ?after_stat() that 'after the stat' also means 'after position scale transformations'.

@yutannihilation
Copy link
Member

At least, it's surprising that this change won't break any tests...!

@teunbrand
Copy link
Collaborator Author

Indeed! I tried to come up with examples where it would show more unexpected behaviour than the current situation, but I'm having trouble imagining such examples.

@clauswilke
Copy link
Member

clauswilke commented May 14, 2022

Can we collect a few more examples where the transform makes a difference (regardless of which version is correct, with the line in place or absent)? Maybe this will help us better understand what is happening. If the current code has a genuine bug then you'd expect it to only affect things that are currently broken, and nobody may be writing such code because it doesn't currently work.

@teunbrand
Copy link
Collaborator Author

teunbrand commented May 14, 2022

Here are some examples of things that go differently with PR and without PR:

With PR

library(ggplot2)

p1 <- ggplot(head(economics, 24)) +
  aes(x = date, xend = after_stat(x + 30),
      y = unemploy, yend = unemploy) +
  geom_segment()

p2 <- ggplot(data.frame(value = 16),
             aes(value, 1)) +
  geom_point() +
  geom_point(
    aes(stage(value, after_stat = x), 2), 
  ) +
  scale_x_sqrt(limits = c(0, 16))

p3 <- ggplot(data.frame(x = c(1, 10), y = 1:2)) +
  geom_rect(
    aes(xmin = x, xmax = after_stat(xmin + 1), ymin = y, ymax = y + 1)
  ) +
  scale_x_log10()

patchwork::wrap_plots(p1, p2, p3)

Created on 2022-05-14 by the reprex package (v2.0.1)

With CRAN ggplot2

# Repeat what was done in chunk above (omitted for brevity)
patchwork::wrap_plots(p2, p3)

Differences in words

  • In CRAN ggplot2, p1 fails due to Error: Invalid input: date_trans works with objects of class Date only. This is due to the date transform being called on the data, after first having been transformed from Date to numeric before the stat.
  • In p2, the point with stage() is placed at sqrt(16) instead of at 16 with CRAN ggplot2.
  • In p3, the lower rectangle is missing, because xmin = log10(1) == 0, xmax = 0 + 1 = 1 and because the log transform is applied twice on original data and 'after_stat' data, xmax = log10(1) == 0, so it has 0-width. The upper rectangle is also 'wrong' because xmax < xmin in layer_data(p3).

@teunbrand
Copy link
Collaborator Author

teunbrand commented May 14, 2022

Looked a bit further into the history of these particular lines, and it seems that idea that the stat-output should be transformed came from this commit: 4cb2af3.

That made me realise that this PR ruins the following (note that untransformed maximum count is at 25):

library(ggplot2)

ggplot(faithful, aes(waiting)) +
  geom_histogram() +
  scale_y_sqrt()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2022-05-15 by the reprex package (v2.0.1)

So, it would probably be better not to merge this PR after all. I apologise for wasting everyone's time with this.
I guess the lesson learned would probably be that a test covering the case above might be helpful.

@teunbrand teunbrand closed this May 14, 2022
@yutannihilation
Copy link
Member

No worries, this isn't a waste of time. It's really great that we at last figured out the case when no-retransform matters, which no one here were aware of (I tried to dive into the history, but couldn't do it well like you). Thanks for tackling the issue!

Here's a bit simpler (or more complex?) example.

library(ggplot2)

p_bar <- ggplot(mpg) +
  geom_bar(aes(class)) +
  scale_y_sqrt() +
  ggtitle("geom_bar()")

d <- dplyr::count(mpg, class)
p_col <- ggplot(d) +
  geom_col(aes(class, n)) +
  scale_y_sqrt() +
  ggtitle("geom_col()")

patchwork::wrap_plots(p_bar, p_col)

########## DANGER ZONE ##############

p_bar$layers[[1]]$stat$retransform <- FALSE
p_col$layers[[1]]$stat$retransform <- FALSE

patchwork::wrap_plots(p_bar, p_col)

Created on 2022-05-15 by the reprex package (v2.0.1)

@clauswilke
Copy link
Member

I think it would be really helpful to add a test that fails when the retransform is disabled. @teunbrand, would you be willing to do that?

@teunbrand teunbrand changed the title No retransform in mapping statistics Retransform test May 15, 2022
@teunbrand teunbrand changed the title Retransform test No retransform in mapping statistics May 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible bug in stage()/after_stat() with scale transformations.
4 participants