Skip to content

replace parse() with parse_safe() in geom_text() #2867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Sep 4, 2018

Conversation

slowkow
Copy link
Contributor

@slowkow slowkow commented Aug 28, 2018

The built in parse() function silently drops items:

length(parse(text = c("alpha", "", "gamma")))

We expect the length to be 3, but instead it is 2.

So, we add a new function parse_safe() that keeps all items.

This addresses issue #2864

The built in `parse()` function silently drops items:

    length(parse(text = c("alpha", "", "gamma")))

We expect the length to be 3, but instead it is 2.

So, we add a new function `parse_safe()` that keeps all items.

This addresses issue tidyverse#2864
@slowkow
Copy link
Contributor Author

slowkow commented Aug 28, 2018

image

Details

Here is the code I used to generate the images.

dev.off()
library(ggplot2)
library(patchwork)

d1 <- data.frame(
  x = c(1, 2, 3),
  y = c(1, 0, 1),
  label = c("alpha", " ", "gamma"),
  stringsAsFactors = FALSE
)
p <- ggplot(d1, aes(x = x, y = y, label = label)) + geom_point(size = 5, color = "red")
p1 <- p + geom_text() + labs(title = "stringsAsFactors = FALSE\nparse = FALSE")
p2 <- p + geom_text(parse = TRUE) + labs(title = "stringsAsFactors = FALSE\nparse = TRUE")
p1 + p2
ggsave(filename = "pullXXX/d1-text.png", plot = p1 + p2, width = 6, height = 3)

p1 <- p + geom_label() + labs(title = "stringsAsFactors = FALSE\nparse = FALSE")
p2 <- p + geom_label(parse = TRUE) + labs(title = "stringsAsFactors = FALSE\nparse = TRUE")
p1 + p2
ggsave(filename = "pullXXX/d1-label.png", plot = p1 + p2, width = 6, height = 3)

d2 <- data.frame(
  x = c(1, 2, 3),
  y = c(1, 0, 1),
  label = c("alpha", " ", "gamma"),
  stringsAsFactors = TRUE
)
p <- ggplot(d2, aes(x = x, y = y, label = label)) + geom_point(size = 5, color = "red")
p1 <- p + geom_text() + labs(title = "stringsAsFactors = TRUE\nparse = FALSE")
p2 <- p + geom_text(parse = TRUE) + labs(title = "stringsAsFactors = TRUE\nparse = TRUE")
p1 + p2
ggsave(filename = "pullXXX/d2-text.png", plot = p1 + p2, width = 6, height = 3)

p1 <- p + geom_label() + labs(title = "stringsAsFactors = TRUE\nparse = FALSE")
p2 <- p + geom_label(parse = TRUE) + labs(title = "stringsAsFactors = TRUE\nparse = TRUE")
p1 + p2
ggsave(filename = "pullXXX/d2-label.png", plot = p1 + p2, width = 6, height = 3)

@clauswilke
Copy link
Member

Several comments:

  1. This still parses twice. It must be possible to parse only once, even if the result should be an expression object and not a list.

  2. Please set keep.source = FALSE in parse(): parse(text = text, keep.source = FALSE) Otherwise regression tests won't work.

  3. Please add a few regression tests, including testing with character vectors, factors, data with NAs, and data containing empty strings. They will generally look like this (in the testthat suite): expect_equal(parse_safe(c("A", "B", "C")), expression(A, B, C))

@clauswilke
Copy link
Member

clauswilke commented Aug 29, 2018

Does this version of Hadley's original function do what you need? [Though I still don't understand why a list of expressions won't work. It does in my testing with textGrob() (see here.)]

parse_hadley <- function(text) {
  text <- as.character(text)
  out <- vector("list", length(text))
  for (i in seq_along(text)) {
    expr <- parse(text = text[[i]])
    out[[i]] <- if (length(expr) == 0) "" else expr[[1]]
  }
  do.call(expression, out)
}

@hadley
Copy link
Member

hadley commented Aug 29, 2018

I think vector("expression", length(text)) is slightly preferential than the final do.call() (partly because the semantics are so complex when the inputs are expressions)

@slowkow
Copy link
Contributor Author

slowkow commented Aug 29, 2018

Claus: This pull request was created before our nice discussion in #2864. It needs to be updated.

Regarding each of your comments:

  1. I will change the parse_safe() function to use this code from the discussion in issue Unexpected order of text labels when parse=TRUE #2864:

    parse_hadley2 <- function(text) {
      out <- vector("expression", length(text))
      for (i in seq_along(text)) {
        expr <- parse(text = text[[i]], keep.source = FALSE)
        out[[i]] <- if (length(expr) == 0) NA else expr[[1]]
      }
      out
    }
  2. Will do!

  3. I agree that tests are crucial, will do!

I still don't understand why a list of expressions won't work

Your code appears to work when you test it with c("a", "", "b"):
image

But it fails when you test it with c("alpha", "", "gamma"):
image

It should look like this:
image

@clauswilke
Copy link
Member

clauswilke commented Aug 29, 2018

I've figured out why lists of expressions don't work here but seem to work in other parts of ggplot2. We're actually converting them back to expression objects, e.g. here:

ggplot2/R/guides-axis.r

Lines 35 to 41 in 71cb174

if (is.list(labels)) {
if (any(sapply(labels, is.language))) {
labels <- do.call(expression, labels)
} else {
labels <- unlist(labels)
}
}

It's good to keep that in mind, as I think the ggplot2 code is a bit inconsistent in this regard. For example, my original code was inspired by this line:

ggplot2/R/sf.R

Line 481 in 34c01cc

graticule$degree_label <- lapply(graticule$degree_label, function(x) parse(text = x)[[1]])

Also, functions in the scales package return lists of expressions rather than expression objects:

> scales::math_format()(1:5)
[[1]]
10^`1`

[[2]]
10^`2`

[[3]]
10^`3`

[[4]]
10^`4`

[[5]]
10^`5`

In summary: I agree with your latest approach to parse_safe(), which returns an expression object. I'm wondering to what extent other parts of the code should use this function also, e.g. the code in sf I linked to above.

Use Hadley's code to parse the expressions once instead of twice.

Use `parse(..., keep.source = FALSE)`, as Claus recommended.

Add an example to demonstrate why `parse_safe()` is needed.
@slowkow
Copy link
Contributor Author

slowkow commented Aug 30, 2018

I agree that there might be other places, in addition to geom-text.r and geom-label.r that could benefit from the new parse_safe() function.

It would be nice to demonstrate that they suffer from the same problem before I modify them. Please share if you have any examples that demonstrate the issue!

@clauswilke
Copy link
Member

Kamil, here you go. I assume it's caused by this line:

ggplot2/R/sf.R

Line 481 in 34c01cc

graticule$degree_label <- lapply(graticule$degree_label, function(x) parse(text = x)[[1]])

Reprex:

library(ggplot2) 
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# works
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  scale_y_continuous(
    breaks = c(34, 35, 36),
    labels = c("34*degree*N", "35*degree*N", "36*degree*N")
  )

# breaks
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  scale_y_continuous(
    breaks = c(34, 35, 36),
    labels = c("34*degree*N", "", "36*degree*N")
  )
#> Error in parse(text = x)[[1]]: subscript out of bounds

Created on 2018-08-30 by the reprex package (v0.2.0).

slowkow added a commit to slowkow/ggplot2 that referenced this pull request Aug 30, 2018
This patch fixes an example that triggers an error:

    library(ggplot2)
    nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    ggplot(nc) +
      geom_sf(aes(fill = AREA)) +
      scale_y_continuous(
        breaks = c(34, 35, 36),
        labels = c("34*degree*N", "", "36*degree*N")
      )
    #> Error in parse(text = x)[[1]]: subscript out of bounds

See tidyverse#2867 for more details.
@slowkow
Copy link
Contributor Author

slowkow commented Aug 30, 2018

Thanks, Claus. My commit a52e7fd fixes the geom_sf() issue.

@slowkow
Copy link
Contributor Author

slowkow commented Aug 30, 2018

Here is another example of this issue (the alpha should be on the first facet):

library(ggplot2)
mtcars$cyl2 <- factor(mtcars$cyl, labels = c("alpha", "beta", "gamma"))
mtcars$cyl2 <- as.character(mtcars$cyl2)
mtcars$cyl2[mtcars$cyl2 == "beta"] <- ""
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p + facet_grid(. ~ cyl2, labeller = label_parsed)

image

Here's the relevant code:

ggplot2/R/labeller.r

Lines 152 to 169 in 510b4b4

#' @rdname labellers
#' @export
label_parsed <- function(labels, multi_line = TRUE) {
labels <- label_value(labels, multi_line = multi_line)
if (multi_line) {
# Using unname() and c() to return a cleaner and easily testable
# object structure
lapply(unname(labels), lapply, function(values) {
c(parse(text = as.character(values)))
})
} else {
lapply(labels, function(values) {
values <- paste0("list(", values, ")")
lapply(values, function(expr) c(parse(text = expr)))
})
}
}
class(label_parsed) <- c("function", "labeller")


It turns out I'm wrong. The example works just fine! The "beta" data points are displayed before the "alpha" data points, so it is ok.

To see that everything is ok, compare this figure to the previous one:

p + facet_grid(. ~ cyl, labeller = label_parsed)

image

This patch fixes an example that triggers an error:

    library(ggplot2)
    nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
    ggplot(nc) +
      geom_sf(aes(fill = AREA)) +
      scale_y_continuous(
        breaks = c(34, 35, 36),
        labels = c("34*degree*N", "", "36*degree*N")
      )
    #> Error in parse(text = x)[[1]]: subscript out of bounds

See tidyverse#2867 for more details.
@slowkow
Copy link
Contributor Author

slowkow commented Aug 30, 2018

As far as I can tell, there is only one more location where we might be interested to use parse_safe():

ggplot2/R/aes.r

Lines 317 to 320 in 510b4b4

# automatically detected aes
vars <- intersect(ggplot_global$all_aesthetics, vars)
names(vars) <- vars
aes <- lapply(vars, function(x) parse(text = x)[[1]])

I would leave this as is, because:

  • This is in the deprecated aes_auto() function.
  • I don't understand this code.

What do you think? Is there anything missing in this pull request?

@hadley
Copy link
Member

hadley commented Aug 30, 2018

No need to touch deprecated functions.

But I do think it's worth using the same code in label_parsed(). Unfortunately I can't tell exactly what the code is trying to do there 😞

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on the organisation

R/utilities.r Outdated
@@ -430,3 +430,24 @@ is_column_vec <- function(x) {
dims <- dim(x)
length(dims) == 2L && dims[[2]] == 1L
}

#' Parse a vector of expressions without silently dropping any items.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not exported, I don't think it needs to be documented. A comment mentioning that parse(text = text) has subtle problems with link to issue should be adequate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling future developers will appreciate the hints in the documentation. I know I was definitely surprised by the default behavior of parse() dropping things. Totally unexpected for me.

Regarding the link, is #2864 sufficient or do you prefer to put a full url like https://github.com/tidyverse/ggplot2/issues/2864?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Hadley meant you can write # instead of #' at the beginning of the line. Writing issue #2864 is sufficient to refer to the issue. That's how we do it in the NEWS.md file also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'd actually prefer the full url since that is clickable in RStudio. I'd suggest something like:

# Parse takes n lines and returns m _expressions_
# See https://github.com/tidyverse/ggplot2/issues/2864 for discussion

@@ -0,0 +1,52 @@
context("Parsing expressions")

expect_equal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to live a couple of test_that() blocks in test-utilities.R - see detail in http://style.tidyverse.org/tests.html#organisation (ggplot2 doesn't fully follow this organisation but should move towards it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I'll move the tests into test-utilities.R

expression(alpha, NA, gamma, NA)
)

expect_equal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should put all "multi" expressions into one test

expression(alpha, NA, integral(f(x) * dx, a, b))
)

expect_equal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test anything new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, some of the tests are probably redundant. I'll try to reduce redundancy.

expression(NA, 1, NA, 2)
)

expect_equal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this tests anything new either.

@clauswilke
Copy link
Member

I don't understand the code in label_parsed() either, but isn't parse_safe() meant as drop-in replacement for parse() and hence should just work? Have you tried leaving everything else untouched and just writing parse_safe() instead of parse()?

R/sf.R Outdated
@@ -553,11 +549,18 @@ CoordSf <- ggproto("CoordSf", CoordCartesian,
graticule <- panel_params$graticule
east <- graticule[graticule$type == "E" & !is.na(graticule$degree_label), ]

labs <- east$degree_label

# parse labels into expressions if required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're actually fixing another problem here at the same time, which is good. (The problem is that we may have to parse only labels along one axis.) However, this distributes the autoparsing to two separate locations. Maybe you could document this a little better by stating that labels created by sf::st_graticule() contain degree symbols and need to be parsed to look right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kamil, I've been pondering this some more. It's bugging me that we're breaking the parsing into two different places because I have a PR pending where we're labeling based on cardinal directions (N, S, E, W) rather than plot side and it can shuffle which labels appear where, and that will break with your current code. Is there any reason why we can't just do something like the following (using parse_safe() instead of parse(), though):

graticule <- data.frame(
  degree_label = c("abcd", "b", "10 * degree * E"),
  stringsAsFactors = FALSE
)
parse_ids <- grepl("degree", graticule$degree_label)
if (any(parse_ids)) {
  graticule$degree_label <- ifelse(
    parse_ids,
    parse(text = graticule$degree_label),
    as.expression(graticule$degree_label)
  )
}
graticule$degree_label
#> expression("abcd", "b", 10 * degree * E)

This will keep strings as strings but stick everything into an expression object so we don't have to worry about which label is of which type.

Copy link
Contributor Author

@slowkow slowkow Sep 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I broke the parsing into two different places to avoid having an expression() column in the graticule dataframe.

Here is an example that shows the problem I ran into:

graticule <- data.frame(
  type = c("E", "E", "E"),
  degree_label = c(NA, "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

# This works if degree_label is not an 'expression'
east <- graticule[graticule$type == "E" & !is.na(graticule$degree_label), ]

parse_ids <- grepl("degree", graticule$degree_label)
if (any(parse_ids)) {
  graticule$degree_label <- ifelse(
    parse_ids,
    parse(text = graticule$degree_label),
    as.expression(graticule$degree_label)
  )
}
graticule
#> Warning in format.data.frame(x, digits = digits, na.encode = FALSE):
#> corrupt data frame: columns will be truncated or padded with NAs
#>   type                                                degree_label
#> 1    E expression(NA_character_, 10 * degree * E, 15 * degree * E)
#> 2    E                                                        <NA>
#> 3    E                                                        <NA>
graticule$degree_label
#> expression(NA_character_, 10 * degree * E, 15 * degree * E)
is.na(graticule$degree_label)
#> Warning in is.na(graticule$degree_label): is.na() applied to non-(list or
#> vector) of type 'expression'
#> [1] FALSE FALSE FALSE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I should know better than to paste any code without proper reprex. I had tried the code except the final assignment to a data frame, which I assumed would just work.

The following should actually work. (It's basically what the code does currently, but with being a bit smarter about which labels to parse.)

graticule <- data.frame(
  type = c("E", "E", "E", "E"),
  degree_label = c(NA, "abcd", "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

parse_ids <- grepl("degree", graticule$degree_label)
if (any(parse_ids)) {
  graticule$degree_label <- ifelse(
    parse_ids,
    lapply(graticule$degree_label, function(x) parse(text = x)[[1]]),
    lapply(graticule$degree_label, function(x) as.expression(x)[[1]])
  )
}
graticule$degree_label
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "abcd"
#> 
#> [[3]]
#> 10 * degree * E
#> 
#> [[4]]
#> 15 * degree * E

Created on 2018-09-01 by the reprex package (v0.2.0).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to show that this creates a proper data frame:

graticule <- data.frame(
  type = c("E", "E", "E", "E"),
  degree_label = c(NA, "abcd", "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

parse_ids <- grepl("degree", graticule$degree_label)
if (any(parse_ids)) {
  graticule$degree_label <- ifelse(
    parse_ids,
    lapply(graticule$degree_label, function(x) parse(text = x)[[1]]),
    lapply(graticule$degree_label, function(x) as.expression(x)[[1]])
  )
}
graticule
#>   type    degree_label
#> 1    E              NA
#> 2    E            abcd
#> 3    E 10 * degree * E
#> 4    E 15 * degree * E

Created on 2018-09-01 by the reprex package (v0.2.0).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm starting to understand the motivation for wrapping expressions in lists here and also in label_parsed().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the conditional inside the mapping than the other way round is probably more elegant.

graticule <- data.frame(
  type = c("E", "E", "E", "E"),
  degree_label = c(NA, "abcd", "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

parse_ids <- grepl("degree", graticule$degree_label)
if (any(parse_ids)) {
  graticule$degree_label <- Map(
    function(b, l) {
      if (b) {
        parse(text = l)[[1]]
      } else {
        as.expression(l)[[1]]
      }
    },
    parse_ids, graticule$degree_label
  )
}
graticule
#>   type    degree_label
#> 1    E              NA
#> 2    E            abcd
#> 3    E 10 * degree * E
#> 4    E 15 * degree * E

Created on 2018-09-02 by the reprex package (v0.2.0).

@slowkow
Copy link
Contributor Author

slowkow commented Sep 3, 2018

Thanks for your patience on this pull request! I hope we're getting closer to the end.

What are your comments?

  • I adjusted the comments for parse_safe() as you suggested.
  • I moved the tests to tests/testthat/test-utilities.r
  • I removed the unnecessary instances of text = in geom_text() and geom_label()
  • I changed geom_sf() back to parsing only once instead of parsing once for east and once for north.
  • I still vote to leave label_parsed() as-is, with more discussion below.

Regarding label_parsed()

I have not been able to find an example where label_parsed() requires the new parse_safe() function. That's the main reason I don't want to change it.

To give you an idea of how label_parsed() would change with the new parse_safe(), here is an example:

# with parse_safe()
label_parsed(c("", "alpha", "gamma"))
#
# [[1]]
# [[1]][[1]]
# expression(NA)
#
#
# [[2]]
# [[2]][[1]]
# expression(alpha)
#
#
# [[3]]
# [[3]][[1]]
# expression(gamma)
# with parse()
label_parsed(c("", "alpha", "gamma"))
#
# [[1]]
# [[1]][[1]]
# expression()
#
#
# [[2]]
# [[2]][[1]]
# expression(alpha)
#
#
# [[3]]
# [[3]][[1]]
# expression(gamma)

If you think it's better to have expression(NA) instead of expression(), then we should replace parse() with parse_safe(). (I think it doesn't matter.) Otherwise, I think it's ok to let it be.

R/sf.R Outdated
parse_ids, graticule$degree_label
)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind moving this code block into the fixup_graticule_labels() function?

@clauswilke
Copy link
Member

This all looks good to me. I have one last, minor comment. Could you please move the label parsing code into the function fixup_graticule_labels()? The reason is that with the changes you're making to CoordSf(), I'll be able to properly fix the problem that we currently have where we're guessing whether labels should be parsed or not, rather than making this configurable. For the fix I envision, I'll need this code block in that function.

R/utilities.r Outdated
@@ -430,3 +430,21 @@ is_column_vec <- function(x) {
dims <- dim(x)
length(dims) == 2L && dims[[2]] == 1L
}

# Parse takes takes n strings and returns n expressions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed "takes takes". Since I asked for one more substantive change, maybe you could fix this also.

@slowkow
Copy link
Contributor Author

slowkow commented Sep 4, 2018

  • Moved parsing code into fixup_graticule_labels()
  • Fixed typo.

Hadley, do you still prefer that label_parsed() use parse_safe()? Or shall we leave it as-is?

I think this might be the last issue to agree upon in this pull request.

R/utilities.r Outdated
@@ -430,3 +430,21 @@ is_column_vec <- function(x) {
dims <- dim(x)
length(dims) == 2L && dims[[2]] == 1L
}

# Parse takes n strings and returns n expressions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be: "Parse takes a vector of n lines and returns m expressions."

@hadley
Copy link
Member

hadley commented Sep 4, 2018

I would like to understand what label_parsed() does and ensure that it uses the same logic as here. But I think that would be better done in another PR.

NEWS.md Outdated
@@ -59,6 +59,11 @@
* `position_nudge()` is now more robust and nudges only in the direction
requested. This enables, for example, the horizontal nudging of boxplots
(@clauswilke, #2733).

* `geom_text(..., parse = TRUE)` now correctly renders the expected number of
items instead of silently dropping items that are not valid expressions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not valid isn't quite the right turn of phrase - they are valid; they're just empty. We also fixed handling of multiple expressions in a single string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, will try to make the comment more accurate.

R/sf.R Outdated
if (parse_id) {
parse(text = label)[[1]]
} else {
as.expression(label)[[1]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we turning label into an expression object and then immediately extracting it out of the expression object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need the symbol itself, not expression(symbol). This is directly copied from the current code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this code chunk as a whole, is that when I read it I can't easily tell what type of object graticule$degree_label is supposed to be. It's obviously a list, and some of the components are expressions (not expression objects), but what are the others supposed to be?

You could fix that with a comment, but it would be even better to use functions where it's obvious what the output type is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need a symbol, and we have a character vector of length one, then the function to use is as.symbol()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was confused. This can definitely be simplified.
as.expression(label)[[1]] came from an earlier version of the code where I stuck everything into an expression object and thus needed as.expression(label). But now everything goes into a list, and strings can go directly since they are self-quoting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so my gut feeling was right and the code I suggest below is a bit clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. See also my other comment, below.

R/sf.R Outdated
# Convert the string 'degree' to the degree symbol
parse_ids <- grepl("\\bdegree\\b", graticule$degree_label)
if (any(parse_ids)) {
graticule$degree_label <- Map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use Map() here - I suspect you could more simply right as something like:

labels <- as.list(graticule$degree_label)
labels[parse_ids] <- lapply(labels[parse_ids], parse_safe)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is much simpler. For some reason I always forget that we can assign to a subset of elements in a list. However, it still needs the part where only the first element of the expression object is extracted, so we get the element itself not inside an expression() statement:

labels <- as.list(graticule$degree_label)
labels[parse_ids] <- lapply(labels[parse_ids], function(x) parse_safe(x)[[1]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh but parse_safe is already vectorised so you can skip the lapply

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you apply the character vector that hasn’t been coerced to a list

Copy link
Member

@clauswilke clauswilke Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. It may be best to work with a full reprex. The following works:

parse_safe <- function(text) {
  out <- vector("expression", length(text))
  for (i in seq_along(text)) {
    expr <- parse(text = text[[i]])
    out[[i]] <- if (length(expr) == 0) NA else expr[[1]]
  }
  out
}

graticule <- data.frame(
  type = c("E", "E", "E", "E"),
  degree_label = c(NA, "abcd", "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

needs_parsing <- grepl("degree", graticule$degree_label)
labels <- as.list(graticule$degree_label)
labels[needs_parsing] <- lapply(labels[needs_parsing], function(x) parse_safe(x)[[1]])
graticule$degree_label <- labels
graticule
#>   type    degree_label
#> 1    E              NA
#> 2    E            abcd
#> 3    E 10 * degree * E
#> 4    E 15 * degree * E

Do you see a way to make it simpler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct, it can be made simpler.

parse_safe <- function(text) {
  out <- vector("expression", length(text))
  for (i in seq_along(text)) {
    expr <- parse(text = text[[i]])
    out[[i]] <- if (length(expr) == 0) NA else expr[[1]]
  }
  out
}

graticule <- data.frame(
  type = c("E", "E", "E", "E"),
  degree_label = c(NA, "abcd", "10 * degree * E", "15 * degree * E"),
  stringsAsFactors = FALSE
)

needs_parsing <- grepl("degree", graticule$degree_label)
labels <- as.list(graticule$degree_label)
labels[needs_parsing] <- parse_safe(graticule$degree_label[needs_parsing])
graticule$degree_label <- labels
graticule
#>   type    degree_label
#> 1    E              NA
#> 2    E            abcd
#> 3    E 10 * degree * E
#> 4    E 15 * degree * E

Created on 2018-09-04 by the reprex package (v0.2.0).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that’s exactly what I was imagining.

R/sf.R Outdated
if (any(grepl("degree", graticule$degree_label)))
graticule$degree_label <- lapply(graticule$degree_label, function(x) parse(text = x)[[1]])
# Convert the string 'degree' to the degree symbol
parse_ids <- grepl("\\bdegree\\b", graticule$degree_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more informative if called needs_parsing or similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!

@slowkow
Copy link
Contributor Author

slowkow commented Sep 4, 2018

Hadley, I think the labeller functions have some issues that go beyond the scope of this pull request... it's difficult to write up my findings because I am running into unexpected behavior as I try to make examples. I agree with you that we ought to discuss this issue separately.

@clauswilke
Copy link
Member

Thanks Kamil!

@clauswilke clauswilke merged commit 3f93180 into tidyverse:master Sep 4, 2018
@slowkow slowkow deleted the slowkow-parse branch September 4, 2018 17:19
@slowkow
Copy link
Contributor Author

slowkow commented Sep 4, 2018

Claus and Hadley, thanks for helping me with this pull request! 🐄

Claus, I am curious to know how you made such a nice merge. I'm still learning how to use GitHub's collaboration features, and pull requests make me think twice before I do anything.

I'm impressed that you've squashed everything into one big commit with a nice summary of all the commit messages.

Also, I just noticed that there is a checkbox on the right that says [ ] Allow edits from maintainers. Maybe I should have checked that box! It might have been faster to let you directly edit my copy of the files. Sorry about that. I hope you're aware of this feature, too :)

@clauswilke
Copy link
Member

Squash-merge is the default on github. I just need to click a button. :-) GitHub then opens a window where you can edit the commit message, and that's where I inserted "Closes issue #2864".

@lock
Copy link

lock bot commented Mar 3, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Mar 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants