Skip to content

Implement geom_sf_label() and geom_sf_text() #2761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Aug 24, 2018

Conversation

yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Jul 16, 2018

This PR closes #2742.

Background

It's very convenient that geom_sf() automatically chooses the proper geometries (point, line or polygon) based on the type of the simple feature in the data. But, we sometimes want to choose different type of geoms, for example, texts and labels.

However, adding flexibility to geom_sf() (or underlying sf::st_as_grob()) leads to unmaintainable complexity as discussed in #2111. So, we need to implement it as separate geoms/stats for code simplicity. Here, I propose geom_sf_label() and geom_sf_text(), and their corresponding stat stat_sf_coordinates().

Design

StatSfCoordinates

Texts and labels need pairs of x and y. StatSfCoordinates is a stat to provide these via computed variable x and y (lower letters). stat_sf_coordinates() is the stat_ function corresponding to this.

X and Y dimensions of a sf object can be retrieved by sf::st_coordinates(). But, we cannot simply use sf::st_coordinates() texts and labels require exactly one coordinate per geometry whereas it returns multiple locations for a polygon or a line. Instead, these two steps are needed:

  1. Choose one point per geometry by some function like sf::st_centroid() and sf::st_point_on_surface().
  2. Retrieve coordinates from the points by sf::st_coordinates().

For the first step, users can pass arbitrary function via parameter fun.geometry. By default, function(x) sf::st_point_on_surface(sf::st_zm(x)) is used; I think sf::st_point_on_surface() is more appropriate than sf::st_centroid() since labels and texts usually are intended to be put within the polygon or the line.

geom_sf_label() and geom_sf_text()

geom_sf_label() and geom_sf_text() are basically same with geom_label() and geom_text() for each, except that:

  • they uses StatSfCoordinates for stat
  • they set geometry aesthetic automatically just as geom_sf() does
  • they accept fun.geometry argument

Considerations

Z and M dimension

Simple feature objects may have Z and/or M dimension in addition to X and Y. These dimensions may be useful, but sometimes they are problematic:

  • M dimension makes functions like sf::st_point_on_surface() and sf::st_centroid() fail.
  • Z dimension may do no harm, but Z dimension is not always available in the results.

For simplicity, we simply drop Z and M and use only X and Y.

Warning for longitude/latitude data

Functions that does calculation based on the distances may warn if the on data is in unprojected coordinate system. Though the warnings should not be suppressed, we need to be careful to use geom_sf_label() in examples. Otherwise, the tests fail.

TODOs

  • Add tests
  • Write more docs about stat_sf_coordinates()
  • Write more docs about argument fun.geometry
    • Write some notes about warnings for longitude/latitude data
  • Do we really support Z and M dimension? (no)

Usages

library(ggplot2)

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# texts and labels
p <- ggplot(nc[1:3, ]) +
  geom_sf(aes(fill = AREA))

p + geom_sf_text(aes(label = NAME), colour = "white")
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

p + geom_sf_label(aes(label = NAME))
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

# using the stat
p + stat_sf_coordinates(colour = "white", size = 10)
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

p +
  geom_errorbarh(
    aes(geometry = geometry,
        xmin = stat(x) - 0.1,
        xmax = stat(x) + 0.1,
        y = stat(y),
        height = 0.04),
    colour = "white",
    stat = "sf_coordinates"
  )
#> Warning in st_point_on_surface.sfc(data$geometry): st_point_on_surface may
#> not give correct results for longitude/latitude data

Created on 2018-08-19 by the reprex package (v0.2.0).

@yutannihilation yutannihilation changed the title [WIP] Implement geom_sf_label() and geom_sf_text() Implement geom_sf_label() and geom_sf_text() Aug 11, 2018
@yutannihilation
Copy link
Member Author

This PR is ready to review now.

But, I don't see why Travis fails. especially, on R 3.1; it fails on test-function-args, which I think I've fixed. What's happening...?

@clauswilke
Copy link
Member

The sf package is missing for 3.1. For the more recent releases, I have noticed that Travis seems to have network issues today. If you restart a failed build it may pass the next time.

@yutannihilation
Copy link
Member Author

Thanks. Hmm, does absense of sf affects the function args? If so, I think I'm missing something. Anyway, will give it a try later.

@yutannihilation yutannihilation force-pushed the geom-sf-label branch 2 times, most recently from 2228250 to 7111bf7 Compare August 11, 2018 08:10
@yutannihilation
Copy link
Member Author

OK, I think I found the cause. I've made two mistakes below, both of which are not appropriate when sf package is not installed.

  • use sf::st_point_on_surface() as the default value of some functions' arguments
  • reference sf's documents

@yutannihilation
Copy link
Member Author

Confirmed the build on 3.1 passed! Thank you for your advice, Claus.

(devel still fails due to the curl's "Connection timed out" error; I think I'm not lucky enough to let all 5 builds pass at the same time...)

@clauswilke
Copy link
Member

clauswilke commented Aug 12, 2018

You can restart just one particular failed build by clicking on the reload icon to the right of it. In this way, you won't mess up the builds that already succeeded.

I have a few comments/questions:

  1. Have you considered using lower case for x, y, z, m? That seems more in line with typical ggplot2 naming.
  2. Have you considered returning NAs for z and m when those values are not available? Would that make things overall more predictable?
  3. I don't fully understand the comment about errors when z and m are not available. Does my point 2 address this or is there a different type of error that can occur?

@yutannihilation
Copy link
Member Author

You can restart just one particular failed build by clicking on the reload icon to the right of it.

The icon doesn't show; it seems I don't have the permission, unfortunately.

@yutannihilation
Copy link
Member Author

(Disclaimer: I'm not a GIS expert but a guy who just wants to add labels on geom_sf(). So, I might be wrong, especially about technical phrasing...)

  1. Have you considered using lower case for x, y, z, m? That seems more in line with typical ggplot2 naming.

In sf package (and probably in simple feature access), they are always written as capital. So, I felt it's more natural to use capital letters for computed variables so that we can distinguish them from ggplot's coordinates. That said, I'm still wondering whether they are essentially the same thing, X and Y dimensions of simple feature access, and x and y axis of ggplot2...

  1. Have you considered returning NAs for z and m when those values are not available? Would that make things overall more predictable?

Thanks, this sounds nice.

  1. I don't fully understand the comment about errors when z and m are not available. Does my point 2 address this or is there a different type of error that can occur?

Different. As described above, StatSfCoordinates takes these two steps and the error occurs in the first step:

  1. Choose one point per geometry by some function like sf::st_centroid() and sf::st_point_on_surface().
  2. Retrieve coordinates from the points by sf::st_coordinates().

Your suggestion is about when the first step succeeds but doesn't return Z and M value, but what I was thinking is when the calculation itself fails.

For example, consider we have a sf object of single POLYGON feature and want to determine a location to plot a label:

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3

mat_poly_xy <- rbind(
  c(1, 1), c(-1, 1), c(-1, -1), c(1, -1), c(1, 1)
)
poly_xy <- st_polygon(list(mat_poly_xy))
poly_xy
#> POLYGON ((1 1, -1 1, -1 -1, 1 -1, 1 1))

In this case, we can choose a point on the polygon by st_point_on_surface() and extract the coordinates by st_coordinates().

d_poly_xy <- st_sf(geometry = st_sfc(poly_xy))

d_pt_xy <- st_point_on_surface(d_poly_xy)
xy <- st_coordinates(d_pt_xy)

So we can combine the coordinates with the original data

cbind(d_pt_xy, xy)
#> Simple feature collection with 1 feature and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 0 xmax: 0 ymax: 0
#> epsg (SRID):    NA
#> proj4string:    NA
#>   X Y    geometry
#> 1 0 0 POINT (0 0)

But, if the polygon has M dimension, the function fails.

mat_poly_xym <- cbind(mat_poly_xy, 1)

poly_xym <- st_polygon(list(mat_poly_xym), dim = "XYM")

poly_xym
#> POLYGON M ((1 1 1, -1 1 1, -1 -1 1, 1 -1 1, 1 1 1))

d_poly_xym <- st_sf(geometry = st_sfc(poly_xym))

st_point_on_surface(d_poly_xym)
#> Error in CPL_geos_op("point_on_surface", x, numeric(0), integer(0), numeric(0), : GEOS does not support XYM or XYZM geometries; use st_zm() to drop M

@yutannihilation
Copy link
Member Author

yutannihilation commented Aug 12, 2018

the comment about errors when z and m are not available

Ah, in short, I commented about errors when Z and/or M are available, which is not always happy. Does my answer make sense?

@clauswilke
Copy link
Member

The Z and M issue still isn't entirely clear to me. Do I understand correctly that the default setup fails if the geometries contain Z or M coordinates? In this case, wouldn't it be better for the default to remove Z and M and then run sf::st_point_on_surface()? In other words, something like this:

#' @param fun.geometry
#'   A function that takes a `sfc` object and returns a `sfc_POINT` with the
#'   same length as the input. If `NULL`, `function(x) sf::st_point_on_surface(sf::st_zm(x))` will be
#'   used.

@yutannihilation
Copy link
Member Author

Do I understand correctly that the default setup fails if the geometries contain Z or M coordinates?

Generally, yes. (It may not fail depending on what fun.geometry you use, though.)

Thanks, function(x) sf::st_point_on_surface(sf::st_zm(x)) looks nice to me. For simplicity, now I'm thinking about supporting only X and Y, I mean:

  • always drop Z and M by default (using function(x) sf::st_point_on_surface(sf::st_zm(x)))
  • always return only X and Y
  • use the lower case x and y for computed variables.

Does this look better to you?

@yutannihilation
Copy link
Member Author

I have no idea why the Travis build still fails, but it seems unrelated to this PR.

@yutannihilation
Copy link
Member Author

Ah, do I have to wait for #2838 to be solved?

@clauswilke
Copy link
Member

I assume it's because of #2824. That needs to be resolved first.

@clauswilke
Copy link
Member

#2838 has been solved, #2824 not yet. #2824 is the problem.

@yutannihilation
Copy link
Member Author

I got it, Sorry that I don't keep up the discussion...

@clauswilke
Copy link
Member

#2824 is now resolved.

@yutannihilation
Copy link
Member Author

Yay, all checks passed!

@clauswilke
Copy link
Member

Looks good to me. @hadley, any concerns from your side?

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of documentation style issues, and needs a news bullet; otherwise looks good!

R/sf.R Outdated
@@ -6,6 +6,7 @@
#' an unusual geom because it will draw different geometric objects depending
#' on what simple features are present in the data: you can get points, lines,
#' or polygons.
#' For texts and labels, you can use `geom_sf_text` and `geom_sf_label`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text -> text; functions should have () after them

R/sf.R Outdated
#' @rdname ggsf
#' @inheritParams geom_label
#' @inheritParams stat_sf_coordinates
#' @seealso [stat_sf_coordinates()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this seealso belongs in the documentation block above.

#'
#' `stat_sf_coordinates()` extracts the coordinates from 'sf' objects and
#' summarises them to one pair of coordinates (x and y) per geometry. This is
#' convenient when you draw an sf object as geoms like texts and labels (so
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

texts -> text (English is weird)

@yutannihilation
Copy link
Member Author

Thanks! I was very anxious if my English is strange, so the comments are very helpful...

For a news bullet, should I wait for the release of v3.0.1?

@clauswilke
Copy link
Member

No, please add the news bullet now. Thanks!

@yutannihilation
Copy link
Member Author

Sure!

@yutannihilation
Copy link
Member Author

Added a bullet.

@yutannihilation
Copy link
Member Author

@clauswilke Could you merge this if this is OK? Then I can create another PR to improve the error message you commented :)

@clauswilke clauswilke merged commit cac3a95 into tidyverse:master Aug 24, 2018
@yutannihilation
Copy link
Member Author

Thanks!

@yutannihilation yutannihilation deleted the geom-sf-label branch August 24, 2018 13:00
@adrfantini
Copy link

This is an extremely useful feature. Thanks!!!

@clauswilke
Copy link
Member

The merge broke the build. Could you look into what's going wrong? I'm in meetings all day.

@yutannihilation
Copy link
Member Author

Hmm, sorry if this is my fault. I will investigate.

@yutannihilation
Copy link
Member Author

Now it's all green. I definitely saw "error" status, but I can't find the errored build in the history. What happened...?

https://travis-ci.org/tidyverse/ggplot2/builds

@yutannihilation
Copy link
Member Author

Did I deleted my branch too early?

@yutannihilation
Copy link
Member Author

Anyway, it seems alright now.

@lock
Copy link

lock bot commented Feb 20, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Feb 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stat for ploting sf data as label and text
4 participants