-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Implement default crs for non-sf objects in coord_sf(). #3659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I haven't figured out yet how to get the scale limits correct. I'll provide some notes as food for thought. Currently, the scale limits for geometry columns are calculated in Lines 6 to 14 in 16ed4d0
This trick works nicely if we're exclusively working in projected coordinates, but it doesn't if non-sf layers are in non-projected coordinates, as then the data units don't match. We could back-transform the bounding boxes to long-lat coordinates, but I'm worried that that would lead to strange limit choices in cases where meridians and parallels are not mostly aligned with the y and x axis, respectively. As much as possible, we should perform limit calculations in transformed coordinates. Since Lines 128 to 147 in 16ed4d0
I'm just not sure yet how to do this, in particular in a way that works in faceted graphs. On the flip side, free scales don't seem to be currently supported anyways, so maybe that's not a regression. library(ggplot2)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
ggplot(nc[1:5, ]) + geom_sf() + facet_wrap(~NAME) ggplot(nc[1:5, ]) + geom_sf() + facet_wrap(~NAME, scales = "free")
#> Error: coord_sf doesn't support free scales Created on 2019-12-06 by the reprex package (v0.3.0) Maybe |
I think I've got limits working, but it required quite a bit of fiddling and communication back-and-forth between the stat and the coord. @paleolimbot Could you take a look? library(ggplot2)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
ggplot(nc) +
geom_sf() +
stat_sf_coordinates(geom = "point") +
coord_sf(crs = st_crs(3347)) ggplot(nc) +
stat_sf_coordinates(geom = "point") +
coord_sf(crs = st_crs(3347)) # scale limits are given in longitude/latitude
ggplot(nc) +
geom_sf() +
scale_x_continuous(limits = c(-86, -74)) +
coord_sf(crs = 3347) # coord limits are given in projected coordinates
ggplot(nc) +
geom_sf() +
coord_sf(xlim = c(6800000, 7900000), crs = 3347) Created on 2019-12-06 by the reprex package (v0.3.0) |
Some more testing of this code. First, I wanted to see whether it works in cases that span the international dateline. The answer is yes as long as an appropriate CRS is chosen. This is reasonable. library(tidyverse)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
world <- ne_countries(scale = "medium", returnclass = "sf")
fiji_polys <- world %>%
filter(name == "Fiji") %>%
pull(geometry) %>%
st_cast("POLYGON") %>%
st_sf() %>%
mutate(label = letters[1:n()])
# doesn't work, long-lat across the dateline causes problems
ggplot(data = fiji_polys) +
geom_sf() # works
ggplot(data = fiji_polys) +
geom_sf() +
coord_sf(crs = 3460) # works even with separate groups on either
# side of the date line
ggplot(data = fiji_polys) +
geom_sf(aes(fill = label)) +
coord_sf(crs = 3460) Created on 2019-12-06 by the reprex package (v0.3.0) Second, I wanted to see how coordinate expansion is affected by non-sf layers. The answer is that coordinate expansion can change depending on whether the default crs of non-sf layers is set equal to the primary crs or not. I assume there's no way around this. When the default crs is long-lat, then the x and y scales are trained on these values, and the resulting scale limits can be different from what we would choose in an appropriate projection. library(ggplot2)
library(patchwork)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
p1 <- ggplot(nc) + geom_sf() +
coord_sf(crs = 3347)
p2 <- ggplot(nc) + geom_sf() +
coord_sf(crs = 3347, default_crs = 3347)
# when only sf layers are used, the panel expansion does not
# depend on the default crs
p1 / p2 # however, when there are non-sf layers also, the panel expansion
# depends on the default crs
(p1 + stat_sf_coordinates()) / (p2 + stat_sf_coordinates()) Created on 2019-12-06 by the reprex package (v0.3.0) |
I think this generally works, but there could be some implications for |
This looks awesome! The one limitation I forsee is that multiple panels will all be calling the same |
The alternative would be to have a |
We don't currently allow I like the idea of a |
I'll give it a shot and see if I can get it to work! |
Ok, I gave it a shot and while the library(scales)
library(ggplot2)
scale_geometry_continuous <- function() {
scale <- ggproto(
NULL, ScaleGeometryContinuous,
aesthetics = "geometry",
scale_name = "position_c"
)
}
scale_type.sfc <- function(x) "continuous"
RangeBbox <- ggproto("RangeBbox", NULL,
range = NULL,
target_crs = NULL,
train = function(self, x) {
if (all(sf::st_is_empty(x))) {
return()
}
if (is.null(self$range)) {
self$range <- list()
}
if (!is.null(self$target_crs) && !identical(sf::st_crs(x), sf::NA_crs_)) {
x <- sf::st_transform(x)
}
box <- sf::st_bbox(x)
self$range$xmin <- min(self$bbox$xmin, box["xmin"])
self$range$xmax <- max(self$bbox$xmax, box["xmax"])
self$range$ymin <- min(self$bbox$ymin, box["ymin"])
self$range$ymax <- max(self$bbox$ymax, box["ymax"])
},
reset = function(self) {
self$range <- NULL
}
)
ScaleGeometryContinuous <- ggproto("ScaleGeometryContinuous", ScaleContinuous,
range = ggproto(NULL, RangeBbox),
clone = function(self) {
new <- ggproto(NULL, self)
new$range <- ggproto(NULL, RangeBbox)
new
},
train = function(self, x) {
self$range$train(x)
},
map = function(x) x,
transform = function(x) x
)
nc_tiny_coords <- matrix(
c(-81.473, -81.741, -81.67, -81.345, -81.266, -81.24, -81.473,
36.234, 36.392, 36.59, 36.573, 36.437, 36.365, 36.234),
ncol = 2
)
nc <- sf::st_as_sf(
data.frame(
NAME = "ashe",
geometry = sf::st_sfc(sf::st_polygon(list(nc_tiny_coords)), crs = 4326)
)
)
geometry_scale <- scale_geometry_continuous()
geometry_scale$train(nc$geometry)
geometry_scale$range$range
#> $xmin
#> [1] -81.741
#>
#> $xmax
#> [1] -81.24
#>
#> $ymin
#> [1] 36.234
#>
#> $ymax
#> [1] 36.59 Created on 2019-12-08 by the reprex package (v0.2.1) |
Yes, that's what I was expecting was going to be problematic when I looked over the code yesterday. At some point, I think we'll have to rethink hard-coded position scales(*), but it's not something I want to tackle now. (*) For example, in some cases a third dimension would be useful, e.g. to control z-ordering of elements, or to natively implement something like ggtern. |
That's something I've also thought about...coordinate systems should be able to specify which aesthetics are "position" aesthetics? Outside the scope of this PR obviously, but it might be worth considering before the next release since I've already messed with the Coord internals. I could give it a shot but not for a few weeks. |
There is indeed a problem with writing changes back to the coord. If we reuse the coord in multiple plots, limits can get messed up. This could be addressed by cloning the coord when adding it, but I'll first want to see if there's a different approach. library(ggplot2)
library(patchwork)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# works as expected
p1 <- ggplot(nc[1, ]) + geom_sf() + coord_sf(crs = 3347)
p2 <- ggplot(nc[2, ]) + geom_sf() + coord_sf(crs = 3347)
p1 + p2 # doesn't work, limits in second plot are wrong
crd <- coord_sf(crs = 3347)
p1 <- ggplot(nc[1, ]) + geom_sf() + crd
p2 <- ggplot(nc[2, ]) + geom_sf() + crd
p1 + p2 Created on 2019-12-08 by the reprex package (v0.3.0) If we decide the coord needs to be cloned, this would have to happen here: Lines 139 to 149 in 48660e1
|
I was able to avoid interference between different copies of the same coord, by resetting internal parameters to a defined state whenever the plot build begins. library(ggplot2)
library(patchwork)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# parameters are regenerated with every plot build, no
# sharing of parameters between plots
crd <- coord_sf(crs = 3347)
p1 <- ggplot(nc[1, ]) + geom_sf() + crd
p2 <- ggplot(nc[2, ]) + geom_sf() + crd
p1 + p2 Created on 2019-12-08 by the reprex package (v0.3.0) |
calling its unique function
@edzer FYI, with the latest version of my code, if you set |
I've thought a bit more about this, and I think that this approach introduces considerable complexity for a problem that has a reasonable solution (project coordinates in advance or use something like |
@clauswilke would you mind writing a quick overview of the PR, updated based on the discussion so far? Otherwise it's going to be hard for me to find the time to read the whole thread. |
@hadley Sure. Give me a day or two. I'll also resolve the merge conflicts. I'll ping you when I'm ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though we should still wait for Hadley's review, I have no more review comments.
gee this is fab only just found it, thanks @clauswilke !!!!!!!!!! |
@hadley As promised, a summary of the discussion for this PR. I'll write this in a series of comments, so that I don't have to craft one massive comment all at once. First, what does this PR do? From the news item:
Here are some basic examples. We can now easily add points or text in long-lat coordinates. library(tidyverse)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
df <- data.frame(
lat = 35.76667,
long = -78.63333,
name = "Raleigh"
)
ggplot(df, aes(x = long, y = lat)) +
geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
geom_point() +
geom_text(aes(label = name), hjust = -.1, vjust = 1.1) +
coord_sf(crs = 2264) The framework is general and extends to essentially any geom. Here is an example with library(ggforce)
df <- read_csv(file = "name,population,lat,long
Charlotte,827097,35.227,-80.843
Raleigh,451066,35.772,-78.639
Greensboro,285342,36.073,-79.792
Durham,257636,35.994,-78.899
Winston-Salem,241218,36.1,-80.244")
ggplot(df, aes(x = long, y = lat)) +
geom_sf(data = nc, size = 0.1, fill = "white", inherit.aes = FALSE) +
geom_point() +
geom_mark_hull() +
coord_sf(crs = 2264) And here is an example from ggplot() +
annotation_map(map_data("state"), fill = "antiquewhite", colour = "darkgrey") +
geom_sf(data = nc) +
coord_sf(crs = st_crs(3347)) The limits (both for scales and for the coord) are set in the default coordinate system, i.e., long-lat by default. ggplot() +
annotation_map(map_data("state"), fill = "antiquewhite", colour = "darkgrey") +
geom_sf(data = nc) +
coord_sf(crs = st_crs(3347), xlim = c(-82, -76), ylim = c(34, 37)) Created on 2020-03-08 by the reprex package (v0.3.0) |
While I developed this PR a number of issues came up, which I'll list here and then address one by one:
|
Point 1: Two different coordinate systems For geometry objects, we want to keep track of their correct limits in projected coordinates. But scales are now trained in default coordinates. Thus, we somehow have to keep track of projected limits outsides of the regular scales. I handled this by implementing a call-back to Ultimately, the correct way to handle this would be to implement generic position scales, so that we could add a geometry scale that can be trained on geometry objects. However, that solutions is out of reach at this time. |
Point 2: Appropriately segmentize straight lines. I'm segmentizing lines in the default coordinate system and then transform the segments. With the default setting, this generates horizontal and vertical lines that follow parallels and meridians, a behavior that many people might expect: library(tidyverse)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
ggplot(nc) +
geom_sf(size = 0.1, fill = "white") +
geom_hline(yintercept = 34:36) +
geom_vline(xintercept = c(-84, -80, -76)) +
coord_sf(crs = 2264) Created on 2020-03-08 by the reprex package (v0.3.0) However, @edzer pointed out that this is not always the correct approach, and that we might want to segmentize, for example, along great circles. Doing this is beyond the scope of this PR though (and likely beyond the scope of a simple ggplot2 coord), so I addressed this issue by pointing it out in the documentation: |
Point 3: Limits Specifying limits in default coordinates (either through the min/max values in the data, or via library(ggplot2)
tile_df <- expand.grid(
xmin = seq(-140, -52, by = 20),
ymin = seq(40, 70, by = 10)
)
tile_df$xmax <- tile_df$xmin + 20
tile_df$ymax <- tile_df$ymin + 10
tile_df$x <- (tile_df$xmin + tile_df$xmax) / 2
tile_df$y <- (tile_df$ymin + tile_df$ymax) / 2
tile_df$width <- 20
tile_df$height <- 10
p <- ggplot(
tile_df,
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, x = x, y = y)
) + geom_tile(aes(height = 7.5, width = 5))
p + coord_sf(crs = 3979) # default limit method is "cross" p + coord_sf(crs = 3979, lims_method = "box") p + coord_sf(crs = 3979, lims_method = "orthogonal") Created on 2020-03-08 by the reprex package (v0.3.0) |
Point 4: Things can get wonky in weird coordinate systems The reason for why was already explained under Points 2 and 3. Here are a few more examples with limits: library(tidyverse)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
world <- ne_countries(scale = "medium", returnclass = "sf")
fiji_polys <- world %>%
filter(name == "Fiji") %>%
pull(geometry) %>%
st_cast("POLYGON") %>%
st_sf() %>%
mutate(label = letters[1:n()])
# doesn't work, long-lat across the dateline causes problems
ggplot(fiji_polys) +
geom_sf() # doesn't work, due to dateline boundary
ggplot(fiji_polys) +
geom_sf() +
coord_sf(crs = 3460) # setting `default_crs = NULL` always fixes things
ggplot(fiji_polys) +
geom_sf() +
coord_sf(crs = 3460, default_crs = NULL) # changing the limits method also works
ggplot(fiji_polys) +
geom_sf() +
coord_sf(crs = 3460, lims_method = "geometry_bbox") # works
ggplot(fiji_polys) +
geom_sf() +
coord_sf(crs = 3460, lims_method = "orthogonal") # works (not sure why)
ggplot(fiji_polys) +
geom_sf() +
coord_sf(crs = 3460, lims_method = "orthogonal") +
xlim(-170, 170) +
ylim(-22, -12) Created on 2020-03-08 by the reprex package (v0.3.0) And now an example with straight lines. These plots probably don't look the way we would want them to. library(ggplot2)
cities <- data.frame(
x = c(-63.58595, 116.41214, 13.50, -149.75),
y = c(44.64862, 40.19063, 52.51, 61.20),
city = c("Halifax", "Beijing", "Berlin", "Anchorage")
)
cities$xend <- cities$x[c(2, 4, 1, 3)]
cities$yend <- cities$y[c(2, 4, 1, 3)]
p <- ggplot(cities, aes(x, y, xend = xend, yend = yend)) +
geom_point() +
# view of the north pole
coord_sf(crs = 3995, lims_method = "box")
p + geom_segment() p +
geom_hline(
aes(yintercept = y, col = city)
) +
geom_vline(
aes(xintercept = x, lty = city)
) Created on 2020-03-08 by the reprex package (v0.3.0) However, surprisingly, the following works just fine. Note: this does not use library(ggplot2)
library(ggrepel)
cities <- data.frame(
x = c(-63.58595, 116.41214, 13.50, -149.75),
y = c(44.64862, 40.19063, 52.51, 61.20),
city = c("Halifax", "Beijing", "Berlin", "Anchorage")
)
ggplot(cities, aes(x, y)) +
annotation_map(
map_data("world"),
fill = "antiquewhite", colour = "darkgrey", size = 0.1
) +
geom_point() +
geom_text_repel(aes(label = city), box.padding = unit(20, "pt")) +
coord_sf(crs = 3995, lims_method = "box") Created on 2020-03-08 by the reprex package (v0.3.0) |
Point 5: Additional helper functions This point applies primarily to a function I needed to write that makes it easy to transform coordinates among different coordinate systems, without having to worry about creating and then unpacking sf objects. I called this function This is how it works: library(ggplot2)
# location of cities in NC by long (x) and lat (y)
data <- data.frame(
city = c("Charlotte", "Raleigh", "Greensboro"),
x = c(-80.843, -78.639, -79.792),
y = c(35.227, 35.772, 36.073)
)
# transform to projected coordinates
data_proj <- sf_transform_xy(data, 3347, 4326)
data_proj
#> city x y
#> 1 Charlotte 7275499 -60169.66
#> 2 Raleigh 7474260 44384.13
#> 3 Greensboro 7357835 57438.27
# transform back
sf_transform_xy(data_proj, 4326, 3347)
#> city x y
#> 1 Charlotte -80.843 35.227
#> 2 Raleigh -78.639 35.772
#> 3 Greensboro -79.792 36.073 Created on 2020-03-08 by the reprex package (v0.3.0) This function is not only useful for extension writers, it can also be useful to endusers who want to, e.g., calculate plot limits in projected coordinates. However, it is somewhat outside the normal scope of what ggplot2 does. So, two questions:
This was the last comment in this series. I have addressed everything I could think of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation — your reasoning sounds solid to me, so I think I think it makes sense to merge this PR. Exporting sf_transform_xy()
seems like a reasonable compromise.
Given that the documentation for geom_sf()
/coord_sf()
is getting so long, we may want to consider a vignette/article in the future.
Thanks for all your hard work on this!
Thanks! I think I'll wait merging this until after the 3.3.1 release. It's too much of a change for a minor release. I agree with the documentation issues. I've also thought about breaking the man page into two, one for |
This is now merged. Thanks to everybody for the extensive feedback. Let's see how this plays out as people start testing this in the wild. |
Second attempt at implementing the idea proposed in #3654. Based on discussion in #3655.
The main issue that remains to be solved is proper setting of x and y scale limits.
Created on 2019-12-06 by the reprex package (v0.3.0)