-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expose ways of iterating over vectors of s2 geometries? #125
Comments
http://s2geometry.io/devguide/cpp/quickstart suggests that |
First, I think might find Second, while it will probably be a long time before it's stable enough to be public, I've started a C API for internal use ( https://github.com/r-spatial/s2/blob/master/src/s2-c-api.cpp ). If S2 is going to gain traction it needs drop-in GEOS functions. Right now I'm just poking away at that as I run into things I need to implement in C...if you have ideas for things you'd like to see there I'd love ideas. |
Looks like it! Here is an example use: library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(s2)
up = function(n) {
x = runif(n)
y = runif(n)
st_as_sf(data.frame(x = x, y = y), coords = c("x", "y"))
}
u1 = up(1000)
s2 = st_as_s2(u1)
s2_closest_edges(s2[1:10], s2, k = 3)
# [[1]]
# [1] 28 85 1
# [[2]]
# [1] 688 406 2
# [[3]]
# [1] 57 958 3
# [[4]]
# [1] 760 975 4
# [[5]]
# [1] 586 224 5
# [[6]]
# [1] 429 661 6
# [[7]]
# [1] 795 664 7
# [[8]]
# [1] 205 75 8
# [[9]]
# [1] 274 502 9
# [[10]]
# [1] 370 487 10 |
Thanks, implemented and pushed to github. With prototypes in place, I'll try to check timings (which look OK for distance and k-nearest neighbours, not OK for finding the distances for known neighbours - will look at multicore as a way round) and differences from the legacy code. |
If you give me a reprex of what is slow I can look into how to make it faster. Seems like a common/reasonable use case to make sure s2 can handle! |
Thanks for being willing to take a look. This is an implicit reprex via
and maybe https://github.com/r-spatial/spdep/blob/531b0d5e830653bb04197780f1cf4ffd1937aec7/R/nbdists.R#L61-L66 with a list of indices passed to |
I think you can get away with a single library(s2)
n <- 1e3
k <- 4
s2x <- as_s2_geography(s2_point(runif(n, -1, 1), runif(n, -1, 1), runif(n, -1, 1)))
nb <- s2_closest_edges(s2x, s2x, k + 1, min_distance = 0)
d1 <- function() {
lapply(seq_along(nb), function(i) {
s2::s2_distance(s2x[i], s2x[nb[[i]]])
})
}
d2 <- function() {
# you can abbreviate this if you know all the lengths are the same
nb_flat_len <- vapply(nb, length, integer(1))
dist_unnest <- s2::s2_distance(
vctrs::vec_rep_each(s2x, nb_flat_len),
s2x[unlist(nb)]
)
# maybe a faster way to do this nesting
di_end <- cumsum(nb_flat_len)
di_start <- c(1L, di_end[-length(di_end)] + 1L)
d2 <- Map("[", list(dist_unnest), Map(":", di_start, di_end))
}
identical(d1(), d2())
#> [1] TRUE
bench::mark(d1(), d2())
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 d1() 93.24ms 101.96ms 9.94 6.38MB 17.9
#> 2 d2() 5.96ms 6.47ms 143. 529.28KB 11.9 Created on 2021-06-13 by the reprex package (v0.3.0) |
Thanks! This is my 1970s Algol/Fortran playbook! A longer vector with start/end points - it should work fine. I'll need to work around length 1 neighbour vectors with 0L signalling no-neighbours on the |
I have something of a puzzle:
s2-based |
I've also found As a workaround, I've used |
Thanks. The case here is, however, finding neighbours within |
I think there's an easy option for that! https://github.com/r-spatial/s2/blob/master/src/s2-matrix.cpp#L175 Can be updated to include https://github.com/r-spatial/s2/blob/master/inst/include/s2/s2closest_edge_query.h#L135 |
It might be too late to put this in version 1.0.6 (I'm running revdep checks now), but it will certainly make the next release. |
It's probably also worth checking if there's a faster way of doing the current dwithin, which should be much faster than it is. Lines 134 to 136 in 1162a25
|
OK - I'll experiment with k-nearest neighbours, but that could be messy with this data set, there are very many distance neighbours in the urban areas compared to the longest distance between first nearest neighbours. I'll also look at buffering and intersecting, which seemed wasteful on a small data set but may use indexing. |
With
|
The buffering can be speeded up by reducing |
Not sure: I believe that the cells are guaranteed to include the actual (smooth) buffer, so less cells = larger area. |
Yes, the buffered area is the exterior covering as currently implemented. I'm planning on implementing an interior covering as well when there's a vector class for an S2CellUnion to match the vector class for S2Cell. This should reduce memory usage for this technique considerably! |
Yes, larger area in the buffer seems to mean that points are included by intersection with the buffer that are more than dist from the point of interest, doesn't it? |
Yes...you'll have to refilter the points using |
Yes, this looks possible:
For 71k points, max_cells 500 was 450s, 250 220s, with the intersects_matrix and dwithin inside lapply not varying. Can intersects_matrix or buffer_cells be parallelized, I think so? max_cells 150 is 115s for buffer, intersects increasing to 165s, lapply/dwithin 63s, so multicore looks feasible with maybe max_cells 200. |
Perhaps wait a tiny bit for me to catch up on the additions I mentioned above...the cell union in particular I think will make most of this a mute point! At the very least worth trying before delving into the complexities of multicore. |
In the embarrassingly parallel cases,
Both knn are far faster than legacy; distances between neighbours are also much faster. So now there is a clear basis for testing changes in s2 - these were run on the repo state as of this morning CEST. Using the input 71k tract boundaries single core and guiding legacy snapped contiguity detection by sf intersects between bounding boxes to establish candidate lists, with NA crs, 25s. Maybe I should revisit this and assign back an input GEOGCRS before doing the |
Following up and reading r-spatial/discuss#43, and passing Reprex:
The 6th bounding box does not intersect the 1st bounding box when longlat crs, does when NA crs.
Is this just the s2_model - I've tried most of |
I'll look properly in a bit, but could it be that the edges of the bounding box are being interpreted as geodesic edges (the case in s2)? You could try a segmentize with NA crs before the intersection to test. |
Thanks. Further, these are the bounding box corner points: I tried |
I looked more carefully, and if no bounding box intersection is used, just |
That's good to know - it makes sense that cartesian bounding boxes are unlikely to play nicely with s2. An s2 cell union (e.g., |
I've opened up some issues for the action items in here...open up other issues if I missed any! |
I've started looking at supplementing code for finding point neighbours within distance thresholds on the sphere in spdep through
sf::st_is_within_distance()
, which is OK. However, to return distances for the chosen neighbours on the sphere, in line 64 in https://github.com/r-spatial/spdep/blob/master/R/nbdists.R I iterate over subsets of points throughlapply()
- see also the examples at the foot of https://r-spatial.github.io/spdep/reference/dnearneigh.html - it would be better to iterate in compiled code rather than uselapply()
. IsLinkingTo
feasible?Finding k-nearest neighbours on the sphere looks even more complex for k > 1. Is there any such query in the s2 code itself? I looked at the documentation but couldn't see obvious opportunities, though I think they may be there somewhere, given that indexing is being used.
Current implementations use legacy C code for
spdep::dnearneigh()
,spdep::nbdists()
andspdep::knearneigh()
with Great Circle WGS84 ellipsoid distances and no indexing.Any suggestions would be very welcome!
The text was updated successfully, but these errors were encountered: