Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2_dwithin() and s2_dwithin_matrix() are slow #157

Closed
paleolimbot opened this issue Dec 31, 2021 · 2 comments · Fixed by #174
Closed

s2_dwithin() and s2_dwithin_matrix() are slow #157

paleolimbot opened this issue Dec 31, 2021 · 2 comments · Fixed by #174

Comments

@paleolimbot
Copy link
Collaborator

As noted here: #125 (comment)

Where the matrix version is implemented: https://github.com/r-spatial/s2/blob/main/src/s2-matrix.cpp#L433-L448

I wonder if part of the performance hit here is because building a mutable shape index for a point over and over again has a lot of overhead, and this function probably most of its action on points. The s2_cell() vector class might be able to help with this; properly vectorizing s2_latlng() as a record-style vector and providing specific implementations is also probably helpful.

It also may be that there is a more effective way to index y, or to use a single query and iterate through the edges (as opposed to creating a query for each pair of features). "iterating through the edges" for points is probably really fast (since each point is one edge); I don't know if this will be a performance hit for big polygons where there may be many edges from the same feature within some distance.

@paleolimbot
Copy link
Collaborator Author

Worth noting that preselection using s2_buffer() might be helpful!

@paleolimbot
Copy link
Collaborator Author

Reprex to play with:

library(s2)

points <- s2_point(runif(1e4, -1, 1), runif(1e4, -1, 1), runif(1e4, -1, 1))
geog <- s2::as_s2_geography(s2::as_s2_lnglat(points))
countries <- s2_data_countries()

grid_items <- expand.grid(geog = geog, countries = countries)

system.time(result <- s2_dwithin_matrix(countries, geog, 1e6))
#>    user  system elapsed 
#>   3.093   0.010   3.104
system.time(result <- s2_dwithin_matrix(geog, countries, 1e6))
#>    user  system elapsed 
#>   2.995   0.005   3.001

system.time(
  result <- s2_dwithin(
    grid_items$geog,
    grid_items$countries,
    1e6
  )
)
#>    user  system elapsed 
#>   3.203   0.019   3.224


system.time(
  result <- s2_dwithin(
    grid_items$countries,
    grid_items$geog,
    1e6
  )
)
#>    user  system elapsed 
#>   3.115   0.020   3.208

xy <- geos::geos_make_point(s2::s2_x(geog), s2::s2_y(geog))
countries2 <- geos::as_geos_geometry(wk::as_wkb(countries))
wk::wk_crs(countries2) <- NULL

grid_items2 <- expand.grid(xy = xy, countries2 = countries2)

system.time(
  result2 <- geos::geos_is_within_distance(
    grid_items2$xy, 
    grid_items2$countries,
    0.1
  )
)
#>    user  system elapsed 
#>   0.113   0.008   0.125

system.time(
  result2 <- geos::geos_is_within_distance(
    grid_items2$countries, 
    grid_items2$xy,
    0.1
  )
)
#>    user  system elapsed 
#>   0.114   0.005   0.122

system.time(
  result2 <- geos::geos_prepared_is_within_distance(
    grid_items2$xy, 
    grid_items2$countries,
    0.1
  )
)
#>    user  system elapsed 
#>   0.117   0.005   0.123

system.time(
  result2 <- geos::geos_prepared_is_within_distance(
    grid_items2$countries, 
    grid_items2$xy,
    0.1
  )
)
#>    user  system elapsed 
#>   0.115   0.005   0.120

It looks like this should be much faster than it is....I bet we can bake the preselection by cell buffering straight into dwithin to make this better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant