Add geoseries.distance #1231

Merged
merged 7 commits into from
Jul 31, 2023

Conversation

isVoid
Contributor

@isVoid isVoid commented Jul 26, 2023

Description

closes #759
This PR adds geoseries.distance, computing distances between two geoseries.

Benchmarking the distance API is a complicated task. Below I present the benchmark for the simplest case: the distance between a pair of point geoseries. The benchmark calls the geoseries.distance() API; data setup is not counted. The input has 1 point per row, and the number of rows varies from 1K to 10M, increasing tenfold at each step. Benchmarked time is converted to throughput (points / s). In this benchmark, when the input is between 10K and 100K points, geopandas performs better than cuspatial. For larger data sizes, cuspatial can reach a ~10X increase in throughput over geopandas. The benchmark is run with align={True, False}, and one of the operands is constructed with its index reversed relative to the other. With align=True, both cuspatial and geopandas see a performance decrease, and the decrease is consistent across data sizes.
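
For readers unfamiliar with the align flag, here is a minimal pure-Python sketch of the difference between index-aligned and positional pairing when one operand has a reversed index. It is an illustration only, not the cuspatial or geopandas implementation: the dict-based "series", the helper names, and the sorted-union result order are all assumptions made for the example.

```python
import math

def point_distance(p, q):
    # Euclidean distance between two (x, y) tuples.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distance(left, right, align=True):
    # Hypothetical stand-in for an element-wise distance between two
    # indexed point collections, each a dict of index label -> (x, y).
    # Dict insertion order plays the role of row order.
    if align:
        # Pair rows by index label; labels missing on either side give NaN.
        # Assumption: the result is ordered by the sorted union of labels.
        labels = sorted(set(left) | set(right))
        return {
            k: point_distance(left[k], right[k])
            if k in left and k in right
            else math.nan
            for k in labels
        }
    # Pair rows by position and keep the left operand's labels.
    if len(left) != len(right):
        raise ValueError("operands must have the same length when align=False")
    return {
        k: point_distance(p, q)
        for (k, p), q in zip(left.items(), right.values())
    }

left = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
# Same number of rows, but the index is reversed relative to `left`.
right = {2: (0.0, 3.0), 1: (1.0, 4.0), 0: (2.0, 5.0)}

aligned = distance(left, right, align=True)      # label 0 pairs with label 0
positional = distance(left, right, align=False)  # row 0 pairs with row 0
```

With the reversed index, the two modes pair different points for every row except the middle one, which is why the benchmark exercises both settings.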

image
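
The time-to-throughput conversion described above can be sketched as follows. This is not the benchmark script used for the plot; the helper names and the stand-in workload are hypothetical, and a real GPU benchmark would also need to ensure the device work has finished before the timer stops.

```python
import time

def to_throughput(n_points, seconds):
    # Convert an elapsed time into points processed per second.
    return n_points / seconds

def time_call(func, repeats=3):
    # Time only the call itself and keep the best of several repeats.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

n_points = 100_000
data = list(range(n_points))            # data setup, outside the timed region
elapsed = time_call(lambda: sum(data))  # stand-in for the distance call
points_per_second = to_throughput(n_points, elapsed)
```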

TODO:

  • Support distance to a single shapely object.
  • Benchmark against geopandas.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@isVoid isVoid requested a review from a team as a code owner July 26, 2023 06:33
@isVoid isVoid requested review from trxcllnt and thomcom July 26, 2023 06:33
@github-actions github-actions bot added the Python Related to Python code label Jul 26, 2023
@isVoid isVoid marked this pull request as draft July 26, 2023 06:34
@isVoid isVoid self-assigned this Jul 26, 2023
@isVoid isVoid added feature request New feature or request non-breaking Non-breaking change labels Jul 26, 2023
@isVoid isVoid marked this pull request as ready for review July 26, 2023 11:18
Member

@harrism harrism left a comment

Just a couple of minor comments. Looks good. Congrats on finishing this feature set!


# Rows with misaligned indices contains nan. Here we scatter the
# distance values to the correct indices.
res = full(
Member

What does full do? What does res mean? If it's result, no need to abbreviate it.
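
One reading of the quoted comment, as a hedged sketch: assuming full allocates a result pre-filled with a constant (NaN here), the computed distances are then scattered into the positions of the rows that had a matching index. The function and parameter names below are hypothetical and use plain Python lists, not the column types in the PR.

```python
import math

def scatter_distances(n_rows, positions, values):
    # Start from a result filled with NaN (the assumed role of `full`).
    result = [math.nan] * n_rows
    # Scatter each computed distance to the row position it belongs to.
    for position, value in zip(positions, values):
        result[position] = value
    return result

# Rows 0 and 2 had matching indices; rows 1 and 3 stay NaN.
result = scatter_distances(4, positions=[0, 2], values=[1.5, 2.5])
```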

Contributor Author

python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
Contributor

@thomcom thomcom left a comment

A nice and clean way to generate a lot of tests and a lot of dispatch functions.

@isVoid
Contributor Author

isVoid commented Jul 31, 2023

/merge

@rapids-bot rapids-bot bot merged commit 8525f6b into rapidsai:branch-23.08 Jul 31, 2023
51 checks passed
@isVoid isVoid mentioned this pull request Jul 31, 2023
12 tasks
@isVoid isVoid mentioned this pull request Oct 5, 2023
12 tasks
rapids-bot bot pushed a commit that referenced this pull request Oct 5, 2023
closes #994 
There are primitive benchmark results for `GeoSeries.distance` in #1231. This PR plans to add more benchmark coverage:
TODO:
- [x] point-point
- [x] point-linestring
- [x] point-polygon
- [x] linestring-linestring
- [x] linestring-polygon
- [x] polygon-polygon
- [x] Geometry complexity dimension
- [x] Geometry spatial relationship dimension
- [ ] Write the blog

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1277
Labels
feature request New feature or request non-breaking Non-breaking change Python Related to Python code
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[FEA] Support GeoSeries.distance
3 participants