Add geoseries.distance
#1231
Conversation
Just a couple of minor comments. Looks good. Congrats on finishing this feature set!
# Rows with misaligned indices contains nan. Here we scatter the
# distance values to the correct indices.
res = full(
What does `full` do? What does `res` mean? If it's `result`, no need to abbreviate it.
It's essentially an allocate-and-fill with a specified value.
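For readers following the thread, here is a minimal sketch of the allocate-and-fill-then-scatter pattern being discussed. It uses NumPy's `np.full` as a stand-in; the `full` in the diff may come from a different library, and the helper name `scatter_distances` and its arguments are hypothetical, not taken from the PR.

```python
import numpy as np


def scatter_distances(n_rows, aligned_positions, distances):
    """Hypothetical helper: NaN-filled result with distances scattered in.

    Rows whose indices did not align keep the NaN fill value.
    """
    # Allocate n_rows float64 slots and fill every slot with NaN.
    result = np.full(n_rows, np.nan, dtype="float64")
    # Scatter the computed distances to the rows that did align.
    result[aligned_positions] = distances
    return result


# Rows 1 and 4 have no aligned partner, so they stay NaN.
result = scatter_distances(5, np.array([0, 2, 3]), np.array([1.5, 0.0, 4.2]))
```

Filling with NaN up front means only the aligned rows need to be written afterwards.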
A nice and clean way to generate a lot of tests and a lot of dispatch functions.
/merge
closes #994

There are primitive benchmark results for `GeoSeries.distance` in #1231. This PR plans to add more benchmark coverage:

TODO:
- [x] point-point
- [x] point-linestring
- [x] point-polygon
- [x] linestring-linestring
- [x] linestring-polygon
- [x] polygon-polygon
- [x] Geometry complexity dimension
- [x] Geometry spatial relationship dimension
- [ ] Write the blog

Authors:
- Michael Wang (https://github.com/isVoid)

Approvers:
- Mark Harris (https://github.com/harrism)

URL: #1277
Description
closes #759
This PR adds `geoseries.distance`, which computes distances between two geoseries.

Benchmarking the distance API is a complicated task. Below I present the benchmark of the simplest case: distance between a pair of point geoseries. The benchmark is run with the `geoseries.distance()` API. Data setup is not counted. The input consists of 1 point per row and varies from 1K to 10M rows in 10-fold increments. Benchmarked time is converted to throughput (points / s).

In this benchmark, when the input is between 10K and 100K points, geopandas performs better than cuspatial. For larger data sizes, cuspatial can reach ~10X the throughput of geopandas.

The benchmark is run with `align={True, False}`. One of the operands is constructed with its index reversed relative to the other. When run with `align==True`, both cuspatial and geopandas see a performance decrease, and the decrease is consistent across data sizes.

TODO:
Checklist
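As a rough illustration of the benchmark methodology in the description (sizes from 1K to 10M in 10-fold steps, setup excluded from timing, time converted to points / s), a timing harness might look like the sketch below. The names `SIZES`, `time_call`, and `throughput` are hypothetical and are not the PR's actual benchmark code; counting one point per row as the "points" in the throughput is also an assumption.

```python
import time

# Input sizes from the description: 1K to 10M rows, growing 10-fold per step.
SIZES = [10**k for k in range(3, 8)]


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) of fn() over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def throughput(n_points, elapsed_seconds):
    """Convert a timing into points per second."""
    return n_points / elapsed_seconds


# Usage sketch (assumes `lhs` and `rhs` are point GeoSeries built beforehand,
# so that data setup is not counted in the timing):
#   elapsed = time_call(lambda: lhs.distance(rhs, align=True))
#   print(throughput(len(lhs), elapsed))
```

Taking the best of several runs is one common way to reduce timing noise; the description does not say how repeated runs were aggregated.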