-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Multi-Geometry support to point_linestring_distance
and build python bindings
#660
Add Multi-Geometry support to point_linestring_distance
and build python bindings
#660
Conversation
…_linestring_distance
This reverts commit a8ec5ff.
This reverts commit 2ed60eb.
…into feature/header_only_point_linestring_distance
Co-authored-by: Mark Harris <mharris@nvidia.com>
…b.com:isVoid/cuspatial into feature/header_only_point_linestring_distance
…into feature/header_only_point_linestring_distance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
Co-authored-by: Mark Harris <mharris@nvidia.com>
…into feature/python_multi_linestring_distance
OffsetIteratorB linestring_geometry_offset_first, | ||
OffsetIteratorB linestring_geometry_offset_last, | ||
OffsetIteratorC linestring_part_offsets_first, | ||
OffsetIteratorC linestring_part_offsets_last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an entirely theoretical suggestion that relates to all the other multi arrays, particularly MultiLinestring
and MultiPolygon
. It seems like, instead of using an iterator for the linestring_part_offsets
, and another iterator for the linestring_geometry_offsets
, that we should have an iterator for the linestring_geometry_offsets
which also contains the iterator for the parts
. We need fewer arguments and the ability to iterate through "all linestrings and linestring parts" is encapsulated into a single iterator. A theoretical suggestion, I don't know how to do it, possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you are just looking at a recursive templated iterator collection, there's some maximum level that c++ compiler allows the templates to be recursively nested within itself, but multilinestring should be well within that limit.
Does this make sense to you? https://godbolt.org/z/v95jn5G3e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's more or less exactly what I was thinking of. Do you think it works in the context of your PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really is just my first attempt to take a stab at the idea. I'd like to have some more feedbacks before we decide on the iterator collection interface. So no, not planning to put this in this PR.
cc @harrism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be other simpler solutions, like just a MultiLinestringIteratorCollection
class for example:
template<typename OffsetItA, typename OffsetItB, typename Vec2dIt>
struct MultiLinestringIteratorCollection
{
OffsetItA geometry_offset_begin;
OffsetItB part_offset_begin;
Vec2dIt coordinate_begin;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deviates a little bit from you original suggestion of having a custom iterator to represent a nested of iterator. I believe you meant to create a custom iterator that iterates over the multilinestring array, when you dereference this iterator, it returns a multilinestring - which in turn has its own iterator, and allows you to iterate into each linestring, segment and points etc. Besides, we should provide different level of iterators so that developer can pick their use-case defined parallel model as needed.
I don't have a working example, but something like:
class linestring {
class iterator{
std::pair<vec2d, vec2d> operator* (); // returns two end points of the line segment
iterator operator+(diff_t i);
iterator operator-(diff_t i);
// other requirements of of LRAIs
};
iterator begin();
iterator end();
};
class multi_linestring
{
class linestring_iterator
{
linestring operator*(); // this returns a linestring
// LRAI requirements
};
class segment_iterator
{
std::pair<vec2d, vec2d> operator* (); // returns two end points of the line segment
// LRAI requirments
}
linestring_iterator linestring_begin();
linestring_iterator linestring_end();
segment_iterator segment_begin();
segment_iterator segment_end();
// The default should be the immediate next level of indirection
auto begin() { return linestring_begin(); }
auto end() { return linestring_end(); }
};
class multi_linestring_array
{
class multi_linestring_iterator
{
multi_linestring operator*();
// other LRAI requirements
}
class linestring_iterator { /* same as above */ }
class segment_iterator { /* same as above */ }
multilinestring_iterator multilinestring_begin();
multilinestring_iterator multilinestring_end();
linestring_iterator linestring_begin();
linestring_iterator linestring_end();
segment_iterator segment_begin();
segment_iterator segment_end();
// The default should be the immediate next level of indirection
auto begin() { return multilinestring_begin(); }
auto end() { return multilinestring_end(); }
};
But beware of the overheads you would run into when constructing these data structure per threads.
Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really great. Just a few comments, questions, and textual updates.
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/experimental/point_linestring_distance.cuh
Outdated
Show resolved
Hide resolved
points_it + num_points, | ||
linestring_parts_it_first, | ||
linestring_part_offsets.begin(), | ||
linestring_part_offsets.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back on this idea, using a single function with optionals
to maybe pass optionals
down further in the call chain seems unoptimal to me. Personally I'd write a pairwise_point_linestring_distance_impl
that takes non-multi inputs and creates a simple multi column, then calls pairwise_multipoint_multilinestring_distance_impl
. Thinking it through though I think I can see why you did this - because you only need one function to represent all of the parameterizations of multi/non-multi on points and linestrings, instead of four. I'm ok with it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the motivation is to avoid writing 4 functions. I'm leveraging c++'s template mechanism to help me do that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Will leave this discussion open, should not block this PR to merge)
I might have missed something you suggested here. Are you suggesting that these APIs should take a LIST
type columns?
|
||
Notes | ||
----- | ||
The input `GeoSeries` must contain a single type geometry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input `GeoSeries` must contain a single type geometry. | |
The input `GeoSeries` should contain a single type geometry. |
I don't think there's a problem passing in a GeoSeries that has other feature types in it, they'll just be ignored. Theoretically if a GeoSeries contained an equal number of points and linestrings it could be the first and second argument to this function? I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of a pairwise distance function should be of the same length as its input. I would rather the user explicitly extract the points and linestring from their geoseries to form a new one before passing in this function. Do we have an easy way for user to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. In geopandas
, the syntax commonly used is points = geoseries[geoseries.type == "Point"]
and we need to implement some kind of the .type
function in order to duplicate it. It can be hacked by using the GeoColumn._meta
but we should just write the .type
accessor. :)
Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
…:isVoid/cuspatial into feature/python_multi_linestring_distance
@gpucibot merge |
Description
This PR adds multi-point and multi-linestring support to
point_linestring_distance
. The c++ API always supports multi-variant of the geometry type, since any single-geometry type can be wrapped with acounting_iteartor
for its underlying geometry offset.The column API accepts a
std::optional
for the geometry offset inputs. If not provided, they are considered single geometry by default. Amultigeom_dispatcher
is used to generalize the runtime to compile optional information.This PR also builds python bindings for
point_linestring_distance
. First, the cython bindings is created with a backportedoptional
module from cython 3.0 to support the abovestd::optional
API. The cython APIs also made use of the dynamic typing property of python to allow geometry offsets to be optional arguments. The python API is the first examplar of a computing API accepting GeoSeries as input. For simplicity, we assumeGeoSeries
only contains single type geometry and there is no mixing ofpoints
andmultipoints
.Contributes to #231
Follow up to #573
Checklist