Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Multi-Geometry support to point_linestring_distance and build python bindings #660

Merged

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Aug 23, 2022

Description

This PR adds multi-point and multi-linestring support to point_linestring_distance. The c++ API always supports multi-variant of the geometry type, since any single-geometry type can be wrapped with a counting_iteartor for its underlying geometry offset.

The column API accepts a std::optional for the geometry offset inputs. If not provided, they are considered single geometry by default. A multigeom_dispatcher is used to generalize the runtime to compile optional information.

This PR also builds python bindings for point_linestring_distance. First, the cython bindings is created with a backported optional module from cython 3.0 to support the above std::optional API. The cython APIs also made use of the dynamic typing property of python to allow geometry offsets to be optional arguments. The python API is the first examplar of a computing API accepting GeoSeries as input. For simplicity, we assume GeoSeries only contains single type geometry and there is no mixing of points and multipoints.

Contributes to #231
Follow up to #573

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

isVoid and others added 30 commits June 14, 2022 10:52
…into feature/header_only_point_linestring_distance
Co-authored-by: Mark Harris <mharris@nvidia.com>
…b.com:isVoid/cuspatial into feature/header_only_point_linestring_distance
…into feature/header_only_point_linestring_distance
@isVoid isVoid requested a review from harrism August 25, 2022 00:31
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

cpp/include/cuspatial/detail/iterator.hpp Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/spatial/distance.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/spatial/distance.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/spatial/distance.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/spatial/distance.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/spatial/distance.py Outdated Show resolved Hide resolved
@harrism harrism added the 4 - Needs Reviewer Waiting for reviewer to review or respond label Sep 5, 2022
OffsetIteratorB linestring_geometry_offset_first,
OffsetIteratorB linestring_geometry_offset_last,
OffsetIteratorC linestring_part_offsets_first,
OffsetIteratorC linestring_part_offsets_last,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an entirely theoretical suggestion that relates to all the other multi arrays, particularly MultiLinestring and MultiPolygon. It seems like, instead of using an iterator for the linestring_part_offsets, and another iterator for the linestring_geometry_offsets, that we should have an iterator for the linestring_geometry_offsets which also contains the iterator for the parts. We need fewer arguments and the ability to iterate through "all linestrings and linestring parts" is encapsulated into a single iterator. A theoretical suggestion, I don't know how to do it, possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are just looking at a recursive templated iterator collection, there's some maximum level that c++ compiler allows the templates to be recursively nested within itself, but multilinestring should be well within that limit.

Does this make sense to you? https://godbolt.org/z/v95jn5G3e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's more or less exactly what I was thinking of. Do you think it works in the context of your PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really is just my first attempt to take a stab at the idea. I'd like to have some more feedbacks before we decide on the iterator collection interface. So no, not planning to put this in this PR.

cc @harrism

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be other simpler solutions, like just a MultiLinestringIteratorCollection class for example:

template<typename OffsetItA, typename OffsetItB, typename Vec2dIt>
struct MultiLinestringIteratorCollection
{
  OffsetItA geometry_offset_begin;
  OffsetItB part_offset_begin;
  Vec2dIt coordinate_begin;
};

Copy link
Contributor Author

@isVoid isVoid Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deviates a little bit from you original suggestion of having a custom iterator to represent a nested of iterator. I believe you meant to create a custom iterator that iterates over the multilinestring array, when you dereference this iterator, it returns a multilinestring - which in turn has its own iterator, and allows you to iterate into each linestring, segment and points etc. Besides, we should provide different level of iterators so that developer can pick their use-case defined parallel model as needed.

I don't have a working example, but something like:

class linestring {
    class iterator{
        std::pair<vec2d, vec2d> operator* (); // returns two end points of the line segment
        iterator operator+(diff_t i);
        iterator operator-(diff_t i);
        // other requirements of of LRAIs
    };
    iterator begin();
    iterator end();
};

class multi_linestring
{
    class linestring_iterator
    {
        linestring operator*(); // this returns a linestring
        // LRAI requirements
    };
    class segment_iterator
    {
        std::pair<vec2d, vec2d> operator* (); // returns two end points of the line segment
        // LRAI requirments
    }

    linestring_iterator linestring_begin();
    linestring_iterator linestring_end();

    segment_iterator segment_begin();
    segment_iterator segment_end();

    // The default should be the immediate next level of indirection
    auto begin() { return linestring_begin(); }
    auto end() { return linestring_end(); }
};

class multi_linestring_array
{
    class multi_linestring_iterator
    {
        multi_linestring operator*();
        // other LRAI requirements
    }
    class linestring_iterator { /* same as above */ }
    class segment_iterator { /* same as above */ }

    multilinestring_iterator multilinestring_begin();
    multilinestring_iterator multilinestring_end();

    linestring_iterator linestring_begin();
    linestring_iterator linestring_end();

    segment_iterator segment_begin();
    segment_iterator segment_end();

    // The default should be the immediate next level of indirection
    auto begin() { return multilinestring_begin(); }
    auto end() { return multilinestring_end(); }
};

But beware of the overheads you would run into when constructing these data structure per threads.

Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
Copy link
Contributor

@thomcom thomcom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks really great. Just a few comments, questions, and textual updates.

points_it + num_points,
linestring_parts_it_first,
linestring_part_offsets.begin(),
linestring_part_offsets.end(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back on this idea, using a single function with optionals to maybe pass optionals down further in the call chain seems unoptimal to me. Personally I'd write a pairwise_point_linestring_distance_impl that takes non-multi inputs and creates a simple multi column, then calls pairwise_multipoint_multilinestring_distance_impl. Thinking it through though I think I can see why you did this - because you only need one function to represent all of the parameterizations of multi/non-multi on points and linestrings, instead of four. I'm ok with it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the motivation is to avoid writing 4 functions. I'm leveraging c++'s template mechanism to help me do that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Will leave this discussion open, should not block this PR to merge)

I might have missed something you suggested here. Are you suggesting that these APIs should take a LIST type columns?

cpp/src/spatial/point_linestring_distance.cu Show resolved Hide resolved
python/cuspatial/cuspatial/core/geoseries.py Show resolved Hide resolved

Notes
-----
The input `GeoSeries` must contain a single type geometry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The input `GeoSeries` must contain a single type geometry.
The input `GeoSeries` should contain a single type geometry.

I don't think there's a problem passing in a GeoSeries that has other feature types in it, they'll just be ignored. Theoretically if a GeoSeries contained an equal number of points and linestrings it could be the first and second argument to this function? I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of a pairwise distance function should be of the same length as its input. I would rather the user explicitly extract the points and linestring from their geoseries to form a new one before passing in this function. Do we have an easy way for user to do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. In geopandas, the syntax commonly used is points = geoseries[geoseries.type == "Point"] and we need to implement some kind of the .type function in order to duplicate it. It can be hacked by using the GeoColumn._meta but we should just write the .type accessor. :)

python/cuspatial/cuspatial/utils/column_utils.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/utils/column_utils.py Outdated Show resolved Hide resolved
isVoid and others added 6 commits September 8, 2022 14:02
Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
Co-authored-by: H. Thomson Comer <thomcom@gmail.com>
…:isVoid/cuspatial into feature/python_multi_linestring_distance
@isVoid
Copy link
Contributor Author

isVoid commented Sep 13, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 85c3150 into rapidsai:branch-22.10 Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Reviewer Waiting for reviewer to review or respond breaking Breaking change cmake Related to CMake code or build configuration feature request New feature or request libcuspatial Relates to the cuSpatial C++ library Python Related to Python code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants