Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement geom_equals and binary predicates that depend only on it. #926

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
74 commits
Select commit Hold shift + click to select a range
67a2a6a
Start refactoring binops to handle various ops and inputs.
thomcom Dec 1, 2022
3038253
Abstraction working with binops that depend on pip.
thomcom Dec 1, 2022
d7a842b
Clean it up in GeoSeries.
thomcom Dec 1, 2022
3301b95
Correct abstraction for pip_only tests.
thomcom Dec 1, 2022
f6688d7
Support the seven extra pip tests.
thomcom Dec 2, 2022
77b6858
Support all supportable pip calls.
thomcom Dec 2, 2022
c69d2e6
Remove unused abstraction.
thomcom Dec 2, 2022
de923c5
Clean up binops.intersects docs.
thomcom Dec 2, 2022
67b0df6
Pass all equals tests except one.
thomcom Dec 2, 2022
b4ea5e5
Merge
thomcom Dec 5, 2022
305f937
Finish abstracting pip based ops.
thomcom Dec 5, 2022
707217b
Pass all multi tests.
thomcom Dec 5, 2022
376833a
Add equals only binops.
thomcom Dec 7, 2022
f0d0c46
Rename binops to binpreds.
thomcom Dec 8, 2022
441a585
Cleaning up based on review.
thomcom Dec 8, 2022
e02a669
Cleaned up unneeded future material.
thomcom Dec 8, 2022
6444a4c
Resolve has_same_dimension naming issue.
thomcom Dec 8, 2022
29870d4
Fix postprocess docs.
thomcom Dec 9, 2022
c2cc722
Factor out op string argument and move logic to the __call__ function…
thomcom Dec 9, 2022
18631e3
Fix bug with complete overlap
thomcom Dec 9, 2022
a1ab8a7
Merge new pip only binpreds refactor into equals.
thomcom Dec 9, 2022
023247e
Pass all equals only tests with new factoring.
thomcom Dec 9, 2022
3ddf7c4
Fix contains issue.
thomcom Dec 9, 2022
82f40ec
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Dec 12, 2022
b71f34f
Handle review comments from @isVoid.
thomcom Dec 12, 2022
5338a73
Remove more equals code.
thomcom Dec 12, 2022
af5ae69
Merge branch 'feature/all_pip_operations' into feature/all_equals_ope…
thomcom Dec 12, 2022
34b3fbf
Update python/cuspatial/cuspatial/core/geoseries.py
thomcom Dec 13, 2022
7be5a73
Handle review comments from @harrism
thomcom Dec 13, 2022
230b0b7
Merge branch 'feature/all_equals_operations' of github.com:thomcom/cu…
thomcom Dec 13, 2022
e2a3616
Merge branch 'branch-23.02' into feature/all_equals_operations
thomcom Dec 13, 2022
04cd58d
Update python/cuspatial/cuspatial/core/geoseries.py
thomcom Dec 13, 2022
c9211f3
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Dec 14, 2022
a2acb0c
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Dec 14, 2022
f749ca0
Replace operation with predicate
thomcom Dec 14, 2022
e94293a
Replace operation with predicate.
thomcom Dec 14, 2022
3e7dacd
Rewrite docs.
thomcom Dec 14, 2022
86d30b6
Merge branch 'feature/all_pip_operations' into feature/all_equals_ope…
thomcom Dec 14, 2022
2c2de39
Working on using instead of equals.
thomcom Dec 15, 2022
5448dab
Fix everything except equals tests.
thomcom Dec 16, 2022
5b39f70
Iterative refactor that actually compares some points and linestrings…
thomcom Dec 16, 2022
c7f1d21
More specific test cases.
thomcom Dec 16, 2022
56865de
Refactor out _compare_aligned_vertices
thomcom Dec 16, 2022
57a2d46
Add multipoint
thomcom Dec 16, 2022
b5002e2
Add misordered multipoint test and success.
thomcom Dec 16, 2022
3e1d4d5
Implement polygons cases which do not yet pass.
thomcom Dec 16, 2022
fdc2fe6
Committed a crime of refactoring in order to get polygons non reorder…
thomcom Dec 16, 2022
32fb2a4
Properly support equals operations.
thomcom Jan 31, 2023
6b0c4c7
Merge branch 'branch-23.04' into feature/all_equals_operations
thomcom Feb 1, 2023
5996035
Merge branch 'branch-23.04' into feature/all_equals_operations
thomcom Feb 14, 2023
049b15f
Resolve merge of geoseries.py
thomcom Feb 14, 2023
4356f24
Put GeoColumnAccessors back inside of GeoSeries.
thomcom Feb 14, 2023
9dffa70
Clean up test_pip_only_binpreds.py
thomcom Feb 14, 2023
654c28e
Discovered a bug with larger sets of points.
thomcom Feb 14, 2023
212881e
Cleanup and drop 31 points
thomcom Feb 15, 2023
61a4108
Rename binops to binpreds.
thomcom Mar 8, 2023
5416e90
Merge branch 'branch-23.04' into feature/all_equals_operations
thomcom Mar 8, 2023
96666a5
Update copyright dates.
thomcom Mar 8, 2023
21f694c
Getting started with _ sort_polygons. Needs to have the closed edge r…
thomcom Mar 9, 2023
7854069
Dropping first point.
thomcom Mar 15, 2023
10c0498
Merge branch 'branch-23.04' into feature/all_equals_operations
thomcom Mar 17, 2023
5278b5e
Convert polygons to multipoints without their extra endpoint for sort…
thomcom Mar 17, 2023
dbcb617
Refactor out messy _sort_linestrings.
thomcom Mar 17, 2023
4a5ff80
cleaning up
thomcom Mar 17, 2023
ce3aa49
Remove polygon related equals operations.
thomcom Mar 17, 2023
d88c33b
Handle a review comment about wording.
thomcom Mar 17, 2023
3a7e04f
Trying to resolve docs formatting error.
thomcom Mar 17, 2023
b7d4d17
I think I've fixed the docs ci issue.
thomcom Mar 19, 2023
01789d7
Skip NotImplemented tests.
thomcom Mar 20, 2023
b53fb6c
Refactor equals binpred to handle equality of linestrings, which requ…
thomcom Mar 21, 2023
a9e6563
Rename dtypes.
thomcom Mar 21, 2023
41ead90
Extra overlaps and crosses tests
thomcom Mar 21, 2023
c7e3686
Merge branch 'branch-23.04' into feature/all_equals_operations
thomcom Mar 22, 2023
2b857db
Handle review comments.
thomcom Mar 22, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
309 changes: 301 additions & 8 deletions python/cuspatial/cuspatial/core/binpreds/binpreds.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,40 @@

import cudf

import cuspatial
from cuspatial.core._column.geocolumn import GeoColumn
from cuspatial.core.binpreds.contains import contains_properly
from cuspatial.utils.column_utils import (
contains_only_linestrings,
contains_only_multipoints,
contains_only_points,
contains_only_polygons,
has_multipolygons,
has_same_geometry,
)


class PreprocessorOutput:
thomcom marked this conversation as resolved.
Show resolved Hide resolved
"""The output of the preprocess method of a binary predicate.

This makes it possible to create a class that matches the necessary
signature of a geoseries.GeoColumnAccessor object. In some cases the
preprocessor may need to reorder the input data, in which case the
preprocessor will return a PreprocessorOutput object instead of a
GeoColumnAccessor."""

def __init__(self, coords, indices) -> None:
self.vertices = coords
self.indices = indices

@property
def xy(self):
return self.vertices

def point_indices(self):
return self.indices


class BinaryPredicate(ABC):
"""BinaryPredicate is an abstract class that implements the binary
predicate algorithm. The binary predicate algorithm is used to compute
Expand Down Expand Up @@ -58,10 +81,12 @@ def preprocess(self, lhs, rhs):
The right-hand-side of the internal binary predicate, may be
reordered.
"""
pass
return (lhs, rhs, cudf.RangeIndex(len(rhs)))

@abstractmethod
def postprocess(self, point_indices, point_result):
def postprocess(
self, point_indices: cudf.Series, point_result: cudf.Series
) -> cudf.Series:
"""Postprocess the output data for the binary predicate. This method
should be implemented by subclasses.

Expand Down Expand Up @@ -134,6 +159,15 @@ def __init__(self, lhs, rhs, align=True):
(self.lhs, self.rhs) = lhs.align(rhs) if align else (lhs, rhs)
self.align = align

def _cancel_op(self, lhs, rhs, result):
"""Used to disable computation of the binary predicate.

This occurs when the binary predicate is not supported for the
input types, and a final result can be computed only using
`preprocess` and `postprocess`."""
self._op = lambda x, y: result
return (lhs, rhs, result)

def __call__(self) -> cudf.Series:
"""Return the result of the binary predicate."""
# Type disambiguation
Expand Down Expand Up @@ -178,9 +212,9 @@ def preprocess(self, lhs, rhs):
# Arrange into shape for calling point-in-polygon, intersection, or
# equals
point_indices = geom.point_indices()
from cuspatial.core.geoseries import GeoSeries

final_rhs = GeoSeries(GeoColumn._from_points_xy(xy_points._column))
final_rhs = cuspatial.core.geoseries.GeoSeries(
GeoColumn._from_points_xy(xy_points._column)
)
return (lhs, final_rhs, cudf.Series(point_indices))

def _should_use_quadtree(self):
Expand Down Expand Up @@ -391,14 +425,16 @@ def _postprocess_brute_force_result(self, point_indices, point_result):

# Result can be:
# A Dataframe of booleans with n_points rows and up to 31 columns.
# Discrete math recombination
if (
contains_only_linestrings(self.rhs)
or contains_only_polygons(self.rhs)
or contains_only_multipoints(self.rhs)
):
# process for completed linestrings, polygons, and multipoints.
# Not necessary for points.
# Compute the set of results for each point-in-polygon predicate.
# Group them by the original index, and sum the results. If the
# sum of points in the rhs feature is equal to the number of
# points found in the polygon, then the polygon contains the
# feature.
point_result["idx"] = point_indices
group_result = (
point_result.groupby("idx").sum().sort_index()
Expand Down Expand Up @@ -487,3 +523,260 @@ def preprocess(self, lhs, rhs):
if contains_only_polygons(rhs):
(self.lhs, self.rhs) = (rhs, lhs)
return super().preprocess(rhs, lhs)

def postprocess(self, point_indices, point_result):
"""Postprocess the output GeoSeries to ensure that they are of the
correct type for the predicate."""
(hits, expected_count,) = self._count_results_in_multipoint_geometries(
point_indices, point_result
)
result_df = hits.reset_index().merge(
expected_count.reset_index(), on="rhs_index"
)
result_df["feature_in_polygon"] = (
result_df["point_index_x"] >= result_df["point_index_y"]
)
final_result = cudf.Series(
[False] * (point_indices.max().item() + 1)
) # point_indices is zero index
final_result.loc[
result_df["rhs_index"][result_df["feature_in_polygon"]]
] = True
return final_result


class EqualsBinpred(BinaryPredicate):
def _offset_equals(self, lhs, rhs):
"""Compute the pairwise length equality of two offset arrays"""
lhs_lengths = lhs[:-1] - lhs[1:]
rhs_lengths = rhs[:-1] - rhs[1:]
return lhs_lengths == rhs_lengths

def _sort_interleaved_points_by_offset(self, coords, offsets, sort_order):
"""Sort xy according to bins defined by offset. Sort order is a list
of column names to sort by.

`_sort_interleaved_points_by_offset` creates a dataframe with the
following columns:
"sizes": an index for each object represented in `coords`.
"points": an index for each point in `coords`.
"xy_key": an index that maintains x/y ordering.
"xy": the x/y coordinates in `coords`.

The dataframe is sorted according to keys passed in by the caller.
For sorting multipoints, the keys in order are "object_key", "xy",
"xy_key". This sorts the points in each multipoint into the same
bin defined by "object_key", then sorts the points in each bin by
x/y coordinates, and finally sorts the points in each bin by the
`xy_key` which maintains that the x coordinate precedes the y
coordinate.

For sorting linestrings, the keys in order are "object_key",
"point_key", "xy_key". This sorts the points in each linestring
into the same bin defined by "object_key", then sorts the points
in each bin by point ordering, and finally sorts the points in
each bin by x/y ordering.

Parameters
----------
coords : cudf.Series
interleaved x,y coordinates
offsets : cudf.Series
offsets into coords
sort_order : list
list of column names to sort by. One of "object_key", "point_key",
"xy_key", and "xy".

Returns
-------
cudf.Series
sorted interleaved x,y coordinates
"""
sizes = offsets[1:] - offsets[:-1]
object_key = (
cudf.Series(cp.arange(len(sizes)))
.repeat(sizes * 2)
.reset_index(drop=True)
)
point_key = cp.arange(len(coords) // 2).repeat(2)[::-1]
xy_key = cp.tile([0, 1], len(coords) // 2)
sorting_df = cudf.DataFrame(
{
"object_key": object_key,
"point_key": point_key,
"xy_key": xy_key,
"xy": coords,
}
)
sorted_df = sorting_df.sort_values(by=sort_order).reset_index(
drop=True
)
return sorted_df["xy"]

def _sort_multipoint_series(self, coords, offsets):
"""Sort xy according to bins defined by offset"""
result = self._sort_interleaved_points_by_offset(
coords, offsets, ["object_key", "xy", "xy_key"]
)
result.name = None
return result

def _sort_multipoints(self, lhs, rhs, initial):
lhs_sorted = self._sort_multipoint_series(
lhs.multipoints.xy, lhs.multipoints.geometry_offset
)
rhs_sorted = self._sort_multipoint_series(
rhs.multipoints.xy, rhs.multipoints.geometry_offset
)
lhs_result = cuspatial.core.geoseries.GeoSeries.from_multipoints_xy(
lhs_sorted, lhs.multipoints.geometry_offset
)
rhs_result = cuspatial.core.geoseries.GeoSeries.from_multipoints_xy(
rhs_sorted, rhs.multipoints.geometry_offset
)
lhs_result.index = lhs.index
rhs_result.index = rhs.index
return (
lhs_result,
rhs_result,
initial,
)

def _reverse_linestrings(self, coords, offsets):
"""Reverse the order of coordinates in a Arrow buffer of coordinates
and offsets."""
result = self._sort_interleaved_points_by_offset(
coords, offsets, ["object_key", "point_key", "xy_key"]
)
result.name = None
return result

def _compare_linestrings_and_reversed(self, lhs, rhs, initial):
"""Compare linestrings with their reversed counterparts."""
lhs_xy = self._reverse_linestrings(lhs.xy, lhs.part_offset)
rhs_xy = self._reverse_linestrings(rhs.xy, rhs.part_offset)
return (
PreprocessorOutput(lhs_xy, lhs.point_indices()),
PreprocessorOutput(rhs_xy, rhs.point_indices()),
initial,
)

def preprocess(self, lhs, rhs):
# Compare types
type_compare = lhs.feature_types == rhs.feature_types
# Any unmatched type is not equal
if (type_compare == False).all(): # noqa: E712
isVoid marked this conversation as resolved.
Show resolved Hide resolved
# Override _op so that it will not be run.
return self._cancel_op(lhs, rhs, type_compare)
# Get indices of matching types
if contains_only_multipoints(lhs):
lengths_equal = self._offset_equals(
lhs.multipoints.geometry_offset,
rhs.multipoints.geometry_offset,
)
if lengths_equal.any():
# Multipoints are equal if they contains the
# same unordered points.
return self._sort_multipoints(
lhs[lengths_equal],
rhs[lengths_equal],
lengths_equal,
)
else:
# No lengths are equal, so none can be equal.
return self._cancel_op(lhs, rhs, lengths_equal)
elif contains_only_linestrings(lhs):
lengths_equal = self._offset_equals(
lhs.lines.part_offset, rhs.lines.part_offset
)
if lengths_equal.any():
return (
lhs[lengths_equal],
rhs[lengths_equal],
lengths_equal,
)
else:
return self._cancel_op(lhs, rhs, lengths_equal)
elif contains_only_polygons(lhs):
raise NotImplementedError
elif contains_only_points(lhs):
return (lhs, rhs, type_compare)

def postprocess(self, lengths_equal, point_result):
# if point_result is not a Series, preprocessing terminated
# the results early.
if isinstance(point_result, cudf.Series):
point_result = point_result.sort_index()
lengths_equal[point_result.index] = point_result
return cudf.Series(lengths_equal)

def _vertices_equals(self, lhs, rhs):
"""Compute the equals relationship between interleaved xy
coordinate buffers."""
length = min(len(lhs), len(rhs))
a = lhs[:length:2]._column == rhs[:length:2]._column
b = rhs[1:length:2]._column == lhs[1:length:2]._column
return a & b

def _op(self, lhs, rhs):
if contains_only_linestrings(lhs):
# Linestrings can be compared either forward or
# reversed. We need to compare both directions.
lhs_reversed = self._reverse_linestrings(
lhs.lines.xy, lhs.lines.part_offset
)
forward_result = self._vertices_equals(lhs.lines.xy, rhs.lines.xy)
reverse_result = self._vertices_equals(lhs_reversed, rhs.lines.xy)
result = forward_result | reverse_result
elif contains_only_multipoints(lhs):
result = self._vertices_equals(
lhs.multipoints.xy, rhs.multipoints.xy
)
elif contains_only_points(lhs):
result = self._vertices_equals(lhs.points.xy, rhs.points.xy)
elif contains_only_polygons(lhs):
raise NotImplementedError
indices = lhs.point_indices
result_df = cudf.DataFrame(
{"idx": indices[: len(result)], "equals": result}
)
gb_idx = result_df.groupby("idx")
result = (gb_idx.sum().sort_index() == gb_idx.count().sort_index())[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this doing?

Copy link
Contributor Author

@thomcom thomcom Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$a_i == b_i \leftrightarrow \forall j$ in len(a) $a_j == b_j$

"equals"
]
result.index = lhs.index
result.index.name = None
result.name = None
return result


class CrossesBinpred(EqualsBinpred):
thomcom marked this conversation as resolved.
Show resolved Hide resolved
"""An object is said to cross other if its interior intersects the
interior of the other but does not contain it, and the dimension of
the intersection is less than the dimension of the one or the other.

This is only implemented for `points.crosses(points)` at this time.
"""

def postprocess(self, point_indices, point_result):
if has_same_geometry(self.lhs, self.rhs) and contains_only_points(
self.lhs
):
return cudf.Series([False] * len(self.lhs))
df_result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result


class CoversBinpred(EqualsBinpred):
thomcom marked this conversation as resolved.
Show resolved Hide resolved
"""An object A covers another B if no points of B lie in the exterior
of A, and at least one point of the interior of B lies in the interior
of A.

This is only implemented for `points.covers(points)` at this time."""

def postprocess(self, point_indices, point_result):
return cudf.Series(point_result, index=point_indices)
Loading