Implement and Test All non-multi-Feature Spatial Predicate Combinations #1064
…equals_count.cuh Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
…t_test.cu Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
…com/cuspatial into feature/allpairs_point_equals_count
…manual-test-dispatch
return ~(contains_lhs | contains_rhs) & (
    contains_properly_lhs | contains_properly_rhs
)
This seems a bit conflicting to me. Negating contains and then contains_properly?
For polygons a and b to overlap, it is necessary that neither polygon completely contain the other polygon, but that some of the points in a or b be contained by the other. https://postgis.net/docs/ST_Overlaps.html. Let me know if this answer is sufficient.
To my understanding, the only difference between contains and contains_properly is whether a point that falls on the boundary of the other geometry counts as contained.
The first part of the statement, "neither polygon completely contain the other polygon", is correctly captured by ~(contains_lhs | contains_rhs).
However, the second part, "that some of the points in a or b be contained by the other", isn't: contains_properly still tests whether all points of A are contained by B. What you need is a contains_any that tests whether any of A is contained by B.
I think this is resolved, as PolygonPolygonOverlaps now uses _basic_contains_properly_any, which returns true if any of B is contained by A, and the reverse.
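The overlap rule discussed in this thread can be sketched in plain Python. This is only an illustration under stated assumptions: the four boolean lists stand in for per-pair results of a "contains" test and an "any point contained" test, and the name `overlaps_from_masks` is hypothetical, not part of cuSpatial.

```python
def overlaps_from_masks(contains_lhs, contains_rhs, any_lhs, any_rhs):
    """Combine per-pair boolean results into an overlaps mask.

    contains_*: True where one geometry fully contains the other.
    any_*:      True where at least part of one geometry lies in the other.
    All four inputs are hypothetical stand-ins for predicate results.
    """
    return [
        (not (c_l or c_r)) and (a_l or a_r)
        for c_l, c_r, a_l, a_r in zip(
            contains_lhs, contains_rhs, any_lhs, any_rhs
        )
    ]


# Pair 0: partial overlap; pair 1: lhs fully contains rhs; pair 2: disjoint.
result = overlaps_from_masks(
    contains_lhs=[False, True, False],
    contains_rhs=[False, False, False],
    any_lhs=[True, True, False],
    any_rhs=[True, False, False],
)
print(result)
```

Only the first pair counts as overlapping: full containment is excluded by the negated term, and the disjoint pair fails the "any" term.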
Thanks for your detailed and perceptive comments and additional test cases, @isVoid. I've handled your review.
Minor comments, well done!
try:
    # The test passed, store the results.
    predicate_passes[predicate] = (
        1
        if predicate not in predicate_passes
        else predicate_passes[predicate] + 1
    )
    feature_passes[(lhs.column_type, rhs.column_type)] = (
        1
        if (lhs.column_type, rhs.column_type) not in feature_passes
        else feature_passes[(lhs.column_type, rhs.column_type)] + 1
    )
    passes_df = pd.DataFrame(
        {
            "predicate": list(predicate_passes.keys()),
            "predicate_passes": list(predicate_passes.values()),
        }
    )
    passes_df.to_csv("predicate_passes.csv", index=False)
    passes_df = pd.DataFrame(
        {
            "feature": list(feature_passes.keys()),
            "feature_passes": list(feature_passes.values()),
        }
    )
    passes_df.to_csv("feature_passes.csv", index=False)
except Exception as e:
    raise ValueError(e)
Looking at this again, you don't need an additional layer of exception handler that rethrows all exceptions as ValueError.
You're right. This handler only keeps problems in the logging from looking like test errors, so I've modified it to just raise e.
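As a rough sketch of the point made in this exchange, the tallying can be written with `collections.Counter` and no wrapping handler at all, so any exception keeps its original type. The helper names `record_pass` and `run_and_record` are hypothetical and are not taken from the PR; the structure is simplified compared with the test file under review.

```python
from collections import Counter

predicate_passes = Counter()
feature_passes = Counter()


def record_pass(predicate, lhs_type, rhs_type):
    """Tally one passing test by predicate and by feature-type pair."""
    predicate_passes[predicate] += 1
    feature_passes[(lhs_type, rhs_type)] += 1


def run_and_record(test_fn, predicate, lhs_type, rhs_type):
    """Run test_fn and record a pass only if it does not raise.

    There is no try/except here: a failure propagates with its
    original exception type and traceback.
    """
    test_fn()
    record_pass(predicate, lhs_type, rhs_type)


def failing():
    raise AssertionError("mismatch")


run_and_record(lambda: None, "contains", "point", "polygon")
run_and_record(lambda: None, "contains", "point", "point")

try:
    run_and_record(failing, "contains", "point", "polygon")
except AssertionError as err:
    caught = type(err).__name__  # the original type, not ValueError
```

Wrapping the exception in `ValueError` would have hidden the `AssertionError` type from the test runner, which is the concern raised above.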
python/cuspatial/cuspatial/tests/binpreds/test_binpred_test_dispatch.py
    return intersects & ~equals


class LineStringLineStringContainsPredicate(BinPred):
    def _preprocess(self, lhs, rhs):
        count = _basic_equals_count(lhs, rhs)
        return count == rhs.sizes
        # A linestring A covers another linestring B iff
Is this comment correct? This is contains, not covers. The code for the two looks the same. Is that correct?
I think that in the context of LineString+LineString, .contains and .covers are equivalent. You're right that this comment, which was carried over into feature_contains.py, is not correct. Thanks!
https://postgis.net/docs/ST_Covers.html
https://postgis.net/docs/ST_Contains.html
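The `count == rhs.sizes` check in the snippet under review suggests a vertex-counting idea, which can be sketched in pure Python. This is an assumption-laden simplification: `equals_count` and `vertex_contains` are hypothetical stand-ins, not cuSpatial's `_basic_equals_count`, and the sketch only compares vertices, so it ignores points of one linestring that fall in the interior of a segment of the other.

```python
def equals_count(lhs, rhs):
    """Count vertices of rhs that also appear as vertices of lhs."""
    lhs_points = set(lhs)
    return sum(1 for point in rhs if point in lhs_points)


def vertex_contains(lhs, rhs):
    """True when every vertex of rhs matches some vertex of lhs."""
    return equals_count(lhs, rhs) == len(rhs)


a = [(0, 0), (1, 1), (2, 2)]
b = [(0, 0), (1, 1)]   # every vertex of b is a vertex of a
c = [(0, 0), (3, 3)]   # (3, 3) is not a vertex of a

print(vertex_contains(a, b), vertex_contains(a, c))
```

Under this vertex-only rule the same function would serve for both contains and covers, which is consistent with the two predicates sharing code in the review above.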
import cupy as cp
import cuspatial
import geopandas
import numpy as np

from cuspatial.tests.binpreds.binpred_test_dispatch import (
    features,
    linestring_linestring_dispatch_list,
)


def sample_test_data(features, dispatch_list, size, lib=cuspatial):
    # Collect the (lhs, rhs) geometry pair for each dispatched test case.
    geometry_tuples = [features[key][1:3] for key in dispatch_list]
    geometries = [
        [lhs_geo for lhs_geo, _ in geometry_tuples],
        [rhs_geo for _, rhs_geo in geometry_tuples],
    ]
    lhs = lib.GeoSeries(list(geometries[0]))
    rhs = lib.GeoSeries(list(geometries[1]))
    # Use NumPy for GeoPandas (host) and CuPy for cuSpatial (device).
    if lib == geopandas:
        rng = np
    else:
        rng = cp
    rng.random.seed()
    # Draw `size` random row indices, with replacement, for each side.
    lhs_picks = rng.random.randint(0, len(lhs), size)
    rhs_picks = rng.random.randint(0, len(rhs), size)
    return (
        lhs[lhs_picks].reset_index(drop=True),
        rhs[rhs_picks].reset_index(drop=True),
    )


lhs, rhs = sample_test_data(features, linestring_linestring_dispatch_list, 1000, geopandas)
(lhs.contains(rhs) == lhs.covers(rhs)).all()
True
@@ -36,21 +47,65 @@ class CoversPredicateBase(EqualsPredicateBase):
    pass


class LineStringLineStringCovers(BinPred):
I can't comment on non-changed lines, but please don't add documentation that will go stale quickly, e.g. "in this initial release". Also, the fact that it is implemented using only the equals predicate is an implementation detail, not a feature of the API. Public documentation should cover features and usage of the API, not implementation details.
Thanks for the clarification, those docs and others definitely needed a cleanup.
/merge
BTW, I think non-multi is a better name than "simple". A simple feature means something specific, right? OSGeo IsSimple means no self intersections or self tangents, etc.
Updated title so hopefully it will make it into our changelog with a very clear description of the feature. Please remember this for all PRs.
Closes #1138
Closes #1141 [here](https://github.com/rapidsai/cuspatial/pull/1156/files#diff-c522c9afb3364b1aed2b2589c0d0c260dbc634bc54844536b1d382cecb92bf29R112)
Depends on #1152
Depends on #1064

Please direct your attention [to the notebook](https://github.com/rapidsai/cuspatial/pull/1156/files#diff-cc4c516f63efa822793d75315c1b28a04bad6c9efc6fd2bdcac5cc30b05d14dd) since the dependencies and delayed state of CI issues over this week have put a lot of files into this PR.

This notebook demonstrates cuSpatial's new binary predicates on large datasets, benchmarking and comparing against the host implementation on GeoPandas. In order to support the large inputs for these comparisons I had to reactivate the `pairwise_point_in_polygon` functionality that I'd previously written off. This is because quadtree doesn't support large N for NxN operations, since it is many-to-many, and brute-force would require a huge number of iterations to support such large dataframes.

There are some more optimizations that can be made to speed up `pairwise_point_in_polygon`, but the algorithm itself isn't sufficiently fast. It is detailed fairly closely in the notebook. Please take a look and let's have some conversations about steps forward.

Authors:
- H. Thomson Comer (https://github.com/thomcom)

Approvers:
- Michael Wang (https://github.com/isVoid)
- Mark Harris (https://github.com/harrism)
- Ray Douglass (https://github.com/raydouglass)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #1156
Depends on #1085
Depends on #1086
Depends on #1022
Closes #1062
Closes #1046
Description
Tests and passes all simple feature combinations across nine binary predicates.
Checklist