Implement and Test All non-multi-Feature Spatial Predicate Combinations #1064


thomcom
Contributor

thomcom commented Apr 10, 2023

Depends on #1085
Depends on #1086
Depends on #1022
Closes #1062
Closes #1046

Description

Tests and passes all simple feature combinations across nine binary predicates.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

thomcom and others added 30 commits March 30, 2023 12:55
…equals_count.cuh

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
…t_test.cu

Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
…com/cuspatial into feature/allpairs_point_equals_count
Comment on lines +48 to +50
return ~(contains_lhs | contains_rhs) & (
    contains_properly_lhs | contains_properly_rhs
)
Contributor


This seems contradictory to me: negating contains, then testing contains_properly?

Contributor Author

thomcom May 18, 2023


For polygons a and b to overlap, it is necessary that neither polygon completely contain the other polygon, but that some of the points in a or b be contained by the other. https://postgis.net/docs/ST_Overlaps.html. Let me know if this answer is sufficient.

Contributor

isVoid May 22, 2023


To my understanding, the only difference between contains and contains_properly is whether a point that falls on the boundary of the other geometry counts as contained: contains counts it, contains_properly does not.

The first part of the statement:

neither polygon completely contain the other polygon

is correctly captured by ~(contains_lhs | contains_rhs).

However, this statement isn't:

that some of the points in a or b be contained by the other

contains_properly still tests whether "all points of A are contained by B"; what you need is a contains_any that tests whether any point of A is contained by B.
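A minimal pure-Python sketch of the two distinctions in this comment, assuming a simple polygon given as a list of (x, y) vertices and a geometry reduced to a list of points. The helper names (`classify_point`, `all_points_in_closed`, `all_points_in_interior`, `any_point_in_interior`) are hypothetical, not cuSpatial API. The first two helpers differ only in whether boundary points count (the contains versus contains_properly distinction described above); the last two differ only in the quantifier, all versus any.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def classify_point(p: Point, polygon: List[Point]) -> str:
    """Return "inside", "boundary", or "outside" for p against a simple polygon.

    The closing edge from the last vertex back to the first is implicit.
    Uses exact arithmetic; real implementations need a tolerance or robust predicates.
    """
    x, y = p
    n = len(polygon)
    inside = False
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Boundary test: p is collinear with the edge and within its bounding box.
        cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
        if cross == 0 and min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
            return "boundary"
        # Ray casting: toggle for each edge crossed by a ray from p toward +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return "inside" if inside else "outside"


def all_points_in_closed(points: List[Point], polygon: List[Point]) -> bool:
    # contains-style: every point is inside OR on the boundary.
    return all(classify_point(p, polygon) != "outside" for p in points)


def all_points_in_interior(points: List[Point], polygon: List[Point]) -> bool:
    # contains_properly-style: every point is strictly inside.
    return all(classify_point(p, polygon) == "inside" for p in points)


def any_point_in_interior(points: List[Point], polygon: List[Point]) -> bool:
    # The "any" variant the review asks for: at least one point strictly inside.
    return any(classify_point(p, polygon) == "inside" for p in points)
```

For a 2x2 square, a point on its right edge such as (2, 1) passes `all_points_in_closed` but fails `all_points_in_interior`, and a point set with one interior point and one exterior point passes only `any_point_in_interior`.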

Contributor Author


I think this is resolved: PolygonPolygonOverlaps now uses _basic_contains_properly_any, which returns true if any point of B is contained by A, and vice versa.
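A hedged sketch of the resolved overlaps logic, with axis-aligned boxes standing in for polygons so the containment tests stay short. `Box`, `contains`, `contains_properly_any`, and `overlaps` here are hypothetical names, not cuSpatial's. It combines ~(contains_lhs | contains_rhs) with an "any point strictly inside" test in both directions. Like any corner-only test, it misses overlaps where no corner of either box lies inside the other, for example two rectangles crossing in a plus shape.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def corners(self) -> List[Point]:
        return [(self.xmin, self.ymin), (self.xmax, self.ymin),
                (self.xmax, self.ymax), (self.xmin, self.ymax)]


def _in_closed(p: Point, box: Box) -> bool:
    # Inside or on the boundary of the box.
    return box.xmin <= p[0] <= box.xmax and box.ymin <= p[1] <= box.ymax


def _in_interior(p: Point, box: Box) -> bool:
    # Strictly inside the box; boundary points are excluded.
    return box.xmin < p[0] < box.xmax and box.ymin < p[1] < box.ymax


def contains(a: Box, b: Box) -> bool:
    # For boxes, b lies within a iff all of b's corners lie in the closed box a.
    return all(_in_closed(p, a) for p in b.corners())


def contains_properly_any(a: Box, b: Box) -> bool:
    # "Any" variant: at least one corner of b is strictly inside a.
    return any(_in_interior(p, a) for p in b.corners())


def overlaps(a: Box, b: Box) -> bool:
    # Neither geometry contains the other, but some point of one lies strictly inside the other.
    neither_contains = not (contains(a, b) or contains(b, a))
    some_inside = contains_properly_any(a, b) or contains_properly_any(b, a)
    return neither_contains and some_inside
```

With a = Box(0, 0, 2, 2), a shifted copy Box(1, 1, 3, 3) overlaps it, while a box nested inside a, a disjoint box, and a box that only shares an edge with a do not.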

python/cuspatial/cuspatial/core/binpreds/feature_covers.py (outdated, resolved)
Contributor Author

thomcom commented May 18, 2023

Thanks for your detailed and perceptive comments and the additional test cases, @isVoid. I've handled your review.

Contributor

isVoid left a comment


Minor comments, well done!

Comment on lines 103 to 130
try:
    # The test passed, store the results.
    predicate_passes[predicate] = (
        1
        if predicate not in predicate_passes
        else predicate_passes[predicate] + 1
    )
    feature_passes[(lhs.column_type, rhs.column_type)] = (
        1
        if (lhs.column_type, rhs.column_type) not in feature_passes
        else feature_passes[(lhs.column_type, rhs.column_type)] + 1
    )
    passes_df = pd.DataFrame(
        {
            "predicate": list(predicate_passes.keys()),
            "predicate_passes": list(predicate_passes.values()),
        }
    )
    passes_df.to_csv("predicate_passes.csv", index=False)
    passes_df = pd.DataFrame(
        {
            "feature": list(feature_passes.keys()),
            "feature_passes": list(feature_passes.values()),
        }
    )
    passes_df.to_csv("feature_passes.csv", index=False)
except Exception as e:
    raise ValueError(e)
Contributor


Looking at this again, you don't need an additional layer of exception handler that rethrows all exceptions as ValueError.

Contributor Author


You're right. This handler just keeps potential logging issues from appearing as test errors, so I've modified it to just raise e.
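A minimal sketch of the same bookkeeping without the extra handler, using only the standard library: `collections.Counter` replaces the manual `1 if key not in d else d[key] + 1` pattern, and the `csv` module stands in for the pandas `to_csv` calls in the reviewed code. `record_pass`, `_write_counts`, and their parameters are hypothetical names for illustration. Because nothing wraps the body in `try`/`except`, any error propagates with its original type and traceback, which is what the review comment suggests.

```python
import csv
from collections import Counter

# Hypothetical module-level tallies, standing in for the dicts in the reviewed code.
predicate_passes = Counter()
feature_passes = Counter()


def _write_counts(path, header, counts):
    # One header row, then one (key, count) row per entry.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(counts.items())


def record_pass(predicate, lhs_type, rhs_type, out_dir="."):
    # Counter treats missing keys as 0, so no membership check is needed.
    predicate_passes[predicate] += 1
    feature_passes[(lhs_type, rhs_type)] += 1
    # No try/except wrapper: any I/O error propagates with its original type.
    _write_counts(
        f"{out_dir}/predicate_passes.csv",
        ("predicate", "predicate_passes"),
        predicate_passes,
    )
    _write_counts(
        f"{out_dir}/feature_passes.csv",
        ("feature", "feature_passes"),
        feature_passes,
    )
```

Rewriting both files on every pass mirrors the reviewed code; writing them once at the end of the test session would do less I/O.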

isVoid

This comment was marked as duplicate.


        return intersects & ~equals


class LineStringLineStringContainsPredicate(BinPred):
    def _preprocess(self, lhs, rhs):
        count = _basic_equals_count(lhs, rhs)
        return count == rhs.sizes
        # A linestring A covers another linestring B iff
Member


Is this comment correct? This is contains, not covers. The code for the two looks the same. Is that correct?

Contributor Author


I think that in the context of LineString+LineString, .contains and .covers are equivalent. The comment that persisted in feature_contains.py is incorrect, thanks!

https://postgis.net/docs/ST_Covers.html
https://postgis.net/docs/ST_Contains.html

Contributor Author


import cupy as cp
import geopandas
import numpy as np

import cuspatial
from cuspatial.tests.binpreds.binpred_test_dispatch import (
    features,
    linestring_linestring_dispatch_list,
)


def sample_test_data(features, dispatch_list, size, lib=cuspatial):
    # Slice [1:3] of each features entry holds the (lhs, rhs) geometry pair.
    geometry_tuples = [features[key][1:3] for key in dispatch_list]
    geometries = [
        [lhs_geo for lhs_geo, _ in geometry_tuples],
        [rhs_geo for _, rhs_geo in geometry_tuples],
    ]
    lhs = lib.GeoSeries(list(geometries[0]))
    rhs = lib.GeoSeries(list(geometries[1]))
    # Draw random indices on the host for GeoPandas and on the GPU for cuSpatial.
    rng = np if lib is geopandas else cp
    rng.random.seed()
    lhs_picks = rng.random.randint(0, len(lhs), size)
    rhs_picks = rng.random.randint(0, len(rhs), size)
    return (
        lhs[lhs_picks].reset_index(drop=True),
        rhs[rhs_picks].reset_index(drop=True),
    )


lhs, rhs = sample_test_data(
    features, linestring_linestring_dispatch_list, 1000, geopandas
)
# Compare contains and covers elementwise over 1000 random LineString pairs.
print((lhs.contains(rhs) == lhs.covers(rhs)).all())  # True

@@ -36,21 +47,65 @@ class CoversPredicateBase(EqualsPredicateBase):
    pass


class LineStringLineStringCovers(BinPred):
Member


I can't comment on non-changed lines, but please don't add documentation that will go stale quickly, e.g. "in this initial release". Also, the fact that it is implemented using only the equals predicate is an implementation detail, not a feature of the API. Public documentation should cover features and usage of the API, not implementation details.

Contributor Author


Thanks for the clarification, those docs and others definitely needed a cleanup.

github-actions bot removed the libcuspatial label (Relates to the cuSpatial C++ library) May 25, 2023
Contributor Author

thomcom commented May 26, 2023

/merge

rapids-bot bot merged commit 3300768 into rapidsai:branch-23.06 May 26, 2023
Member

harrism commented May 29, 2023

BTW, I think non-multi is a better name than "simple". A simple feature means something specific, right? OSGeo IsSimple means no self intersections or self tangents, etc.
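Under the OGC Simple Features definition referred to here, a LineString is simple if it does not pass through the same point twice, except that a closed ring may meet itself at its start/end point. The hypothetical `is_simple_linestring` below is a partial, quadratic-time check in pure Python: it reports a LineString as non-simple when any two non-adjacent segments intersect, using the standard orientation (cross-product) segment-intersection test. It does not catch adjacent segments that fold back along each other, so treat it as an illustration of the definition rather than a complete IsSimple.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(p: Point, q: Point, r: Point) -> int:
    # Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear.
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    # Given p, q, r collinear: True if r lies within the bounding box of segment pq.
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing, or an endpoint touching the other segment
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def is_simple_linestring(coords: List[Point]) -> bool:
    segments = list(zip(coords[:-1], coords[1:]))
    closed = len(coords) > 2 and coords[0] == coords[-1]
    n = len(segments)
    for i in range(n):
        # j starts at i + 2, so adjacent segments (which share a vertex) are skipped.
        for j in range(i + 2, n):
            if closed and i == 0 and j == n - 1:
                continue  # first and last segments of a ring legitimately share a point
            if segments_intersect(*segments[i], *segments[j]):
                return False
    return True
```

A "bowtie" such as [(0, 0), (1, 1), (1, 0), (0, 1)] is reported as non-simple, while an open L-shape and a closed unit square are reported as simple. Whether a geometry is simple in this sense is independent of whether it is a multi-geometry, which is why "non-multi" is the less ambiguous name.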

harrism changed the title from "Implement and Test All Simple Feature Combinations" to "Implement and Test All non-multi-Feature Spatial Predicate Combinations" May 29, 2023
Member

harrism commented May 29, 2023

Updated title so hopefully it will make it into our changelog with a very clear description of the feature. Please remember this for all PRs.

rapids-bot bot pushed a commit that referenced this pull request Aug 4, 2023
Closes #1138
Closes #1141 [here](https://github.com/rapidsai/cuspatial/pull/1156/files#diff-c522c9afb3364b1aed2b2589c0d0c260dbc634bc54844536b1d382cecb92bf29R112)
Depends on #1152
Depends on #1064

Please direct your attention [to the notebook](https://github.com/rapidsai/cuspatial/pull/1156/files#diff-cc4c516f63efa822793d75315c1b28a04bad6c9efc6fd2bdcac5cc30b05d14dd), since dependencies and this week's delayed CI issues have pulled a lot of files into this PR.

This notebook demonstrates cuSpatial's new binary predicates on large datasets, benchmarking and comparing against the host implementation on GeoPandas.

In order to support the large inputs for these comparisons, I had to reactivate the `pairwise_point_in_polygon` functionality that I'd previously written off. Quadtree doesn't support large N for NxN operations because it is many-to-many, and brute force would require a huge number of iterations to handle such large dataframes. Some further optimizations could speed up `pairwise_point_in_polygon`, but the algorithm itself isn't sufficiently fast. It is detailed fairly closely in the notebook.
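To illustrate the access patterns described above: a pairwise test checks point i only against polygon i (N tests for N rows), while an all-pairs test checks every point against every polygon. The pure-Python sketch below uses the crossing-number (ray casting) point-in-polygon test; the function names are hypothetical, and this is not cuSpatial's `pairwise_point_in_polygon` implementation, only a CPU illustration of how the amount of work differs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]  # a single polygon ring; the closing edge is implicit


def point_in_ring(p: Point, ring: Ring) -> bool:
    # Crossing-number test: toggle for each edge crossed by a ray from p toward +x.
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def pairwise_pip(points: List[Point], rings: List[Ring]) -> List[bool]:
    # One test per row: point i against polygon i, so N tests for N rows.
    if len(points) != len(rings):
        raise ValueError("pairwise inputs must have the same length")
    return [point_in_ring(p, ring) for p, ring in zip(points, rings)]


def all_pairs_pip(points: List[Point], rings: List[Ring]) -> List[List[bool]]:
    # Every point against every polygon: len(points) * len(rings) tests.
    return [[point_in_ring(p, ring) for ring in rings] for p in points]
```

For 1,000,000 rows, the pairwise form performs 1,000,000 tests, whereas an all-pairs comparison of the same inputs would perform 10^12.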

Please take a look and let's have some conversations about steps forward.

Authors:
  - H. Thomson Comer (https://github.com/thomcom)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Mark Harris (https://github.com/harrism)
  - Ray Douglass (https://github.com/raydouglass)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1156
Labels
cmake (Related to CMake code or build configuration), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Related to Python code)
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

Test all non-multi feature combinations
Dispatch binary predicate tests according to predicate and cases.
3 participants