
Binary Predicate Test Dispatching #1085

Merged: 9 commits into rapidsai:branch-23.06, Apr 26, 2023

Conversation

thomcom
Contributor

@thomcom thomcom commented Apr 19, 2023

Closes #1046
Closes #1036

This PR adds a new binary predicate test dispatch system. The dispatcher creates one test for each ordered pair of feature types from the set (point, linestring, polygon) and each predicate that the feature tuple can be applied to:

  • contains
  • covers
  • crosses
  • disjoint
  • geom_equals
  • intersects
  • overlaps
  • touches
  • within

Combining the 9 predicates with 33 feature-pair tests yields 297 tests that cover all possible combinations of simple features and their predicates.

While development is underway, the test dispatcher automatically xfails any test that fails or has not yet been implemented, in order to keep CI passing.
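For illustration, the shape of such a dispatch can be sketched as a Cartesian product of feature types and predicates. The names below are hypothetical, not cuspatial's actual fixtures, and this toy version uses raw type pairs where the PR uses 33 hand-coded feature tuples:

```python
import itertools

# Hypothetical sketch of the test id space: every ordered pair of
# feature types crossed with every predicate. The PR's real dispatcher
# enumerates 33 hand-coded feature tuples rather than raw type pairs.
FEATURE_TYPES = ["point", "linestring", "polygon"]
PREDICATES = [
    "contains", "covers", "crosses", "disjoint", "geom_equals",
    "intersects", "overlaps", "touches", "within",
]

def make_test_ids():
    """Yield one test id per ordered feature pair and predicate."""
    for lhs, rhs in itertools.product(FEATURE_TYPES, repeat=2):
        for predicate in PREDICATES:
            yield f"{lhs}-{rhs}-{predicate}"

ids = list(make_test_ids())  # 3 * 3 * 9 = 81 ids in this toy version
```

With 33 predefined feature tuples instead of the 9 raw type pairs, the same cross product gives the 33 x 9 = 297 tests described above.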

The test dispatcher outputs diagnostic results during each test run. An output file, test_binpred_test_dispatch.log, is created containing every failing test: its name, a visual description of the feature tuple and relationship, the shapely objects used to create the test, and the runtime name of the test, so it is easy for a developer (myself) to identify which test failed and rerun it. The dispatcher also creates four .csv files at runtime that record each pass or fail result by predicate and by feature pair. These .csv files can be displayed using tests/binpreds/summarize_binpred_test_dispatch_results.py, which outputs a dataframe for each CSV file as follows:

(rapids) coder ➜ ~/cuspatial/python/cuspatial/cuspatial $ python tests/binpreds/summarize_binpred_test_dispatch_results.py 
     predicate  predicate_passes
0  geom_equals                20
1   intersects                11
2       covers                17
3      crosses                 6
4     disjoint                 9
5     overlaps                20
6       within                14
     predicate  predicate_fails
0     contains               33
1  geom_equals               13
2   intersects               22
3       covers               16
4      crosses               27
5     disjoint               24
6     overlaps               13
7      touches               33
8       within               19
                                             feature  feature_passes
0     (<ColumnType.POINT: 1>, <ColumnType.POINT: 1>)              14
1   (<ColumnType.POINT: 1>, <ColumnType.POLYGON: 4>)              22
2  (<ColumnType.LINESTRING: 3>, <ColumnType.LINES...              16
3  (<ColumnType.LINESTRING: 3>, <ColumnType.POLYG...              38
4  (<ColumnType.POINT: 1>, <ColumnType.LINESTRING...               7
                                             feature  feature_fails
0     (<ColumnType.POINT: 1>, <ColumnType.POINT: 1>)              4
1  (<ColumnType.POINT: 1>, <ColumnType.LINESTRING...             20
2   (<ColumnType.POINT: 1>, <ColumnType.POLYGON: 4>)             14
3  (<ColumnType.LINESTRING: 3>, <ColumnType.LINES...             20
4  (<ColumnType.LINESTRING: 3>, <ColumnType.POLYG...             52
5  (<ColumnType.POLYGON: 4>, <ColumnType.POLYGON:...             90
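As a sketch of what such a summarizer might do, pandas can aggregate raw per-test rows into per-predicate pass counts like those shown above. The CSV layout here is invented for illustration; the PR's actual file format may differ:

```python
import io
import pandas as pd

# Illustrative input: one row per test result. The real dispatcher's
# CSV columns are assumptions here, not the PR's exact schema.
raw = io.StringIO(
    "predicate,outcome\n"
    "geom_equals,pass\n"
    "geom_equals,fail\n"
    "intersects,pass\n"
)
df = pd.read_csv(raw)

# Count passing tests per predicate, mirroring the predicate_passes
# table printed by the summarizer.
passes = (
    df[df["outcome"] == "pass"]
    .groupby("predicate")
    .size()
    .rename("predicate_passes")
    .reset_index()
)
print(passes)
```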

Without additional modifications, the test dispatcher produces 97 passing results and 200 xfailing results. In a follow-up PR, updates to the basic predicates and the binpred test architecture increase the number of passing results to 274, leaving 25 xfails. These PRs are separated for ease of review.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@thomcom thomcom added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 19, 2023
@thomcom thomcom requested a review from a team as a code owner April 19, 2023 19:01
@thomcom thomcom self-assigned this Apr 19, 2023
@thomcom thomcom requested review from harrism and isVoid April 19, 2023 19:01
@github-actions github-actions bot added the Python Related to Python code label Apr 19, 2023
Member

@harrism harrism left a comment

One initial question about pytest machinery.

(Polygon, Polygon): object_dispatch(polygon_polygon_dispatch_list),
}


Member

Can't you rely on the machinery of @pytest.mark.parametrize() to build up the Cartesian product of test combinations rather than using nested loops?

types = [Point, LineString, Polygon]
predicates = [disjoint, intersects, overlaps, contains, contains_properly, within, etc.]

@pytest.mark.parametrize('lhs_type', types)
@pytest.mark.parametrize('rhs_type', types)
@pytest.mark.parametrize('predicate', predicates)
@pytest.mark.parametrize('data', the_data)
def test(...

Contributor Author

I started out parametrizing the cases in the fashion you describe. Because each of the feature combinations is predefined, I needed a separate test with separate parametrizations for each exclusive set of features: (point, linestring), (point, polygon), etc. Because I wanted logging and reporting on the test results, it was much easier to move that into a single test driven by a single list of features. The primary contribution of this PR is the hand-coded list in features.

Contributor Author

I remember the specific reason that I parametrize the tests according to the pattern in https://github.com/rapidsai/cuspatial/pull/1085/files#diff-7064c17cb647ac9ab6d1e6f5c4c1726f68fd5b20205dd85f1d34684d53fd3ea6R20-R35 instead of the pattern you described. The short answer is that you cannot pass a pytest fixture to @pytest.mark.parametrize, only native iterables. I am still using the @pytest.mark.parametrize functionality; I've just moved the generation of the test cases into the fixture instead of passing a list to the @pytest.mark.parametrize call.
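A minimal sketch of the fixture-driven pattern described here, with a hypothetical two-entry feature dict standing in for the PR's hand-coded list:

```python
import pytest

# Illustrative stand-in for the PR's features fixture: the dict keys
# become test ids, and the values are the predefined feature pairs.
# These WKT strings are invented for the sketch.
features = {
    "point-point-equal": ("POINT(0 0)", "POINT(0 0)"),
    "point-point-disjoint": ("POINT(0 0)", "POINT(1 1)"),
}

@pytest.fixture(params=features.values(), ids=list(features.keys()))
def feature_pair(request):
    """Deliver one predefined feature pair per generated test."""
    return request.param

def test_pair_has_two_geometries(feature_pair):
    # A single test body is expanded into one test per dict entry.
    lhs, rhs = feature_pair
    assert lhs.startswith("POINT") and rhs.startswith("POINT")
```

Keeping the case generation in one place like this makes it straightforward to attach logging and reporting to a single test body, as the comment above explains.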

@harrism
Member

harrism commented Apr 20, 2023

Closes #1036

That issue is about documenting all special cases. How does this PR close that?

…spatch.py

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
@thomcom
Contributor Author

thomcom commented Apr 20, 2023

#1036 is "Document all simple and single geometric special cases #1036", which specifies two follow-up issues: one for multiple geometries and polygons with rings, and another for true "special cases" as we discover them. This PR closes #1036 by including 34 specific tests that cover all geometric configurations of the feature combinations for (point, linestring, polygon), presented and executed from the features fixture in binpred_test_dispatch.py.

@harrism
Member

harrism commented Apr 26, 2023

So you are replacing "document" with "test"?

Contributor

@isVoid isVoid left a comment

Really cool!

out_file = open("test_binpred_test_dispatch.log", "w")


@xfail_on_exception # TODO: Remove when all tests are passing
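A plausible sketch of such a decorator (the PR's actual implementation may differ): catch any exception raised by the test body and convert it into an xfail, so CI stays green while predicates are still unimplemented.

```python
import functools

import pytest

# Hypothetical sketch of xfail_on_exception: any failure inside the
# wrapped test becomes an expected failure rather than an error.
def xfail_on_exception(test_fn):
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        try:
            return test_fn(*args, **kwargs)
        except Exception as exc:
            # pytest.xfail() raises internally, marking the test xfailed.
            pytest.xfail(f"Unimplemented binary predicate: {exc}")
    return wrapper
```

Removing the decorator later (as the TODO suggests) re-exposes every remaining failure.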
Contributor

All tests are passing now ;)

Contributor Author

They are, but not in this PR; the full submission spans three PRs: #1085, #1086, and #1064.

expected = gpd_pred_fn(gpdrhs)
assert (got.values_host == expected.values).all()

# The test is complete, the rest is just logging.
Contributor

I appreciate the logging-to-file part, which can be useful when bulk-debugging hundreds of failures for a new feature. But when this system is online in CI, it would also be helpful to have a "CI mode" that writes the results to stdout. What do you think?
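One way such a "CI mode" could be wired, keeping the existing file log and mirroring records to stdout when a flag is set. The BINPRED_CI_MODE environment variable name is invented for illustration:

```python
import logging
import os
import sys

# Hypothetical sketch: file logging always on, stdout mirroring
# enabled by an environment variable when running in CI.
logger = logging.getLogger("binpred_dispatch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler("test_binpred_test_dispatch.log"))
if os.environ.get("BINPRED_CI_MODE"):
    logger.addHandler(logging.StreamHandler(sys.stdout))

logger.info("point-polygon contains: FAIL")
```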

Contributor Author

Because there will be more binary predicate PRs to cover complex (multi-) geometry types, how would you feel about filing an issue for this feature? I'm certainly fine with implementing it, but I'd rather not make it a dependency that holds up #1064, which still needs to be merged.

Contributor

That's totally fine with me

""",
point_polygon,
point_polygon,
),
Contributor Author

@harrism the "Documentation" is primarily here. I can move it to the binary predicates design document if you like, as well.

Member

OK!

Contributor

@isVoid isVoid left a comment

Small suggestion. Not blocking to merge.

@thomcom
Contributor Author

thomcom commented Apr 26, 2023

/merge

@rapids-bot rapids-bot bot merged commit 959cfdb into rapidsai:branch-23.06 Apr 26, 2023