Support categorical splits in TreeExplainer #4473

hcho3 · 2022-01-08T00:22:38Z

Stacked on top of #4447. Do not merge until #4447 is merged first.

~~If you'd like to review before #4447 is merged, look at the net diff at hcho3#2.~~

- [x] Test XGBoost models with categorical splits
- [x] Test LightGBM models with categorical splits
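A categorical split routes a row by set membership instead of a threshold comparison. The sketch below shows that routing decision for a single node, including the missing-value case. `SplitNode`, `goes_left`, and every field name are hypothetical and do not mirror cuML's or Treelite's internal layout. The `categories_go_left` flag exists because a tree format could record the category list for either child, so the sketch does not hard-code one convention.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class SplitNode:
    """Hypothetical split-node record (not cuML's or Treelite's actual layout)."""
    feature: int
    is_categorical: bool
    threshold: float = 0.0                      # used only by numerical splits
    categories: FrozenSet[int] = frozenset()    # used only by categorical splits
    categories_go_left: bool = True             # which child the listed categories map to
    default_left: bool = True                   # direction taken for missing (NaN/None) values


def goes_left(node: SplitNode, row) -> bool:
    """Return True if `row` is routed to the left child of `node`."""
    value = row[node.feature]
    # Missing values follow the node's default direction.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return node.default_left
    if node.is_categorical:
        # Categorical test: membership in the stored category set.
        in_set = int(value) in node.categories
        return in_set if node.categories_go_left else not in_set
    # Numerical test: ordinary threshold comparison (operator chosen for illustration).
    return value < node.threshold
```

For example, with `categories=frozenset({1, 3})`, a row whose feature value is `3.0` goes left, `2.0` goes right, and `NaN` follows `default_left`.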


RAMitchell

Looking good, bar a few minor comments.

This is quite a complicated feature, so I wouldn't be surprised if there are still bugs lurking even with review and good test coverage. We could try to do some stress testing with hypothesis to really increase our confidence, but I'll leave it up to you.

ci/gpu/build.sh

cpp/include/cuml/explainer/tree_shap.hpp

python/cuml/experimental/explainer/tree_shap.pyx

RAMitchell · 2022-01-21T10:27:54Z

python/cuml/experimental/explainer/tree_shap.pyx

```diff
         # XGBoost model object
-        if cls.__module__ == 'xgboost.core' and cls.__name__ == 'Booster':
+        if re.match(r'xgboost.*$', cls_module) and cls_name == 'Booster':
```


It would be nice to see this logic for matching different libraries moved into Treelite, e.g. `treelite.Model(model)`, where Treelite guesses the model type.

Historically, this type of code for guessing module names ends up being fragile over time. Not sure how to avoid it though.

> this type of code for guessing module names ends up being fragile over time. Not sure how to avoid it though.

The alternative is to ask the user to pass a string argument indicating the model type. This would mean more typing, however.
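A minimal sketch of the two options discussed in this thread: guessing the library from the class's module and name (as the diff does) versus accepting an explicit string from the user. `detect_model_type`, `resolve_model_type`, and the `model_type` parameter are hypothetical names, not cuML API. Unlike the diff's `r'xgboost.*$'`, the pattern below requires `xgboost` to be followed by a dot or the end of the string, so an unrelated module such as `xgboost_ray` is not misclassified; that tightening is an editorial choice, not part of the PR.

```python
import re

_KNOWN = ("xgboost", "lightgbm")


def detect_model_type(model) -> str:
    """Hypothetical helper: infer the library from the model's class module and name."""
    cls = type(model)
    module, name = cls.__module__, cls.__name__
    for lib in _KNOWN:
        # Match "lib" exactly or "lib.<submodule>", but not "lib_something".
        if re.match(rf"{lib}(\.|$)", module) and name == "Booster":
            return lib
    raise TypeError(f"Cannot infer model type from {module}.{name}")


def resolve_model_type(model, model_type=None) -> str:
    """Prefer an explicit string from the user; fall back to guessing from the class."""
    if model_type is not None:
        if model_type not in _KNOWN:
            raise ValueError(f"Unknown model_type: {model_type!r}")
        return model_type
    return detect_model_type(model)
```

The explicit argument trades extra typing for robustness: if a library reorganizes its modules, users can still pass `model_type="xgboost"` while the guessing logic is updated.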

cpp/src/explainer/tree_shap.cu

hcho3 · 2022-01-21T19:01:51Z

> We could try to do some stress testing with hypothesis to really increase our confidence

I don't think we've decided yet to require hypothesis as a test dependency. See #3085. For now, I'll add more tests to cover edge cases.

RAMitchell · 2022-01-23T18:17:44Z

Hypothesis is already in the rapids environment, although this is maybe not the same as us being allowed to use it :)

I wrote almost all my tests for #4492 in hypothesis.

dantegd · 2022-01-24T15:01:45Z

@hcho3 Rory's advice is correct: hypothesis is already used by cuDF, so it is already in our testing environments (and developer environments, for that matter). It should be fine to use in pytests.

hcho3 · 2022-01-24T19:59:11Z

@RAMitchell I agree that property-based testing is a good idea. Given that code freeze is in 3 days, let me add hypothesis based testing in a follow-up PR targeting 22.04.

RAMitchell · 2022-01-24T20:52:04Z

Good idea, it's not urgent.
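One way to act on the property-based testing idea is a brute-force oracle: on a tiny tree, exact Shapley values can be computed by enumerating every feature subset, using the cover-weighted conditional expectation from the original TreeSHAP formulation, and then checked against invariants such as local accuracy and zero attribution for features the tree never uses. This sketch uses the standard library's `random` instead of Hypothesis so it runs without extra dependencies; with Hypothesis, the loop would become a `@given(...)`-decorated test. The dict-based tree layout, `TREE`, and all function names are hypothetical, and the convention that listed categories go to the left child is an assumption.

```python
import itertools
import math
import random

# Hypothetical toy tree: root has a categorical split on feature 0, its left
# child a numerical split on feature 1. "cover" is the training weight per node.
TREE = {
    "feature": 0, "cats": {1, 3}, "cover": 100.0,
    "left": {"feature": 1, "threshold": 0.5, "cover": 60.0,
             "left": {"value": 1.0, "cover": 20.0},
             "right": {"value": -2.0, "cover": 40.0}},
    "right": {"value": 0.5, "cover": 40.0},
}


def goes_left(node, x):
    if "cats" in node:  # categorical split: assumed "listed categories go left"
        return x[node["feature"]] in node["cats"]
    return x[node["feature"]] < node["threshold"]


def predict(node, x):
    while "value" not in node:
        node = node["left"] if goes_left(node, x) else node["right"]
    return node["value"]


def cond_exp(node, x, known):
    """E[f(x) | x_S]: follow known features, cover-weight the children otherwise."""
    if "value" in node:
        return node["value"]
    if node["feature"] in known:
        return cond_exp(node["left"] if goes_left(node, x) else node["right"], x, known)
    left, right = node["left"], node["right"]
    return (left["cover"] * cond_exp(left, x, known)
            + right["cover"] * cond_exp(right, x, known)) / node["cover"]


def shap_values(tree, x, n_features):
    """Exact Shapley values by subset enumeration (exponential; toy sizes only)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (cond_exp(tree, x, s | {i}) - cond_exp(tree, x, s))
    return phi


def check_properties(trials=200, seed=0):
    rng = random.Random(seed)
    base = cond_exp(TREE, None, set())  # expected value with no features known
    for _ in range(trials):
        x = [rng.randrange(5), rng.random(), rng.random()]  # feature 2 is unused
        phi = shap_values(TREE, x, 3)
        # Local accuracy: base value plus attributions reproduces the prediction.
        assert math.isclose(base + sum(phi), predict(TREE, x), abs_tol=1e-9)
        # Dummy feature: a feature the tree never splits on gets zero attribution.
        assert phi[2] == 0.0
    return True
```

A GPU implementation's output could be compared against `shap_values` on randomly generated small trees in the same way.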


ajschmidt8 · 2022-01-26T19:03:42Z

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.


hcho3 · 2022-01-26T19:37:35Z

@RAMitchell @wphicks Can you take another quick look? I suggest that we merge this as long as there isn't any big blocker. We can address remaining issues in the following release, in which we'd move TreeExplainer out of the experimental namespace.

hcho3 · 2022-01-27T06:13:16Z

Rerun tests

dantegd · 2022-01-27T18:59:15Z

rerun tests

codecov-commenter · 2022-01-27T22:04:01Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@3ccf77f).
The diff coverage is n/a.

```diff
@@               Coverage Diff               @@
##             branch-22.02    #4473   +/-   ##
===============================================
  Coverage                ?   85.69%
===============================================
  Files                   ?      236
  Lines                   ?    19338
  Branches                ?        0
===============================================
  Hits                    ?    16572
  Misses                  ?     2766
  Partials                ?        0
```
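The Coverage Diff numbers are internally consistent: Lines = Hits + Misses + Partials (16572 + 2766 + 0 = 19338), and Coverage = Hits / Lines. The exact quotient 16572 / 19338 is about 85.697%, so the reported 85.69% indicates truncation rather than half-up rounding, which appears to match Codecov's default of rounding down to two decimal places (configurable in `codecov.yml`). The helper below is a hypothetical illustration of that arithmetic, not Codecov code.

```python
def coverage_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Percentage of lines hit, truncated (rounded down) to `precision` decimals."""
    scale = 10 ** precision
    # Integer floor division avoids float rounding before truncation.
    return (hits * 100 * scale // lines) / scale


hits, misses, partials = 16572, 2766, 0
lines = hits + misses + partials
print(lines)                          # 19338
print(coverage_percent(hits, lines))  # 85.69
```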

| Flag | Coverage Δ |
|---|---|
| dask | `46.52% <0.00%> (?)` |
| non-dask | `78.59% <0.00%> (?)` |

Flags with carried forward coverage won't be shown.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3ccf77f...f97106a.

dantegd · 2022-01-28T00:02:25Z

@RAMitchell will go ahead and merge this PR so that it makes it to release 22.02, @hcho3 could you address any pending comments and further testing in a follow up PR? Thanks!

dantegd · 2022-01-28T00:02:31Z

@gpucibot merge

Stacked on top of rapidsai#4447. Do not merge until rapidsai#4447 is merged first.

~If you'd like to review before rapidsai#4447 is merged, look at the net diff at hcho3#2.~

- [x] Test XGBoost models with categorical splits
- [x] Test LightGBM models with categorical splits

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4473

hcho3 added 20 commits December 4, 2021 00:45

Break up path processing logic in TreeSHAP explainer

9eb9e99

Fix style

e715724

Support cuML RF classifier in TreeExplainer

94db1a1

Merge remote-tracking branch 'origin/branch-22.02' into classifier_support

386b36b

Fix style

3cde92a

Remove print()

7b66105

Test multiple input types in test_cuml_rf_classifier

87ebc93

Test scikit-learn RF regressors and classifiers

21a2bb9

Make scikit-learn optional

ade0448

Fix style

e3667c4

Consolidate path extraction logic

87458a6

Use shap.explainers.Tree

d0dcefd

Fix style

729e98d

Use weighted sample count in sklearn models

80e45a5

Add missing skipif mark

69d6461

Extract traverse_towards_leaf_node()

cc54ae1

Eliminate the use of reference parameter

4da8485

Fix style

b37d638

Fix style

8092a1d

Update copyright years

5fbc641

hcho3 requested review from a team as code owners January 8, 2022 00:22

hcho3 marked this pull request as draft January 8, 2022 00:22

github-actions bot added the CUDA/C++ and Cython / Python labels Jan 8, 2022

hcho3 added 5 commits January 13, 2022 16:30

Merge remote-tracking branch 'origin/branch-22.02' into classifier_support

743e2eb

Relax test tolerance

d2da04e

Temporarily use Treelite 2.2.0 for testing

8fc907f

Temporarily use Treelite 2.2.0 for testing

f15c7a5

Fix copyright years

589d18a

RAMitchell reviewed Jan 21, 2022


hcho3 added 3 commits January 25, 2022 20:38

Merge remote-tracking branch 'origin/branch-22.02' into classifier_support

5dd56c8

Remove temporary install step in build.sh

cdb46d3

Merge branch 'classifier_support' into categorical_support

c5b230c

github-actions bot removed the conda, gpuCI, and CMake labels Jan 25, 2022

hcho3 added 3 commits January 26, 2022 00:29

Respond to reviewer's comment

e234545

Make shap optional in tests

6341641

Add coverage for missing values

9f6f0c9

ajschmidt8 removed the request for review from a team January 26, 2022 19:03

hcho3 added 2 commits January 26, 2022 11:24

Merge remote-tracking branch 'upstream/branch-22.02' into categorical_support

f87e180

Fix typo in comment

48a6a83

Fix style

f97106a

dantegd approved these changes Jan 28, 2022


rapids-bot bot merged commit 63e738c into rapidsai:branch-22.02 Jan 28, 2022

hcho3 deleted the categorical_support branch January 28, 2022 00:42
