[FEA] Add GPUTreeSHAP to cuML explainer module (experimental) #4351

hcho3 · 2021-11-10T23:16:27Z

Addresses #4110

This is an experimental prototype. For now, it supports:

* XGBoost models with numerical splits
* cuML RF regressors with numerical splits

cuML RF classifiers are not supported.

python/cuml/test/explainer/test_gpu_treeshap.py

python/cuml/explainer/tree_shap.pyx

wphicks · 2021-11-16T19:09:04Z

Locally, I'm getting RF tests outside of the set tolerance:

________________________________________ test_cuml_rf_classifier ________________________________________

    def test_cuml_rf_classifier():
        X, y = fetch_california_housing(return_X_y=True)
        X, y = X.astype(np.float32), y.astype(np.float32)
        cuml_model = curfr(max_features=1.0, max_samples=0.1, n_bins=128,
                           min_samples_leaf=2, random_state=123,
                           n_streams=1, n_estimators=10, max_leaves=-1,
                           max_depth=16, accuracy_metric="mse")
        cuml_model.fit(X, y)
        pred = cuml_model.predict(X)
        tl_model = cuml_model.convert_to_treelite_model()
    
        explainer = TreeExplainer(model=tl_model)
        out = explainer.shap_values(X)
>       np.testing.assert_almost_equal(np.sum(out, axis=1), pred, decimal=3)
E       AssertionError: 
E       Arrays are not almost equal to 3 decimals
E       
E       Mismatched elements: 8457 / 20640 (41%)
E       Max absolute difference: 1.1803854
E       Max relative difference: 0.36403635
E        x: array([4.086, 3.846, 4.045, ..., 0.858, 0.856, 1.061], dtype=float32)
E        y: array([4.055, 3.946, 4.045, ..., 0.866, 0.856, 1.061], dtype=float32)

wphicks · 2021-11-16T19:10:52Z

Should we put this in the experimental namespace for 21.12?

cpp/include/cuml/explainer/tree_shap.hpp

cpp/src/explainer/tree_shap.cu

wphicks · 2021-11-16T20:25:21Z

cpp/src/explainer/tree_shap.cu

+
+template <typename ThresholdType, typename LeafType>
+ExtractedPathHandle
+extract_paths_impl(const tl::ModelImpl<ThresholdType, LeafType>& model) {


The logic of this function seems generally correct, but it's also a bit hard to follow. Could we break it up a bit or consider using algorithms directly on the underlying containers where feasible rather than trying to keep track of all the separate indexes?
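
As a rough illustration of the kind of restructuring suggested here (a minimal sketch, not the code in this PR): the `Node` struct and the helpers `path_to_root`, `extract_paths`, and `path_offsets` below are hypothetical and do not reflect the Treelite data layout. The idea is to split the work into small steps and let `std::transform` and `std::partial_sum` compute the flattened offsets instead of maintaining a running index by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a tree node: parent index (-1 for the root)
// and a leaf flag. This is an assumption, not the Treelite representation.
struct Node {
  int parent;
  bool is_leaf;
};

// Walk from one leaf up to the root; return node ids in root-to-leaf order.
std::vector<int> path_to_root(const std::vector<Node>& tree, int leaf) {
  assert(leaf >= 0 && static_cast<std::size_t>(leaf) < tree.size());
  std::vector<int> path;
  for (int nid = leaf; nid != -1; nid = tree[static_cast<std::size_t>(nid)].parent) {
    path.push_back(nid);
  }
  std::reverse(path.begin(), path.end());
  return path;
}

// One path per leaf, in order of increasing leaf id.
std::vector<std::vector<int>> extract_paths(const std::vector<Node>& tree) {
  std::vector<std::vector<int>> paths;
  for (std::size_t nid = 0; nid < tree.size(); ++nid) {
    if (tree[nid].is_leaf) { paths.push_back(path_to_root(tree, static_cast<int>(nid))); }
  }
  return paths;
}

// Start offset of each path in a flattened buffer, plus the total length as
// the last entry. The prefix sum replaces a hand-maintained running counter.
std::vector<std::size_t> path_offsets(const std::vector<std::vector<int>>& paths) {
  std::vector<std::size_t> offsets(paths.size() + 1, 0);
  std::transform(paths.begin(), paths.end(), offsets.begin() + 1,
                 [](const std::vector<int>& p) { return p.size(); });
  std::partial_sum(offsets.begin(), offsets.end(), offsets.begin());
  return offsets;
}
```

Each helper can be unit-tested on its own, which is part of what makes this shape easier to follow than one function tracking several indexes at once.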

cpp/src/explainer/tree_shap.cu

hcho3 · 2021-11-16T22:53:27Z

I fixed the failing unit test. Marking this PR as non-draft, since it's functionally complete. I'll now address stylistic concerns.

python/cuml/experimental/explainer/tree_shap.pyx

python/cuml/test/explainer/test_gpu_treeshap.py

wphicks · 2021-11-17T15:29:41Z

cpp/src/explainer/tree_shap.cu

-  ExtractedPathImpl<ThresholdType>* paths
-    = dynamic_cast<ExtractedPathImpl<ThresholdType>*>(path_container.get());
+  std::unique_ptr<TreePathInfo> path_info_ptr = std::make_unique<TreePathInfoImpl<ThresholdType>>();
+  auto* path_info = dynamic_cast<TreePathInfoImpl<ThresholdType>*>(path_info_ptr.get());


I think it would be better to avoid the dynamic_cast here (and the general complexity of the inner implementation classes for different types) by using a single struct with a std::variant member variable, but let's just make note of that and save it for a follow-on.
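
For the follow-on, a minimal sketch of what the `std::variant` approach could look like, assuming C++17. The struct reuses the name `TreePathInfo` from the diff above, but every member, the `PathElement` record, and the helper functions are hypothetical and only illustrate the idea: one concrete struct holds either the `float` or the `double` payload, so no base class or `dynamic_cast` is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical per-path record, templated on the threshold type.
template <typename ThresholdType>
struct PathElement {
  int feature_idx;
  ThresholdType lower_bound;
  ThresholdType upper_bound;
};

// A single concrete struct: the variant member selects the threshold type
// at run time, replacing an abstract base plus templated Impl subclasses.
struct TreePathInfo {
  std::variant<std::vector<PathElement<float>>, std::vector<PathElement<double>>> paths;
  int num_tree = 0;
};

// Create an empty container holding the alternative for ThresholdType.
template <typename ThresholdType>
TreePathInfo make_path_info(int num_tree) {
  assert(num_tree >= 0);
  TreePathInfo info;
  info.paths = std::vector<PathElement<ThresholdType>>{};
  info.num_tree = num_tree;
  return info;
}

// Typed access: std::get throws std::bad_variant_access on a type mismatch.
template <typename ThresholdType>
void append(TreePathInfo& info, PathElement<ThresholdType> elem) {
  std::get<std::vector<PathElement<ThresholdType>>>(info.paths).push_back(elem);
}

// Type-agnostic access: std::visit dispatches to whichever alternative is held.
std::size_t num_path_elements(const TreePathInfo& info) {
  return std::visit([](const auto& v) { return v.size(); }, info.paths);
}

bool uses_double(const TreePathInfo& info) {
  return std::holds_alternative<std::vector<PathElement<double>>>(info.paths);
}
```

Code that does not care about the threshold type goes through `std::visit`, while type-specific code uses `std::get`, which reports a mismatch with an exception instead of a null pointer from a failed cast.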

cpp/src/explainer/tree_shap.cu

python/cuml/experimental/explainer/tree_shap.pyx

codecov-commenter · 2021-11-18T07:33:10Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@34f7929).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.12    #4351   +/-   ##
===============================================
  Coverage                ?   85.65%           
===============================================
  Files                   ?      236           
  Lines                   ?    19179           
  Branches                ?        0           
===============================================
  Hits                    ?    16427           
  Misses                  ?     2752           
  Partials                ?        0

Flag        Coverage Δ
dask        46.17% <0.00%> (?)
non-dask    78.49% <0.00%> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 34f7929...56d62d3.

wphicks · 2021-11-18T15:34:56Z

@gpucibot merge

Addresses rapidsai#4110

This is an experimental prototype. For now, it supports:

* XGBoost models with numerical splits
* cuML RF regressors with numerical splits

cuML RF classifiers are not supported.

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4351

[FEA] Add GPUTreeSHAP to cuML explainer module

5f01668

github-actions bot added the "CMake", "CUDA/C++", and "Cython / Python" labels Nov 10, 2021

hcho3 added 4 commits November 11, 2021 00:02

Merge remote-tracking branch 'origin/branch-21.12' into gputreeshap

d90ff5e

Add support for cuML RF

f779e89

Add support for (some) cuml RF

e210be2

Add a test for cuML RF

18c602e

hcho3 commented Nov 11, 2021

python/cuml/test/explainer/test_gpu_treeshap.py (outdated, resolved)

wphicks reviewed Nov 15, 2021

python/cuml/explainer/tree_shap.pyx (outdated, resolved)

wphicks requested changes Nov 16, 2021

python/cuml/test/explainer/test_gpu_treeshap.py (3 threads, outdated, resolved)

wphicks requested changes Nov 16, 2021

hcho3 added 2 commits November 16, 2021 20:48

Remove debugging print

91ee35d

Handle cuML RF with <= condition in tests

36cc8e7

hcho3 changed the title from "[FEA] Add GPUTreeSHAP to cuML explainer module [WIP]" to "[FEA] Add GPUTreeSHAP to cuML explainer module" Nov 16, 2021

hcho3 marked this pull request as ready for review November 16, 2021 22:48

hcho3 requested review from a team as code owners November 16, 2021 22:48

Fix typo in test

fb74617

hcho3 added 4 commits November 16, 2021 23:19

Address reviewer's comment

7ab8799

New design: RAII, type erasure via inheritance, smart ptr

5fd654c

Style fix

29b4d91

Address reviewer's comment

01427fd

hcho3 added the "3 - Ready for Review", "Experimental", and "feature request" labels Nov 17, 2021

hcho3 added the "non-breaking" label Nov 17, 2021

Move TreeExplainer to experimental module

5364ae6

RAMitchell reviewed Nov 17, 2021

wphicks requested changes Nov 17, 2021

hcho3 changed the title from "[FEA] Add GPUTreeSHAP to cuML explainer module" to "[FEA] Add GPUTreeSHAP to cuML explainer module (experimental)" Nov 17, 2021

hcho3 added 11 commits November 17, 2021 23:07

Reduce test size; use small synthetic data

cf4e279

Keep output on GPU if input is on GPU

44af5c9

Store bias term in self.expected_value

d964ab6

Respond to C++ review

2302491

Ensure that cuDF and cuPy array can be used as input

eeb920d

Add a test for cuML RF classifier

7b4ca1b

Handle multiple model types in __init__

0a3364b

Update tree_shap.pyx

5d444ac

Update test_gpu_treeshap.py

49ee46f

Clarify error message

0b2a698

Test some degenerate cases

56d62d3

RAMitchell approved these changes Nov 18, 2021

wphicks approved these changes Nov 18, 2021

dantegd approved these changes Nov 18, 2021

rapids-bot bot merged commit 3cf9778 into rapidsai:branch-21.12 Nov 18, 2021

hcho3 deleted the gputreeshap branch November 19, 2021 06:11

hcho3 mentioned this pull request Dec 4, 2021

Break up path processing logic in TreeSHAP explainer #4423

Closed
