Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add GPUTreeSHAP to cuML explainer module (experimental) #4351

Merged
merged 24 commits into from
Nov 18, 2021

Conversation

hcho3
Copy link
Contributor

@hcho3 hcho3 commented Nov 10, 2021

Addresses #4110

This is an experimental prototype. For now, it supports:

  • XGBoost models with numerical splits
  • cuML RF regressors with numerical splits

cuML RF classifiers are not supported.

@github-actions github-actions bot added CMake CUDA/C++ Cython / Python Cython or Python issue labels Nov 10, 2021
@wphicks
Copy link
Contributor

wphicks commented Nov 16, 2021

Locally, I'm getting RF tests outside of the set tolerance:

________________________________________ test_cuml_rf_classifier ________________________________________

    def test_cuml_rf_classifier():
        X, y = fetch_california_housing(return_X_y=True)
        X, y = X.astype(np.float32), y.astype(np.float32)
        cuml_model = curfr(max_features=1.0, max_samples=0.1, n_bins=128,
                           min_samples_leaf=2, random_state=123,
                           n_streams=1, n_estimators=10, max_leaves=-1,
                           max_depth=16, accuracy_metric="mse")
        cuml_model.fit(X, y)
        pred = cuml_model.predict(X)
        tl_model = cuml_model.convert_to_treelite_model()
    
        explainer = TreeExplainer(model=tl_model)
        out = explainer.shap_values(X)
>       np.testing.assert_almost_equal(np.sum(out, axis=1), pred, decimal=3)
E       AssertionError: 
E       Arrays are not almost equal to 3 decimals
E       
E       Mismatched elements: 8457 / 20640 (41%)
E       Max absolute difference: 1.1803854
E       Max relative difference: 0.36403635
E        x: array([4.086, 3.846, 4.045, ..., 0.858, 0.856, 1.061], dtype=float32)
E        y: array([4.055, 3.946, 4.045, ..., 0.866, 0.856, 1.061], dtype=float32)

@wphicks
Copy link
Contributor

wphicks commented Nov 16, 2021

Should we put this in the experimental namespace for 21.12?


template <typename ThresholdType, typename LeafType>
ExtractedPathHandle
extract_paths_impl(const tl::ModelImpl<ThresholdType, LeafType>& model) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of this function seems generally correct, but it's also a bit hard to follow. Could we break it up a bit or consider using algorithms directly on the underlying containers where feasible rather than trying to keep track of all the separate indexes?

@hcho3 hcho3 changed the title [FEA] Add GPUTreeSHAP to cuML explainer module [WIP] [FEA] Add GPUTreeSHAP to cuML explainer module Nov 16, 2021
@hcho3 hcho3 marked this pull request as ready for review November 16, 2021 22:48
@hcho3 hcho3 requested review from a team as code owners November 16, 2021 22:48
@hcho3
Copy link
Contributor Author

hcho3 commented Nov 16, 2021

I fixed the failing unit test. Marking this PR as non-draft, since it's functionally complete. I'll now address stylistic concerns.

@hcho3 hcho3 added 3 - Ready for Review Ready for review by team Experimental Used to denote experimental features feature request New feature or request labels Nov 17, 2021
@hcho3 hcho3 added the non-breaking Non-breaking change label Nov 17, 2021
ExtractedPathImpl<ThresholdType>* paths
= dynamic_cast<ExtractedPathImpl<ThresholdType>*>(path_container.get());
std::unique_ptr<TreePathInfo> path_info_ptr = std::make_unique<TreePathInfoImpl<ThresholdType>>();
auto* path_info = dynamic_cast<TreePathInfoImpl<ThresholdType>*>(path_info_ptr.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to avoid the dynamic_cast here (and the general complexity of the inner implementation classes for different types) by using a single struct with a std::variant member variable, but let's just make note of that and save it for a follow-on.

@hcho3 hcho3 changed the title [FEA] Add GPUTreeSHAP to cuML explainer module [FEA] Add GPUTreeSHAP to cuML explainer module (experimental) Nov 17, 2021
@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@34f7929). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.12    #4351   +/-   ##
===============================================
  Coverage                ?   85.65%           
===============================================
  Files                   ?      236           
  Lines                   ?    19179           
  Branches                ?        0           
===============================================
  Hits                    ?    16427           
  Misses                  ?     2752           
  Partials                ?        0           
Flag Coverage Δ
dask 46.17% <0.00%> (?)
non-dask 78.49% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34f7929...56d62d3. Read the comment docs.

@wphicks
Copy link
Contributor

wphicks commented Nov 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 3cf9778 into rapidsai:branch-21.12 Nov 18, 2021
@hcho3 hcho3 deleted the gputreeshap branch November 19, 2021 06:11
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Addresses rapidsai#4110

This is an experimental prototype. For now, it supports:
* XGBoost models with numerical splits
* cuML RF regressors with numerical splits

cuML RF classifiers are not supported.

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4351
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CUDA/C++ Cython / Python Cython or Python issue Experimental Used to denote experimental features feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants