-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support categorical splits in in TreeExplainer #4473
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, bar a few minor comments.
This is quite a complicated feature, so I wouldn't be surprised if there are still bugs lurking even with review and good test coverage. We could try to do some stress testing with hypothesis to really increase our confidence, but I'll leave it up to you.
# XGBoost model object | ||
if cls.__module__ == 'xgboost.core' and cls.__name__ == 'Booster': | ||
if re.match(r'xgboost.*$', cls_module) and cls_name == 'Booster': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to see this logic for matching different libraries move into TreeLite. e.g. treelite.Model(model)
- treelite guesses the model type.
Historically, this type of code for guessing module names ends up being fragile over time. Not sure how to avoid it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this type of code for guessing module names ends up being fragile over time. Not sure how to avoid it though.
The alternative is to ask the user to pass a string argument to indicate which model type it is. This will be more typing however.
I don't think we've decided yet to require |
Hypothesis is already in the rapids environment, although this is maybe not the same as us being allowed to use it :) I wrote almost all my tests for #4492 in hypothesis. |
@hcho3 Rory's advice is correct, hypothesis already being used by cuDF, so it is already in our testing environments (and developer envs for that matter), so it should be fine to use in pytests |
@RAMitchell I agree that property-based testing is a good idea. Given that code freeze is in 3 days, let me add hypothesis based testing in a follow-up PR targeting 22.04. |
Good idea, it's not urgent. |
Removing |
@RAMitchell @wphicks Can you take another quick look? I suggest that we merge this as long as there isn't any big blocker. We can address remaining issues in the following release, in which we'd move TreeExplainer out of |
Rerun tests |
rerun tests |
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #4473 +/- ##
===============================================
Coverage ? 85.69%
===============================================
Files ? 236
Lines ? 19338
Branches ? 0
===============================================
Hits ? 16572
Misses ? 2766
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
@RAMitchell will go ahead and merge this PR so that it makes it to release 22.02, @hcho3 could you address any pending comments and further testing in a follow up PR? Thanks! |
@gpucibot merge |
Stacked on top of rapidsai#4447. Do not merge until rapidsai#4447 is merged first. ~If you'd like to review before rapidsai#4447 is merged, look at the net diff at hcho3#2.~ - [x] Test XGBoost models with categorical splits - [x] Test LightGBM models with categorical splits Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4473
Stacked on top of #4447. Do not merge until #4447 is merged first.
If you'd like to review before #4447 is merged, look at the net diff at hcho3#2.