-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] Accept factor labels and use their levels #5341
Conversation
Ok @david-cortes sorry for the VERY long delay! We've had some significant challenges with maintainer availability over the last 9 months. Are you still interested in pursuing this PR? If you are, please update it with the latest changes from |
Solved merge conflicts. |
The failing job is unrelated to this PR: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=14228&view=logs&j=582743ec-83fe-58c4-937e-4197b5816801&t=63b016c8-8965-547a-fab6-395af921c96a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the excellent work 🎉
I appreciate the thorough tests, and really like the implementation you came up with that has a minimal impact on the package's public interface.
I left some suggestions for changes, but overall I'm supportive of the way this is structured.
@simonpcouch @mayer79 do either of you have time to help us with a review?
And of course @jmoralez I'd love your opinions if you have time.
Yeah sorry about that. It's a known flaky job: #5677 (comment). Sorry about that. You can safely ignore any failures in the Azure DevOps |
(retagging @simonpcouch because there was a typo in my review message and i'm not sure if editing an |
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, thank you so much!
And thanks for improving the detection of binary vs. multiclass objectives with .BINARY_OBJECTIVES()
and .MULTICLASS_OBJECTIVES()
, that's helpful even for the cases where label
isn't a factor.
Just one more suggestion (#5341 (comment)), and otherwise I'm good with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these additions are definitely an improvement and am on board for the changes requested in James’ review!
bonsai interacts with the lgb.train()
interface, so (I believe) this won't end up being an issue for us if you decide to ignore, but one comment:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
@jmoralez would you like a chance to review this?
Thanks @jmoralez for helping review this, @david-cortes for the excellent contribution, and @mayer79 and @simonpcouch for always being willing to provide thoughtful reviews when we ask for your help 😊 |
This pull request has been automatically locked since there has not been any recent activity since it was closed. |
ref #4968
Typically, modeling packages for R switch between regression and classification according to what they are passed for the response variable (example packages doing this:
randomForest
,ranger
,e1071
,caret
,gbm
, etc.), and typically, they require classification models to be passedfactor
type labels even if they do not auto-determine the objective (e.g.glm
,glmnet
, etc.) and use the levels of thisfactor
when returning predictions.This PR changes the
lightgbm()
interface to play along with factors:factor
types for the response variable.There is one unresolved issue here however: if the prediction
type
is passed throughparams
, there can be some issues if e.g. a user requests leaf predictions and the number of trees is equal to the number of classes. I left it as a warning note in the documentation but perhaps there's some better workaround that wouldn't involve looking through aliases.