fix percentile computation for regression objectives #5848
Conversation
@shiyu1994 @guolinke can you review this please?
Maybe still need a (1.0 - alpha)
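If I'm reading that comment right, it refers to the two mirror-image conventions for a percentile position: counting from the bottom versus from the top of the sorted data. A hypothetical illustration (not code from this PR):

```python
# Hypothetical illustration (not code from this PR): with n sorted values,
# the alpha-quantile lies (n - 1) * alpha steps up from the smallest value,
# which is the same spot as (n - 1) * (1.0 - alpha) steps down from the
# largest. Mixing the two conventions produces exactly the kind of
# wrong-position bug being discussed here.
n, alpha = 10, 0.25
pos_from_bottom = (n - 1) * alpha          # 2.25 steps up from the smallest value
pos_from_top = (n - 1) * (1.0 - alpha)     # 6.75 steps down from the largest value
assert pos_from_bottom + pos_from_top == n - 1
```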
Sorry for missing this PR. I think this PR indeed fixes the bug.
@zachary62 please merge the latest changes from the default branch. In the future, I recommend creating a new branch on your fork when contributing to open source projects here on GitHub, instead of committing directly to the default branch.
It seems we need to fix some tests.
I just renamed this. Pull request titles automatically become release note items in this project. Users of LightGBM reading the release notes won't care that the fix was "simple" or how many lines the change was.
I have a few questions for you @guolinke @shiyu1994:

- Is my reworded title correct?
- If it is... should this be labeled `breaking`? If I understand correctly, won't this change the results of any models using regression objectives? (as indicated by some of those breaking tests)
- Are you sure this fix is correct? I understand that the initial score being chosen right now is not exactly the median and is supposed to be... but the failing tests are showing that, as of this change, `lightgbm` produces a worse in-sample fit. Isn't that a problem?
e.g. LightGBM/tests/python_package_test/test_engine.py, lines 135 to 146 in 5c9e61d:
```python
ret = mean_squared_error(y_test, gbm.predict(X_test))
if objective == 'huber':
    assert ret < 430
elif objective == 'fair':
    assert ret < 296
elif objective == 'poisson':
    assert ret < 193
elif objective == 'quantile':
    assert ret < 1311
else:
    assert ret < 338
assert evals_result['valid_0']['l2'][-1] == pytest.approx(ret)
```
FAILED tests/python_package_test/test_engine.py::test_regression[regression_l1] - assert 368.9313884458552 < 338
FAILED tests/python_package_test/test_engine.py::test_regression[quantile] - assert 1314.2383346017639 < 1311
I have pushed a fix for the positions in the percentile computation. Also, I increased the threshold value for the l1 objective in the affected test. To show that the fix makes sense, the MAE on the training set of …
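For intuition on why training-set MAE is the relevant check (a sketch of my own, not part of the PR): the median is the constant predictor that minimizes mean absolute error, so a correctly computed median initial score should improve the in-sample fit for the l1 objective:

```python
import numpy as np

# Sketch (not from the PR): on skewed targets the mean and median differ,
# and the median is the constant predictor that minimizes MAE. This is why
# a correct median initial score matters for the l1 objective.
rng = np.random.default_rng(42)
y = rng.lognormal(size=1000)  # skewed, so mean != median

mae_from_mean = np.abs(y - y.mean()).mean()
mae_from_median = np.abs(y - np.median(y)).mean()
assert mae_from_median <= mae_from_mean
```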
Ok, thanks @shiyu1994. I've dismissed my review so it doesn't prevent merging. For my own understanding, can you explain what was incorrect about my proposed title? Is it not true that this affects the initial score that is set at the beginning of boosting? I thought that's what was happening based on a log line in #5847.
Yes, it is true. But it is not only the initial score. See LightGBM/src/objective/regression_objective.hpp, lines 253 to 283 in 5c9e61d.
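For reference, a minimal Python sketch of linear-interpolation percentile positioning (an illustration of the position arithmetic only; the actual implementation is the C++ code in regression_objective.hpp linked above):

```python
import numpy as np

def percentile_linear(values, alpha):
    """Alpha-quantile by linear interpolation (numpy's default method).

    With n sorted values, the quantile sits at float position
    (n - 1) * alpha; getting this position wrong is the kind of bug
    this PR fixes. Illustration only, not LightGBM's C++ code.
    """
    data = np.sort(np.asarray(values, dtype=float))
    n = len(data)
    if n == 1:
        return data[0]
    float_pos = (n - 1) * alpha
    low = int(np.floor(float_pos))
    high = min(low + 1, n - 1)
    frac = float_pos - low  # weight on the upper neighbor
    return data[low] * (1.0 - frac) + data[high] * frac

# Sanity check against numpy at the median and in both tails
vals = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
for alpha in (0.1, 0.5, 0.9):
    assert np.isclose(percentile_linear(vals, alpha), np.percentile(vals, 100 * alpha))
```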
ohhhh I see. Ok, thanks for explaining that!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Fixes #5847