-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) #4746
Conversation
@mikemahoney218 Thanks a lot for your contribution! Sorry for the inconvenience, but could you please sync this PR with the latest |
@StrikerRUS No problem, should be up-to-date now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding new alias!
Generally LGTM, except one comment about asserting results in the test. But I'll defer the final decision to @jameslamb.
expect_equal(param_bst$best_score | ||
, top_level_bst$best_score | ||
, tolerance = TOLERANCE) | ||
|
||
expect_equal(param_bst$current_iter(), both_customized$current_iter()) | ||
expect_equal(param_bst$best_score | ||
, both_customized$best_score | ||
, tolerance = TOLERANCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should check here for the exact equality without any tolerance
. Given the fixed environment and the same set of parameters, consecutive calls of lightgbm should produce bit-by-bit identical results. Just like here:
LightGBM/tests/python_package_test/test_basic.py
Lines 66 to 67 in d62378b
# we need to check the consistency of model file here, so test for exact equal | |
np.testing.assert_array_equal(pred_from_matr, pred_from_model_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @StrikerRUS . And just to be clear, that would mean using testthat::expect_identical()
, not just removing tolerance
.
Since expect_equal()
already allows for small differences by default: https://rdrr.io/cran/testthat/man/equality-expectations.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swapped this to expect_identical
. When running interactively, this test was non-deterministic; it would sometimes fail with message top_level_l2 not identical to params_l2
. Objects equal but not identical.` and sometimes succeed. My attempts to capture this using reprex succeeded every time, so I'm not sure if something about the interactive environment, R Studio, or (most likely) the way I was running tests was causing the issue, but I figured I'd flag all the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for picking this up! Changes look good to me, but I left a few requests on the additional unit tests.
expect_equal(param_bst$best_score | ||
, top_level_bst$best_score | ||
, tolerance = TOLERANCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think best_score
is the right thing to use here. For any LightGBM training where validation data isn't provided, this will always be NA
.
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
nrounds <- 15L
top_level_bst <- lightgbm(
data = train$data
, label = train$label
, nrounds = nrounds
, params = list(
objective = "regression"
, metric = "l2"
, num_leaves = 5L
)
, save_name = tempfile(fileext = ".model")
)
top_level_bst$best_score
# [1] NA
To test "the model produced is identical", to be more confident that the value of num_iterations
was passed through, you can use $eval_train()
. This creates predictions on the training data, using the full model, and then provides evaluation metrics based on those predictions.
top_level_l2 <- top_level_bst$eval_train()[[1L]][["value"]]
params_l2 <- params_bst$eval_train()[[1L]][["value"]]
# check type just to be sure the subsetting didn't return a NULL
expect_true(is.numeric(top_level_2))
expect_true(is.numeric(params_l2))
# check that model produces identical performance
expect_equal(top_level_l2, params_l2)
Could you make this change?
expect_equal(param_bst$best_score | ||
, top_level_bst$best_score | ||
, tolerance = TOLERANCE) | ||
|
||
expect_equal(param_bst$current_iter(), both_customized$current_iter()) | ||
expect_equal(param_bst$best_score | ||
, both_customized$best_score | ||
, tolerance = TOLERANCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @StrikerRUS . And just to be clear, that would mean using testthat::expect_identical()
, not just removing tolerance
.
Since expect_equal()
already allows for small differences by default: https://rdrr.io/cran/testthat/man/equality-expectations.html
I believe I've added commits that addressed all your comments, namely:
I want to mention that I'm logging off for the weekend around now, so if you all have any other changes I'll look into them come Monday. Have a good weekend 😄 |
Thanks very much for the help! We may leave some additional comments here but no need to respond to them over the weekend. This is not urgent, and we really appreciate you taking the time so far to help out! I'll try running the tests you've added on my machine and see if I also see nondeterministic behavior, thanks very much for flagging that. |
Kindly ping @mikemahoney218 |
@StrikerRUS I believe that I'm waiting for you all to review the new changes 😄 At any rate, I'm not aware of any changes that need to be made on my end right now. |
@mikemahoney218 Ah, sorry, missed your the most recent commits! Then please fix the following linting errors:
https://github.com/microsoft/LightGBM/runs/4051576042?check_suite_focus=true#step:3:790 |
@StrikerRUS We'll see what CI thinks but should be fixed in 0070402 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes all look good to me, thanks very much!!
@mikemahoney218 @StrikerRUS @jameslamb Thank you all for the excellent work. |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
This PR addresses #4743, adding
nrounds
as an alias fornum_iterations
so that (within the R package) the top-levelnrounds
parameter may also be passed to theparams
argument oflightgbm
andlgb.train
.I followed the template of #4637 and the new tests work on my machine (:tm:). Please let me know if I missed anything!