Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) #4746

Merged
merged 5 commits into from
Nov 11, 2021
Merged

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) #4746

merged 5 commits into from
Nov 11, 2021

Conversation

mikemahoney218
Copy link
Contributor

This PR addresses #4743, adding nrounds as an alias for num_iterations so that (within the R package) the top-level nrounds parameter may also be passed to the params argument of lightgbm and lgb.train.

I followed the template of #4637 and the new tests work on my machine (:tm:). Please let me know if I missed anything!

@ghost
Copy link

ghost commented Oct 29, 2021

CLA assistant check
All CLA requirements met.

@StrikerRUS
Copy link
Collaborator

@mikemahoney218 Thanks a lot for your contribution! Sorry for the inconvenience, but could you please sync this PR with the latest master branch to fix CI issues?

@mikemahoney218
Copy link
Contributor Author

@StrikerRUS No problem, should be up-to-date now.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding new alias!
Generally LGTM, except one comment about asserting results in the test. But I'll defer the final decision to @jameslamb.

Comment on lines 235 to 242
expect_equal(param_bst$best_score
, top_level_bst$best_score
, tolerance = TOLERANCE)

expect_equal(param_bst$current_iter(), both_customized$current_iter())
expect_equal(param_bst$best_score
, both_customized$best_score
, tolerance = TOLERANCE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should check here for the exact equality without any tolerance. Given the fixed environment and the same set of parameters, consecutive calls of lightgbm should produce bit-by-bit identical results. Just like here:

# we need to check the consistency of model file here, so test for exact equal
np.testing.assert_array_equal(pred_from_matr, pred_from_model_file)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @StrikerRUS . And just to be clear, that would mean using testthat::expect_identical(), not just removing tolerance.

Since expect_equal() already allows for small differences by default: https://rdrr.io/cran/testthat/man/equality-expectations.html

Copy link
Contributor Author

@mikemahoney218 mikemahoney218 Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swapped this to expect_identical. When running interactively, this test was non-deterministic; it would sometimes fail with message top_level_l2 not identical to params_l2. Objects equal but not identical.` and sometimes succeed. My attempts to capture this using reprex succeeded every time, so I'm not sure if something about the interactive environment, R Studio, or (most likely) the way I was running tests was causing the issue, but I figured I'd flag all the same.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for picking this up! Changes look good to me, but I left a few requests on the additional unit tests.

Comment on lines 235 to 237
expect_equal(param_bst$best_score
, top_level_bst$best_score
, tolerance = TOLERANCE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think best_score is the right thing to use here. For any LightGBM training where validation data isn't provided, this will always be NA.

library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train

nrounds <- 15L

top_level_bst <- lightgbm(
    data = train$data
    , label = train$label
    , nrounds = nrounds
    , params = list(
      objective = "regression"
      , metric = "l2"
      , num_leaves = 5L
    )
    , save_name = tempfile(fileext = ".model")
)

top_level_bst$best_score
# [1] NA

To test "the model produced is identical", to be more confident that the value of num_iterations was passed through, you can use $eval_train(). This creates predictions on the training data, using the full model, and then provides evaluation metrics based on those predictions.

top_level_l2 <- top_level_bst$eval_train()[[1L]][["value"]]
params_l2    <- params_bst$eval_train()[[1L]][["value"]]

# check type just to be sure the subsetting didn't return a NULL
expect_true(is.numeric(top_level_2))
expect_true(is.numeric(params_l2))

# check that model produces identical performance
expect_equal(top_level_l2, params_l2)

Could you make this change?

Comment on lines 235 to 242
expect_equal(param_bst$best_score
, top_level_bst$best_score
, tolerance = TOLERANCE)

expect_equal(param_bst$current_iter(), both_customized$current_iter())
expect_equal(param_bst$best_score
, both_customized$best_score
, tolerance = TOLERANCE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @StrikerRUS . And just to be clear, that would mean using testthat::expect_identical(), not just removing tolerance.

Since expect_equal() already allows for small differences by default: https://rdrr.io/cran/testthat/man/equality-expectations.html

R-package/tests/testthat/test_basic.R Show resolved Hide resolved
@mikemahoney218
Copy link
Contributor Author

I believe I've added commits that addressed all your comments, namely:

  • Use eval_train() rather than best_score()
  • Testing lgb.train() as well as lightgbm()
  • Explicit tests comparing current_iter() to nrounds
  • expect_identical() rather than expect_equal()

I want to mention that expect_identical() often failed when running these tests interactively; I don't know if it was something in how I ran the tests or in how my R Studio is configured that caused this issue. My attempts to capture this via reprex::reprex() always succeeded, so I'm expecting they'll work in a CI environment, but I wanted to flag this anyway.

I'm logging off for the weekend around now, so if you all have any other changes I'll look into them come Monday. Have a good weekend 😄

@jameslamb
Copy link
Collaborator

Thanks very much for the help! We may leave some additional comments here but no need to respond to them over the weekend. This is not urgent, and we really appreciate you taking the time so far to help out!

I'll try running the tests you've added on my machine and see if I also see nondeterministic behavior, thanks very much for flagging that.

@StrikerRUS
Copy link
Collaborator

Kindly ping @mikemahoney218

@mikemahoney218
Copy link
Contributor Author

@StrikerRUS I believe that I'm waiting for you all to review the new changes 😄 At any rate, I'm not aware of any changes that need to be made on my end right now.

@StrikerRUS
Copy link
Collaborator

@mikemahoney218 Ah, sorry, missed your the most recent commits! Then please fix the following linting errors:

[1] Total linting issues found: 10
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:233:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:242:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:535:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:550:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:565:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:581:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:585:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:590:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:594:1: style: Trailing whitespace is superfluous.
/home/runner/work/LightGBM/LightGBM/R-package/tests/testthat/test_basic.R:598:1: style: Trailing whitespace is superfluous.

https://github.com/microsoft/LightGBM/runs/4051576042?check_suite_focus=true#step:3:790

@mikemahoney218
Copy link
Contributor Author

@StrikerRUS We'll see what CI thinks but should be fixed in 0070402

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes all look good to me, thanks very much!!

@shiyu1994 shiyu1994 merged commit 3b6ebd7 into microsoft:master Nov 11, 2021
@shiyu1994
Copy link
Collaborator

@mikemahoney218 @StrikerRUS @jameslamb Thank you all for the excellent work.

@StrikerRUS StrikerRUS mentioned this pull request Jan 6, 2022
13 tasks
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants