[python-package] Use scikit-learn interpretation of negative `n_jobs` and change default to number of cores #5105
Conversation
@david-cortes Thanks a lot for this PR! I support changing the default value of `n_jobs`. I think we can at least check the transformed value of `n_jobs` in tests.
@StrikerRUS What would you think of changing the default to the number of physical cores instead?
TBH, I'm not sure. I see some inconsistency: scikit-learn's glossary talks about CPUs (e.g. "If set to -1, all CPUs are used"), but under the hood joblib detects the number of threads according to its default backend.
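For reference, the convention the scikit-learn glossary describes maps negative values through joblib's formula `n_cpus + 1 + n_jobs`. A minimal sketch of that interpretation (the helper name `resolve_n_jobs` is hypothetical; the real logic lives in joblib):

```python
import os

def resolve_n_jobs(n_jobs):
    """Hypothetical helper sketching the joblib/scikit-learn convention:
    n_jobs=-1 -> all CPUs, n_jobs=-2 -> all but one, and so on."""
    n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        # scikit-learn treats None as 1 unless inside a parallel backend
        return 1
    if n_jobs < 0:
        # clamp so that very negative values still request at least 1 job
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

Note that this counts logical CPUs, which is part of the inconsistency discussed above: whether "all CPUs" should mean logical or physical cores.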
BTW, I'll point out that the linter is complaining about the order of the imports. According to the docs here, the Python code should be PEP8-compliant save for a few exceptions; what this linter is complaining about is outside the scope of PEP8 and not mentioned in the docs. I'm also not sure what the right import order would be according to the linter (the logs don't say which linter is complaining).
I mean, what would you think about setting the default value to the output returned from joblib's CPU count function?
That refers to what is defined here. So the …
According to the scikit-learn docs and our own recommendations, we should set it to the number of physical cores. I'm for using the physical core count.
Updated.
Well, it seems joblib's CPU count function wasn't very robust after all.
Looks like the failing tests are because the Dask interface has different default arguments from the scikit-learn interface. I'm not familiar with Dask, so I'll ask: should the Dask interface also default to the same `n_jobs`?
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Now that the Linux jobs at Azure Pipelines always fail but the Windows jobs fail only intermittently (32b73a2: all successful; a31fced: Windows regular; 21cd3b2: Windows sdist), I'm almost sure the reason is randomness caused by a different number of threads.
Well, the issue of non-deterministic results of the linear tree model given a different number of threads isn't related to this PR anyway. @david-cortes Let's unblock this PR by specifying LightGBM's default number of threads (0) here, because that param is commented out in the config file:

```python
gbm = lgb.LGBMClassifier(**fd.params, n_jobs=0)
```
Updated, but I think you should open an issue so as not to forget about those differences. There might be a big bug underneath.

Thanks a lot! Sure, will open a separate issue for investigating this.
@jmoralez @jameslamb I'd really appreciate your review of this PR.
Thank you so much for the hard work on this PR!
LGTM except for one minor suggestion to simplify the codebase and one to use a temp folder in tests.
```python
n_jobs = self.n_jobs
for alias in _ConfigAliases.get("num_threads"):
    if alias in predict_params:
        n_jobs = predict_params.pop(alias)
predict_params["num_threads"] = self._process_n_jobs(n_jobs)
```
This can be simplified with `_choose_param_value()`:
```python
# current
n_jobs = self.n_jobs
for alias in _ConfigAliases.get("num_threads"):
    if alias in predict_params:
        n_jobs = predict_params.pop(alias)
predict_params["num_threads"] = self._process_n_jobs(n_jobs)

# suggested
predict_params = _choose_param_value("num_threads", predict_params, self.n_jobs)
predict_params["num_threads"] = self._process_n_jobs(predict_params["num_threads"])
```
Problem is, `_choose_param_value` treats `None` as if it were not passed, whereas after this PR `None` has a special meaning. For example, one could pass `n_jobs=2` in the constructor and then `n_jobs=None` in the call to `predict`; if that function were used, it would not have the desired effect.
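To illustrate the problem, here is a simplified stand-in for the alias-resolution helper (an assumption for illustration only, not the actual LightGBM implementation): any function that falls back to the default whenever the main parameter is absent *or* `None` silently discards an explicit `n_jobs=None` at predict time.

```python
def choose_param_value(main_param_name, params, default_value):
    """Simplified, hypothetical stand-in for an alias-resolution helper
    that treats None the same as "not passed"."""
    params = dict(params)  # don't mutate the caller's dict
    if params.get(main_param_name) is None:
        # None falls through to the default -- this is exactly how an
        # explicit n_jobs=None (meaning "use all cores") would get lost.
        params[main_param_name] = default_value
    return params

# Constructor had n_jobs=2; predict() is called with n_jobs=None, which
# after this PR should mean "number of cores", not "reuse 2".
resolved = choose_param_value("num_threads", {"num_threads": None}, 2)
```

Here `resolved["num_threads"]` comes back as `2`, so the special meaning of `None` is dropped.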
Ah, I see now! Thanks for the explanation!
@jameslamb Maybe we can refactor the `_choose_param_value()` function so that it handles a `None` value of a param correctly?
Ok yeah, I think I understand. Will put up a PR shortly.
Created #5289.

Created: #5266
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Kindly pinging @jameslamb and @jmoralez about #5105 (comment).
Changes look good to me! I have no additional comments, and it's ok with me if we merge this and then later come back and apply the suggestion from https://github.com/microsoft/LightGBM/pull/5105/files#r889631120 after #5289 is merged.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
The scikit-learn interface for LightGBM has a parameter `n_jobs` with a default value of `-1`, which is interpreted as using the default number of OpenMP threads.

According to the scikit-learn glossary, a negative `n_jobs` has a different meaning (it uses joblib's formula), so `-1` means using all available threads: https://scikit-learn.org/stable/glossary.html#term-n_jobs

This PR changes the meaning of negative `n_jobs` to match that of scikit-learn, which I think is how most people would expect scikit-learn-compatible libraries to behave.

I'm not sure what requirements there are for the tests that are run on Python. This change could be tested, but it would imply fitting models using multiple threads per function call; I'm not sure if there's any issue with that, like in the R interface.