
[python-package] Use scikit-learn interpretation of negative n_jobs and change default to number of cores #5105

Merged (33 commits), Jun 19, 2022

Conversation

@david-cortes (Contributor)

The scikit-learn interface for lightgbm has a parameter n_jobs with a default value of -1, which is interpreted as using the default number of OMP threads.

According to the scikit-learn glossary, a negative n_jobs has a different meaning (uses joblib's formula), so -1 means to use all available threads:
https://scikit-learn.org/stable/glossary.html#term-n_jobs

This PR changes the meaning of negative n_jobs to match that of scikit-learn, which I think is how most people would expect scikit-learn compatible libraries to behave.

I'm not sure what requirements there are for the tests that are run on Python. This change could be tested, but it would imply fitting models using multiple threads per function call; I'm not sure whether that causes any issue like in the R interface.
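For reference, the joblib convention referenced above can be sketched as follows (a minimal illustration, not LightGBM's actual code; the `total_cpus` parameter is hypothetical, added here so the behavior is easy to check on any machine):

```python
import os

def interpret_n_jobs(n_jobs, total_cpus=None):
    """Sketch of the scikit-learn/joblib convention: n_jobs=-1 means
    all CPUs, -2 means all but one, etc. (n_cpus + 1 + n_jobs)."""
    total = total_cpus if total_cpus is not None else (os.cpu_count() or 1)
    if n_jobs < 0:
        # Clamp to at least one thread for very negative values.
        return max(total + 1 + n_jobs, 1)
    return n_jobs

print(interpret_n_jobs(-1, total_cpus=8))  # -1 -> 8 (all CPUs)
print(interpret_n_jobs(-2, total_cpus=8))  # -2 -> 7 (all but one)
```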

@StrikerRUS (Collaborator)

@david-cortes Thanks a lot for this PR! I support changing the default value of n_jobs to match scikit-learn behavior.

I'm not sure what kind of requirements are there for the tests that are run on Python.

I think we can at least check the transformed value of nthreads that reaches the cpp side. Refer to #4972 (comment).

@david-cortes (Contributor, Author)

@StrikerRUS What would you think of changing the default to the number of physical cores instead?

@StrikerRUS (Collaborator)

What would you think of changing the default to the number of physical cores instead?

TBH, I'm not sure. I see some inconsistency: the scikit-learn glossary talks about CPUs (e.g. "If set to -1, all CPUs are used"), but under the hood joblib detects the number of threads (given the default only_physical_cores=False):
https://github.com/scikit-learn/scikit-learn/blob/582fa30a31ffd1d2afc6325ec3506418e35b88c2/sklearn/ensemble/_forest.py#L444-L454

@david-cortes (Contributor, Author)

By the way, the linter is complaining about the order of the imports. According to the docs here, the Python code should be PEP8-compliant save for a few exceptions. What the linter is complaining about is outside the scope of PEP8 and not mentioned in the docs. I'm also not sure what the right import order would be according to the linter (the logs don't say which linter is complaining).

@david-cortes (Contributor, Author)

david-cortes commented Mar 30, 2022

What would you think of changing the default to the number of physical cores instead?

TBH, I'm not sure. I see some inconsistency: the scikit-learn glossary talks about CPUs (e.g. "If set to -1, all CPUs are used"), but under the hood joblib detects the number of threads (given the default only_physical_cores=False) https://github.com/scikit-learn/scikit-learn/blob/582fa30a31ffd1d2afc6325ec3506418e35b88c2/sklearn/ensemble/_forest.py#L444-L454

I mean, what would you think about setting the default value to the output returned from joblib.cpu_count(only_physical_cores=True) (while still interpreting negative numbers the same way as scikit-learn).

@jmoralez (Collaborator)

By the way, the linter is complaining about the order of the imports.

That refers to what is defined here.

Imports should be grouped in the following order:
1. Standard library imports.
2. Related third party imports.
3. Local application/library specific imports.

So the joblib import should go just above numpy.
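A sketch of that grouping (illustrative module names only; the commented-out lines stand in for packages that may not be installed in every environment):

```python
# 1. Standard library imports.
import copy
import os

# 2. Related third-party imports, alphabetical within the group --
#    which is why `import joblib` would sit just above `import numpy`.
# import joblib
# import numpy as np

# 3. Local application/library-specific imports.
# from .basic import Booster

# Trivial use of the imported modules so the example is self-checking.
print(copy.deepcopy({"n_jobs": os.cpu_count() or 1}))
```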

@StrikerRUS (Collaborator)

I mean, what would you think about setting the default value to the output returned from joblib.cpu_count(only_physical_cores=True) (while still interpreting negative numbers the same way as scikit-learn).

According to the scikit-learn docs and our own recommendations, we should set only_physical_cores=True. But it looks like scikit-learn under the hood uses only_physical_cores=False, so we won't match their behavior exactly.

I'm for using only_physical_cores=True in our package.
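As a sketch, the proposed default could be resolved like this (assuming joblib is available; the stdlib fallback is illustrative and counts logical CPUs, not physical cores):

```python
def detect_default_n_jobs():
    """Number of physical cores via joblib, as proposed above;
    falls back to os.cpu_count() if joblib is unavailable or too old."""
    try:
        from joblib import cpu_count
        return cpu_count(only_physical_cores=True)
    except (ImportError, TypeError):
        import os
        return os.cpu_count() or 1

n = detect_default_n_jobs()
print(n)  # e.g. 4 on a machine with 4 physical cores
```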

@david-cortes david-cortes changed the title [Python] Use scikit-learn interpretation of negative n_jobs [Python] Use scikit-learn interpretation of negative n_jobs and change default to number of cores Mar 30, 2022
@david-cortes (Contributor, Author)

Updated.

@david-cortes (Contributor, Author)

Well, it seems joblib's CPU count function wasn't very robust after all.

@david-cortes (Contributor, Author)

Looks like the failing tests are due to the Dask interface having different default arguments from the scikit-learn interface. I'm not familiar with Dask, so I'll ask: should the Dask interface also default to the same n_jobs and use the same interpretation of negative n_jobs?

Review threads (outdated, resolved) on python-package/lightgbm/compat.py and python-package/lightgbm/sklearn.py.
david-cortes and others added 5 commits May 15, 2022 (each co-authored-by: Nikita Titov <nekit94-08@mail.ru>)
@StrikerRUS (Collaborator)

But I can't imagine any other sources of inconsistency between the standard and sklearn interfaces in this PR.

Given that Linux jobs at Azure Pipelines always fail while Windows jobs fail irregularly (32b73a2: all successful; a31fced: Windows regular; 21cd3b2: Windows sdist), I'm almost sure the cause is randomness introduced by a different number of threads.

@StrikerRUS (Collaborator)

StrikerRUS commented May 29, 2022

Well, the issue of non-deterministic results of the linear tree model given different numbers of threads isn't related to this PR anyway.

@david-cortes Let's unblock this PR by explicitly specifying LightGBM's default number of threads (0) here, because that param is commented out in the config file:

-gbm = lgb.LGBMClassifier(**fd.params)
+gbm = lgb.LGBMClassifier(**fd.params, n_jobs=0)

@david-cortes (Contributor, Author)

david-cortes commented May 30, 2022

Well, the issue of non-deterministic results of the linear tree model given different number of threads isn't related to this PR anyway.

@david-cortes Let's unblock this PR by specifying LightGBM default number of threads (0) here, because in config file that param is commented out:

gbm = lgb.LGBMClassifier(**fd.params)

gbm = lgb.LGBMClassifier(**fd.params, n_jobs=0)

Updated, but I think you should open an issue so as not to forget about those differences. There might be some big bug underneath.

@StrikerRUS (Collaborator)

Updated, but I think you should open an issue so as not to forget about those differences. There might be some big bug underneath.

Thanks a lot! Sure, will open a separate issue for investigating this.

@StrikerRUS (Collaborator)

@jmoralez @jameslamb I'd really appreciate your review of this PR.

@StrikerRUS (Collaborator) left a review:

Thank you so much for the hard work on this PR!
LGTM except for one minor suggestion to simplify the codebase and a note about using a temp folder in the tests.

Comment on lines +854 to +858:

    n_jobs = self.n_jobs
    for alias in _ConfigAliases.get("num_threads"):
        if alias in predict_params:
            n_jobs = predict_params.pop(alias)
    predict_params["num_threads"] = self._process_n_jobs(n_jobs)
Collaborator:

This can be simplified with _choose_param_value():

Suggested change:

    -n_jobs = self.n_jobs
    -for alias in _ConfigAliases.get("num_threads"):
    -    if alias in predict_params:
    -        n_jobs = predict_params.pop(alias)
    -predict_params["num_threads"] = self._process_n_jobs(n_jobs)
    +predict_params = _choose_param_value("num_threads", predict_params, self.n_jobs)
    +predict_params["num_threads"] = self._process_n_jobs(predict_params["num_threads"])

@david-cortes (Contributor, Author):

The problem is that _choose_param_value treats None as if the parameter were not passed, whereas after this PR None has a special meaning. For example, one can pass n_jobs=2 in the constructor and then n_jobs=None in the call to predict; if that function were used, the None would not have the desired effect.
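A toy illustration of the pitfall (a hypothetical helper mimicking the None-coalescing behavior described above, not LightGBM's actual `_choose_param_value`):

```python
def choose_param_value(main_param_name, params, default_value):
    # Mimics a helper that falls back to the default when the key is
    # missing *or* set to None -- the behavior that causes the problem.
    value = params.get(main_param_name)
    return default_value if value is None else value

# Constructor set n_jobs=2; predict() then explicitly passed
# n_jobs=None, which after this PR should mean "use the default
# thread count" -- but the helper silently reuses 2 instead.
predict_params = {"num_threads": None}
resolved = choose_param_value("num_threads", predict_params, default_value=2)
print(resolved)  # 2: the caller's explicit None was ignored
```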

Collaborator:

Ah, I see now! Thanks for the explanation!

Collaborator:

@jameslamb Maybe we can refactor the _choose_param_value() function so that it handles a None value of a param correctly?

Collaborator:

Ok yeah, I think I understand. Will put up a PR shortly.

Collaborator:

Created #5289.

Review threads (outdated, resolved) on tests/python_package_test/test_sklearn.py.
@StrikerRUS (Collaborator)

Sure, will open a separate issue for investigating this.

Created: #5266

david-cortes and others added 2 commits June 5, 2022 (each co-authored-by: Nikita Titov <nekit94-08@mail.ru>)
@StrikerRUS (Collaborator)

StrikerRUS commented Jun 12, 2022

Kindly ping @jameslamb and @jmoralez for #5105 (comment).

@jameslamb (Collaborator) left a review:

Changes look good to me! I have no additional comments, and it's ok with me if we merge this and then later come back and apply the suggestion from https://github.com/microsoft/LightGBM/pull/5105/files#r889631120 after #5289 is merged.

@jameslamb jameslamb changed the title [Python] Use scikit-learn interpretation of negative n_jobs and change default to number of cores [python-package] Use scikit-learn interpretation of negative n_jobs and change default to number of cores Jun 14, 2022
@github-actions (bot)

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023