Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] max_bin parameter deprecated #1046

Merged
merged 6 commits into from
Nov 13, 2017
Merged

[python] max_bin parameter deprecated #1046

merged 6 commits into from
Nov 13, 2017

Conversation

StrikerRUS
Copy link
Collaborator

Refer to this thread.

@StrikerRUS StrikerRUS requested review from wxchan and guolinke November 9, 2017 21:36
@@ -638,6 +639,8 @@ def _lazy_init(self, data, label=None, max_bin=None, reference=None,
self.predictor = predictor
if self.max_bin is not None:
params["max_bin"] = self.max_bin
warnings.warn('The `max_bin` parameter is deprecated and will be removed in next version. '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think next may not be clear. Can we give a specific version number?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan Good idea! What about 2.0.12?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StrikerRUS ok. but we might jump some version. @guolinke can confirm this.

@guolinke
Copy link
Collaborator

@StrikerRUS we can remove all function arguments in native python interface.
However, it seems is not possible for sklearn interface.
Thus, how should we deal the parameter priority for the function arguments and parameter dict ?

@wxchan
Copy link
Contributor

wxchan commented Nov 10, 2017

@guolinke another topic: do you think we can merge booster and _InnerPredictor into one class?

@guolinke
Copy link
Collaborator

@wxchan They are designed for decoupling.
If merge booster and predictor, the dataset and booster will have circle reference, since Dataset needs the predictor for the continued train.

@wxchan
Copy link
Contributor

wxchan commented Nov 10, 2017

@guolinke maybe we don't assign predictor to dataset? We can store it in Booster because there must be a new Booster when continue training.

@wxchan
Copy link
Contributor

wxchan commented Nov 10, 2017

@StrikerRUS I think we can deprecate feature name and category = 'auto', because we don't support category keeping raw input anyway (it must be transferred to int). The extracted feature names always have lowest priority in this case.

@guolinke
Copy link
Collaborator

@wxchan it can be. However, you will need to solve the reference dataset problem.

@StrikerRUS
Copy link
Collaborator Author

@guolinke

we can remove all function arguments in native python interface.

I think it's not necessary, because in most other cases None means no use or something similar.
Do you mean something like Dataset(data, params)?

However, it seems is not possible for sklearn interface.

I think it's more not needed than not possible.

Thus, how should we deal the parameter priority for the function arguments and parameter dict ?

In my opinion function arguments have higher priority.

@wxchan I believe you, but I'll read the code about categorical more deeper and will answer you.

@guolinke
Copy link
Collaborator

@StrikerRUS
the problem of function arguments have higher priority is, there are default values(not None) for the function argument. As a result, the parameter in dict will be ignored even we didn't set the function arguments.

@StrikerRUS
Copy link
Collaborator Author

@guolinke Yeah, I understand it and think that it's expected behavior. Users must read function signature and params has description that it is other parameters. I suppose, duplicated params in dict should be ignored in any way.

Removing all function arguments will confusing: there are dozens of parameters and it will be very hard for new users to find needed among them. Function arguments is the minimum set of params for which users should pay attention firstly.

@guolinke
Copy link
Collaborator

@StrikerRUS okay, however, we should give a warning when met the duplicate.

@guolinke guolinke closed this Nov 10, 2017
@guolinke guolinke reopened this Nov 10, 2017
@StrikerRUS
Copy link
Collaborator Author

@guolinke Yeah, I agree!

@StrikerRUS
Copy link
Collaborator Author

@wxchan

I think we can deprecate feature name and category = 'auto', because we don't support category keeping raw input anyway (it must be transferred to int). The extracted feature names always have lowest priority in this case.

Do you mean that auto actually the same as None?

@StrikerRUS
Copy link
Collaborator Author

We need to replace """some comment""" with # some comment in order to prevent such things:

image

@wxchan
Copy link
Contributor

wxchan commented Nov 11, 2017

The warning in test_consistency is fine, the params are read from file. I think we can set higher stacklevel in warn.

@wxchan
Copy link
Contributor

wxchan commented Nov 11, 2017

The auto is designed for pandas input, it gives user option not to transfer pandas category to int, but actually we don't support the format if not transfer. It results in three possible state: None, auto and some values, make situation too complicated.

@StrikerRUS
Copy link
Collaborator Author

@wxchan I think auto deserves a separate PR.

@wxchan
Copy link
Contributor

wxchan commented Nov 11, 2017

@StrikerRUS sure.

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Nov 11, 2017

@wxchan I'm afraid I don't not understand about stacklevel. From the description I understand that this parameter could hide the part of stack trace but the warning will be raised anyway.
I tried with stacklevel=2:
https://ci.appveyor.com/project/StrikerRUS/lightgbm/build/2.0.10.517/job/b4avrq8g3ja5c3ae#L2148

@wxchan
Copy link
Contributor

wxchan commented Nov 11, 2017

@StrikerRUS it basically means it will show where the function (which causes warning) be called. You can choose a reasonable value you like or leave it default, I tried it before, it looks odd anyway.

@StrikerRUS
Copy link
Collaborator Author

@wxchan I think it's better to leave it default.

@StrikerRUS StrikerRUS closed this Nov 12, 2017
@StrikerRUS StrikerRUS reopened this Nov 12, 2017
@guolinke guolinke merged commit bd5e5e3 into microsoft:master Nov 13, 2017
@StrikerRUS StrikerRUS deleted the params branch November 13, 2017 09:40
@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants