[ENH] in XGBoostLSS, expose underlying parameters when n_trail=0
#672
Conversation
There are dependency failures, and you forgot to mention this PR also fixes #670 - other than that, LGTM.
fkiraly left a comment
I would expose them explicitly, or the individual keys in `xgb_params` cannot be addressed by `set_params` and, in consequence, by sktime or skpro tuners.
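For illustration, a minimal sketch (hypothetical class and parameter names, not the skpro implementation): sklearn-style `get_params`/`set_params` discover parameters from the `__init__` signature, so settings buried inside a dict argument are one opaque parameter and cannot be targeted individually by tuners.

```python
from sklearn.base import BaseEstimator


class DictParams(BaseEstimator):
    """Hypothetical wrapper that hides settings in a dict."""

    def __init__(self, xgb_params=None):
        self.xgb_params = xgb_params


class ExplicitParams(BaseEstimator):
    """Hypothetical wrapper exposing each setting explicitly."""

    def __init__(self, eta=0.3, max_depth=6):
        self.eta = eta
        self.max_depth = max_depth


# the dict is a single opaque parameter; its keys are not addressable
print(DictParams(xgb_params={"eta": 0.1}).get_params())
# {'xgb_params': {'eta': 0.1}}

# explicit arguments are individually addressable by set_params and tuners
print(ExplicitParams().set_params(eta=0.05).get_params())
# {'eta': 0.05, 'max_depth': 6}
```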
@marrov I didn't see that other issue but noticed this when fixing the 2nd one. Thanks - added above for clarity. The dependency issues are detailed here: #643. The test failures are also referenced in there by @fkiraly - an alternative fix is required.
Agreed, this is an issue. However, XGBoostLSS itself doesn't expose this, which is somewhat understandable because of the number of params that would need to be exposed from XGBoost itself: https://github.com/dmlc/xgboost/blob/c3a5e3e66b1e46a191a479b791204a8ccd10ce60/python-package/xgboost/sklearn.py#L236

What about passing kwargs and collecting them in `self.xgb_params`? This should work for skpro and possibly sklearn gridsearches. This means the API stays the same if we add explicit kwargs in the future, but allows passthrough now. We can open an issue on XGBoostLSS to explicitly add XGBoost parameters, and one here to update XGBoostLSS after that.
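A minimal sketch of the suggested passthrough (hypothetical names, not merged code): collect unrecognised keyword arguments in `__init__` and stash them in `self.xgb_params` for later handoff.

```python
class XGBoostLSSWrapper:
    """Hypothetical sketch: collect extra keyword args for passthrough."""

    def __init__(self, dist="Gaussian", n_trail=0, **kwargs):
        self.dist = dist
        self.n_trail = n_trail
        # collect all remaining kwargs; handed to xgboostlss at fit time
        self.xgb_params = dict(kwargs)

    def _fit(self, X, y):
        # when n_trail == 0, the collected params would be passed straight
        # through to the underlying booster instead of being tuned
        ...
```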
fkiraly left a comment
Great!
- varargs are not allowed in `__init__`, please spell all of them out explicitly (and say: only passed if [condition])
- do not overwrite `get_params` or `set_params` please - this is why!
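For context, a minimal sketch of that pattern (parameter names hypothetical): spelling the parameters out in `__init__` makes them visible to sklearn-style `get_params` introspection without overriding it, and the passthrough dict can still be assembled internally at fit time.

```python
class ExplicitXGBoostLSS:
    """Hypothetical sketch: underlying parameters spelled out explicitly."""

    def __init__(self, eta=0.3, max_depth=6, subsample=1.0, n_trail=0):
        # plain attribute assignment, so sklearn-style get_params/set_params
        # work via __init__ introspection and need not be overridden
        self.eta = eta
        self.max_depth = max_depth
        self.subsample = subsample
        self.n_trail = n_trail

    def _fit(self, X, y):
        # assemble the passthrough dict at fit time, only when n_trail == 0
        xgb_params = {
            "eta": self.eta,
            "max_depth": self.max_depth,
            "subsample": self.subsample,
        }
        ...  # hand xgb_params to the underlying xgboostlss trainer
```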
FYI, I merged from `main`. You need to make all individual parameters explicit in `__init__` (I assume you can just copy-paste from xgboost); also ensure you cover them via `get_test_params`.
I saw the xgblss deps fix; I'll pull that. I have started implementing the explicitly-named kwargs fix, but there are a lot of potential parameters and several edge cases are introduced. XGBoostLSS leverages the standard XGBoost API, while the sklearn API uses different parameter names for well-established sklearn parameters. Should skpro (as an extension of sklearn) adopt the sklearn parameter names, or (as an extension of XGBLSS) adopt the default XGBoost parameters? Additionally, there was a new skpro
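To illustrate the naming mismatch (using the public xgboost APIs; the data here is dummy): the native training API takes a parameter dict with names like `eta`, while the sklearn wrapper exposes the same setting as the constructor argument `learning_rate`.

```python
import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 4), np.random.rand(100)

# native API: parameters passed as a dict, using names like "eta"
booster = xgb.train({"eta": 0.1, "max_depth": 3}, xgb.DMatrix(X, label=y))

# sklearn API: the same setting is the constructor argument "learning_rate"
model = xgb.XGBRegressor(learning_rate=0.1, max_depth=3).fit(X, y)
```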
I was passing
Small comment: XGBoostLSS just added a new param for the distributions called `initialize`. In my project, this seems quite important (and was the default behaviour), but it now defaults to
@StatMixedML, did you make an API-breaking change, or did this get accidentally merged with someone else's pull request?
I would not mind either way, as long as all the parameters are explicit, and as long as we have consistency between estimators from the @StatMixedML stack.
| Default is "auto" which uses all available CPUs. | ||
| Default is "auto" which uses all available CPUs. If an integer is given, | ||
| that number of CPUs is used. If both n_cpu and n_jobs are set, an error | ||
| is raised. `n_jobs` should be preferred to `n_cpu` for sklearn compatibility. |
for proper rendering, code strings should be in double backticks, not single backticks.
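For illustration, the documented `n_cpu`/`n_jobs` behaviour could be implemented along these lines (a sketch; the helper name and the exact check are assumptions, not the actual skpro code):

```python
import os


def _resolve_cpus(n_cpu="auto", n_jobs=None):
    """Resolve the CPU count per the documented rules (illustrative sketch)."""
    if n_cpu != "auto" and n_jobs is not None:
        # docstring: setting both n_cpu and n_jobs is an error
        raise ValueError("set only one of n_cpu and n_jobs, not both")
    if n_jobs is not None:
        return n_jobs
    if n_cpu == "auto":
        return os.cpu_count()  # "auto" uses all available CPUs
    return n_cpu  # an integer means: use that many CPUs
```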
fkiraly left a comment
Great!
Do I understand correctly that some parameters have two names, and only one of the two should be passed? I think this is confusing for the user. I would go with the xgboost (sklearn adapter) name only.
Further, could you please add all parameters explicitly in the docstring? It is a nightmarishly long list, but you should be able to copy-paste it from the xgboost code base?
The problem with XGBoost-sklearn params is that XGBLSS uses the native XGBoost API, not the sklearn wrapper. The acceptance of multiple aliases is fairly well understood within the "boosting" user base; LightGBM also handles many aliases very nicely. It might be worth considering how we handle this ahead of LightGBMLSS being implemented. In light of this, do you still wish to go with the XGB-sklearn convention, switch to XGB, or allow both?

On the docstring, the slight problem here is that params are quite version-specific. Most params are stable after 2.0 (except for Lambdarank and gpu-specific kwargs). XGBLSS already pins to xgboost>=2.0, so as long as we're comfortable with these params being excluded (or documented but not available), then I can copy-and-paste that over.
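If both conventions were accepted, one way to do it (a hypothetical sketch, not proposed code) is an alias table that maps the sklearn-wrapper names onto native xgboost names and rejects conflicting duplicates, similar in spirit to how LightGBM resolves aliases:

```python
# hypothetical alias map: sklearn-wrapper name -> native xgboost name
PARAM_ALIASES = {"learning_rate": "eta", "reg_alpha": "alpha", "reg_lambda": "lambda"}


def to_native(params):
    """Translate sklearn-style names to native names, rejecting duplicates."""
    out = {}
    for key, value in params.items():
        native = PARAM_ALIASES.get(key, key)
        if native in out:
            raise ValueError(f"duplicate aliases given for '{native}'")
        out[native] = value
    return out


print(to_native({"learning_rate": 0.1, "max_depth": 3}))
# {'eta': 0.1, 'max_depth': 3}
```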
If you prefer to allow both, then I would not be against it. However, I would document parameters differently:
Yes, I think that is sufficient - if xgblss pins xgboost, then you can assume the parameter names are as in the pinned xgboost version.
fkiraly left a comment
Great!
I fixed downward compatibility with 0.6.0 and below, taking care of the `initialize` parameter.
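A sketch of how such version-gating might look (the version boundary, names, and restored default are assumptions drawn from this thread, not the actual fix):

```python
from importlib.metadata import version

from packaging.specifiers import SpecifierSet


def dist_kwargs(stabilization="None"):
    """Build distribution kwargs, passing ``initialize`` only when supported.

    Hypothetical sketch: assumes xgboostlss versions above 0.6.0 accept
    ``initialize`` and that the old default behaviour should be restored.
    """
    kwargs = {"stabilization": stabilization}
    if version("xgboostlss") in SpecifierSet(">0.6.0"):
        kwargs["initialize"] = True  # restore the previous default behaviour
    return kwargs
```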
Thanks for picking this up. It fell off my radar.
Also mine - I was shocked that I had not merged it when @marrov opened the fix for the same issues (because I thought I had), many apologies!
Allows passing of `xgb_params` to the xgboostlss model when `n_trail=0`. Also allows the passthrough of `stabilization` and `response_fn`.

Reference Issues/PRs

Fixes #671, #670

What does this implement/fix? Explain your changes.

Not a major change. Just adds a new kwarg `xgb_params` and passes this through to xgboostlss.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

Style, test setup and docs.

Did you add any tests for the change?

Yes. I passed extra params to `get_test_params` and added a `test_xgboostlss` standalone test.

Any other comments?

No.
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the `skpro` root directory (not the `CONTRIBUTORS.md`). Common badges:
- `code` - fixing a bug, or adding code logic.
- `doc` - writing or improving documentation or docstrings.
- `bug` - reporting or diagnosing a bug (get this plus `code` if you also fixed the bug in the PR).
- `maintenance` - CI, test framework, release.
See here for full badge reference.
For new estimators
- Added to the API reference in `docs/source/api_reference/taskname.rst`, following the pattern.
- Docstring includes an `Examples` section.
- Set the `python_dependencies` tag and ensured dependency isolation, see the estimator dependencies guide.