Conversation

@joshdunnlime (Contributor) commented Dec 4, 2025

Allows passing of `xgb_params` to the xgboostlss model when `n_trail=0`.

Also allows the passthrough of `stabilization` and `response_fn`.
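For orientation, a minimal usage sketch of the intended passthrough; the import path and the distribution argument are hypothetical placeholders, only `n_trail`, `xgb_params`, `stabilization`, and `response_fn` are taken from this PR:

```python
# Hypothetical usage sketch -- the import path and the ``dist`` argument are
# placeholders, not the actual skpro API; the keyword arguments shown are the
# ones this PR exposes.
from skpro.regression.xgboostlss import XGBoostLSS  # path assumed

reg = XGBoostLSS(
    dist="Gaussian",                           # distribution; arg name assumed
    n_trail=0,                                 # skip hyperparameter tuning
    xgb_params={"max_depth": 3, "eta": 0.1},   # passed through to xgboostlss
    stabilization="L2",                        # passed through to the distribution
    response_fn="softplus",                    # passed through to the distribution
)
```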

Reference Issues/PRs

Fixes #671 #670

What does this implement/fix? Explain your changes.

Not a major change. Just adds a new kwarg `xgb_params` and passes this through to xgboostlss.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

Style, test setup and docs.

Did you add any tests for the change?

Yes. I passed extra params to `get_test_params` and added a standalone `test_xgboostlss` test.

Any other comments?

No.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug or adding code logic; doc - writing or improving documentation or docstrings; bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR); maintenance - CI, test framework, release.
    See here for the full badge reference.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@marrov (Member) commented Dec 4, 2025

There are dependency failures and you forgot to mention this PR also fixes #670 - other than that LGTM

@fkiraly (Collaborator) left a comment

I would expose them explicitly, or the individual keys in `xgb_params` cannot be addressed by `set_params` and, in consequence, sktime or skpro tuners.
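To make this concrete, a toy sklearn sketch (class names are illustrative, not the skpro code):

```python
from sklearn.base import BaseEstimator

class DictParamRegressor(BaseEstimator):
    """Toy estimator taking a dict of backend params (illustrative only)."""
    def __init__(self, xgb_params=None):
        self.xgb_params = xgb_params

class ExplicitParamRegressor(BaseEstimator):
    """Toy estimator exposing backend params explicitly."""
    def __init__(self, learning_rate=0.3, max_depth=6):
        self.learning_rate = learning_rate
        self.max_depth = max_depth

# explicit params are addressable by set_params, hence by tuners:
ExplicitParamRegressor().set_params(learning_rate=0.1)

# keys inside a dict param are not -- only the whole dict can be replaced:
DictParamRegressor().set_params(xgb_params={"learning_rate": 0.1})
# DictParamRegressor().set_params(learning_rate=0.1)  # ValueError: invalid parameter
```

Tuners such as GridSearchCV address parameters through `set_params`, so only the explicit variant is tunable key by key.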

@joshdunnlime (Contributor, Author) commented

@marrov I didn't see that other issue but noticed this when fixing the 2nd one. Thanks - added above for clarity.

The dependency issues are detailed here: #643. The test failures are also referenced there by @fkiraly - an alternative fix is required.

> I would expose them explicitly, or the individual keys in `xgb_params` cannot be addressed by `set_params` and, in consequence, sktime or skpro tuners.

Agreed, this is an issue. However, XGBoostLSS itself doesn't expose this, which is somewhat understandable given the number of params that would need to be exposed from XGBoost itself: https://github.com/dmlc/xgboost/blob/c3a5e3e66b1e46a191a479b791204a8ccd10ce60/python-package/xgboost/sklearn.py#L236
https://github.com/dmlc/xgboost/blob/c3a5e3e66b1e46a191a479b791204a8ccd10ce60/python-package/xgboost/sklearn.py#L808

What about passing kwargs and collecting them in `self.xgb_params`? This should work for skpro and possibly sklearn grid searches. The API would then stay the same if we add explicit kwargs in the future, while allowing passthrough now.

We can open an issue on XGBoostLSS to explicitly add the XGBoost parameters, and one here to follow up on the skpro XGBoostLSS wrapper after that.

@joshdunnlime requested a review from fkiraly December 5, 2025 00:17
@fkiraly (Collaborator) left a comment

Great!

  • varargs are not allowed in `__init__`, please spell all of them out explicitly (and say, only passed if [condition])
  • do not overwrite `get_params` or `set_params` please - this is why! (see the sketch below)
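A minimal sketch of the failure mode behind the first point, using a toy class rather than the skpro code:

```python
from sklearn.base import BaseEstimator, clone

class KwargsRegressor(BaseEstimator):
    """Toy estimator collecting varargs (the pattern being ruled out)."""
    def __init__(self, **xgb_params):
        self.xgb_params = xgb_params

est = KwargsRegressor(learning_rate=0.1)
# get_params is derived from the __init__ signature, so anything swallowed
# by **kwargs is invisible to it ...
print(est.get_params())        # -> {}
# ... and therefore silently lost by clone, which rebuilds from get_params:
print(clone(est).xgb_params)   # -> {}
```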

@fkiraly added the enhancement and module:regression (probabilistic regression module) labels Dec 5, 2025
@fkiraly (Collaborator) commented Dec 11, 2025

FYI, I merged from main and removed the overrides to `set_params` etc.

You need to make all individual parameters explicit in `__init__` - this is a requirement of the sklearn-like APIs.

(I assume you can just copy-paste from xgboost; also ensure you cover them via `get_test_params`)

@joshdunnlime (Contributor, Author) commented Dec 11, 2025

I saw the xgblss deps fix. I'll pull that.

I have started implementing the explicitly named kwargs fix, but it introduces a lot of potential parameters and several edge cases.

XGBoostLSS leverages the standard XGBoost API, while the sklearn API uses different parameter names for well-established sklearn parameters. Should skpro (as an extension of sklearn) adopt the sklearn parameter names, or (as an extension of XGBLSS) adopt the default XGBoost parameters?
E.g. `n_estimators` vs `num_boost_round`, `learning_rate` vs `eta`. I am currently parsing both, but this might be overkill... Please advise.

Additionally, a new skpro `n_cpu` parameter was introduced in the original skpro-XGBLSS implementation. This deviates from sklearn's `n_jobs` and XGBoost's `nthread`. I think it should be deprecated to conform with one of the others - in favour of `n_jobs`, as it is used elsewhere in skpro.
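For illustration, one possible shape of the alias handling described above; the helper name and the conflict policy are assumptions, not the final skpro implementation:

```python
# Hedged sketch of an alias-resolution step: map sklearn-style names to
# xgboost native names, erroring if both spellings of a param are passed.
_SKLEARN_TO_NATIVE = {
    "n_estimators": "num_boost_round",
    "learning_rate": "eta",
    "n_jobs": "nthread",
}

def _resolve_aliases(params):
    """Return params with sklearn-style names mapped to native xgboost names."""
    resolved = dict(params)
    for sk_name, native_name in _SKLEARN_TO_NATIVE.items():
        if sk_name in resolved:
            if native_name in resolved:
                raise ValueError(
                    f"pass only one of {sk_name!r} or {native_name!r}, not both"
                )
            resolved[native_name] = resolved.pop(sk_name)
    return resolved

print(_resolve_aliases({"learning_rate": 0.05, "max_depth": 3}))
# {'max_depth': 3, 'eta': 0.05}
```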

@joshdunnlime (Contributor, Author) commented Dec 12, 2025

I assume the failure is because we need to use `_xgb_params` for "fitted" params?

I was still passing `xgb_params` in one of the tests...

@marrov (Member) commented Dec 12, 2025

Small comment: XGBoostLSS just added a new param for the distributions called `initialize`. In my project, this seems quite important (and was the default behaviour), but it now defaults to `False` (do not initialize). So, my recommendation would be to add this param and update the dep boundary for xgboostlss. You can check how I added this in my fork here.
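For context, a hedged sketch of gating such a new keyword on the installed xgboostlss version; the 0.7 version boundary and the chosen default are assumptions, not taken from this thread:

```python
# Hedged sketch: only pass the new ``initialize`` kwarg when the installed
# xgboostlss version supports it; the boundary below is assumed.
from importlib.metadata import version

from packaging.version import Version
from xgboostlss.distributions.Gaussian import Gaussian

dist_kwargs = {"stabilization": "None", "response_fn": "exp"}
if Version(version("xgboostlss")) >= Version("0.7"):  # boundary assumed
    dist_kwargs["initialize"] = True  # restore the previous default behaviour
dist = Gaussian(**dist_kwargs)
```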

@fkiraly (Collaborator) commented Dec 12, 2025

@StatMixedML, did you make an API-breaking change, or did this get accidentally merged with someone else's pull request?

@fkiraly (Collaborator) commented Dec 12, 2025

> XGBoostLSS leverages the standard XGBoost API, while the sklearn API uses different parameter names for well-established sklearn parameters. Should skpro (as an extension of sklearn) adopt the sklearn parameter names, or (as an extension of XGBLSS) adopt the default XGBoost parameters?
> E.g. `n_estimators` vs `num_boost_round`, `learning_rate` vs `eta`. I am currently parsing both, but this might be overkill... Please advise.

I would not mind either way, as long as all the parameters are explicit, and as long as we have consistency between estimators from the @StatMixedML stack.

@joshdunnlime requested a review from fkiraly December 12, 2025 22:21
Default is "auto" which uses all available CPUs.
Default is "auto" which uses all available CPUs. If an integer is given,
that number of CPUs is used. If both n_cpu and n_jobs are set, an error
is raised. `n_jobs` should be preferred to `n_cpu` for sklearn compatibility.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for proper rendering, code strings should be in double backticks, not single backticks.

@fkiraly (Collaborator) left a comment

Great!

Do I understand correctly that some parameters have two names, and only one of the two should be passed? I think this is confusing for the user. I would go with the xgboost (sklearn adapter) name only.

Further, could you please add all parameters explicitly in the docstring? It is a nightmarishly long list, but you should be able to copy-paste it from the xgboost code base?

@joshdunnlime (Contributor, Author) commented Dec 16, 2025

> Great!
>
> Do I understand correctly that some parameters have two names, and only one of the two should be passed? I think this is confusing for the user. I would go with the xgboost (sklearn adapter) name only.
>
> Further, could you please add all parameters explicitly in the docstring? It is a nightmarishly long list, but you should be able to copy-paste it from the xgboost code base?

The problem with XGBoost-sklearn params is that XGBLSS uses `xgb.train` internally, which means parsing them from XGBoost-sklearn specific params to XGB native params anyway (and potentially in a confusing manner, if an error is raised referencing `eta` when `learning_rate` was passed - maybe better to document them together?). Going with XGB native params means deviating from sklearn nomenclature. If we are doing some parsing already, my rationale was to allow users to explicitly choose their convention.

The acceptance of multiple aliases is fairly well understood within the "boosting" user base. LightGBM also handles many aliases very nicely. It might be worth considering how we handle this ahead of LightGBMLSS being implemented.

In light of this, do you still wish to go with the XGB-sklearn convention, switch to XGB or allow both?

On the docstring, the slight problem here is that params are quite version-specific. Most params are stable after 2.0 (except for Lambdarank and GPU-specific kwargs). XGBLSS already pins to xgboost>=2.0, so as long as we're comfortable with these params being excluded (or documented but not available), I can copy-and-paste that over.

@fkiraly (Collaborator) commented Dec 16, 2025

> In light of this, do you still wish to go with the XGB-sklearn convention, switch to XGB or allow both?

If you prefer to allow both, then I would not be against it.

However, I would document parameters differently:

paramname : abcdef, optional, default = 3
paramname2 : alias for paramname
    Description of parameter.
    More details

Or something similar, that avoids user confusion by having multiple blocks.

> XGBLSS already pins to xgboost>=2.0, so as long as we're comfortable with these params being excluded (or documented but not available), I can copy-and-paste that over.

Yes, I think that is sufficient - if xgblss pins xgboost, then you can assume the parameter names are as in xgboost>=2.0.

@fkiraly (Collaborator) left a comment

Great!

I fixed downward compatibility with 0.6.0 and below, taking care of the `initialize` parameter.

@fkiraly changed the title from "[ENH] passes xgb_params to xgboostlss when n-trail=0 with docs and test" to "[ENH] in XGBoostLSS, expose underlying parameters when n_trail=0" Jan 13, 2026
@joshdunnlime (Contributor, Author) commented Jan 13, 2026

Thanks for picking this up. It fell off my radar.

@fkiraly (Collaborator) commented Jan 13, 2026

also mine - I was shocked that I had not merged it when @marrov opened the fix for the same issues (because I thought I had), many apologies!

@fkiraly merged commit b4dc93f into sktime:main Jan 13, 2026
52 checks passed
marrov pushed a commit to marrov/skpro that referenced this pull request Jan 19, 2026:

[ENH] in XGBoostLSS, expose underlying parameters when n_trail=0 (sktime#672)

Labels

enhancement, module:regression (probabilistic regression module)

Development

Successfully merging this pull request may close these issues.

[BUG] XGBoostLSS wrapper should expose the underlying XGBoost params