-
Notifications
You must be signed in to change notification settings - Fork 14
[WIP] Make our estimators compatible with scikit-learn #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Currently our |
Codecov Report
@@ Coverage Diff @@
## master #116 +/- ##
==========================================
+ Coverage 57.04% 57.35% +0.30%
==========================================
Files 113 114 +1
Lines 6560 7027 +467
==========================================
+ Hits 3742 4030 +288
- Misses 2818 2997 +179
|
dc1ee9f to
0377ca7
Compare
|
I've adapted FETALinear's init. I'll have to do the same for our other cores and estimators and then take care of the other parts of the scikit-learn estimator interface. |
0377ca7 to
9450337
Compare
|
This is turning out to be a lot more effort than expected. To make it somewhat manageable and reviewable I'll proceed as follows:
|
|
The first part is ready for review: #117 |
9450337 to
031dbc0
Compare
031dbc0 to
21bf823
Compare
|
Rebased, now that #118 is merged. |
45267a0 to
352cba0
Compare
|
I have continued work on top of #119 to avoid merge conflicts. We're getting very close to passing the I already started by removing all default values for the regularizer parameters (82a2869). @kiudee do you think that is the right call for the regularizers as well? I had to patch scikit-learn to get more useful/actionable error messages in the test. I'll try to upstream those improvements once scikit-learn/scikit-learn#17756 is resolved. |
352cba0 to
e7e5b0c
Compare
e7e5b0c to
ceeca54
Compare
|
All our estimators are default constructible now and the test passes with scikit-learn/scikit-learn#17936 🎉 That means that we cannot add the test before the PR is merged and a new sklearn release is out though. |
|
The PR was merged upstream. I'm not sure if it'll make it into 0.23.3 or if we have to wait for 0.24. Either way, there should be a stable version with the necessary patch at some point. |
d214673 to
20a757e
Compare
4c637b5 to
b43cb3e
Compare
|
Good news and bad news. Good news: After #159, we are now compliant with "no attributes in init" 🎉 Bad news: Basically all other checks are unusable as long as our |
|
Just to re-state the problem (we talked about this in some other PR/issue, but I can't remember which): According to scikit-learn, It should be possible to flatten each ranking instance into a single feature array to please scikit-learn. We could write functions to transform between the two formats. The question is if that is what we want. |
🥳
I think that (writing converters and having flat format be default) would compromise a bit too much. Our setting is a different one and it would not make sense to press it into that format. It would only buy us compatibility with a few preprocessors (let me know if I forgot an important use case), which for our setting likely do not make sense. Anything important we lose? |
|
I tend to agree. I will look into writing a wrapper anyway, which will at least enable us to make use of the remaining checks. |
|
Okay, I have done that. Just for rankers for now, I'll fix the checks there first (of course they don't all pass). I already added some little fixes to this PR. The checks are very slow though, so we'll probably not run them with the regular testsuite. |
0f4ae0d to
9e0f3f4
Compare
The "default constructible" (i.e. no required __init__ parameters) property is the most basic property of a scikit-learn estimator. It is a prerequisite for all other estimator checks.
Required by the scikit-learn estimator API for easier fit-predict chaining.
This reverts commit 2d35098.
9e0f3f4 to
4901119
Compare
4901119 to
0c7ca20
Compare
|
Now that we use poetry & nix, I have included the environment with the patched scikit-learn that is necessary to run these tests. A simple |
Description
This is a work-in-progress of fixing #94. See that issue for motivation and context.
Currently the one added test is failing since FETADiscreteChoice does not yet conform to the interface. The commit is mostly just to get the PR started. I will incrementally fix the estimators and add them to the list of tested estimators.
Collecting and testing all of our estimators will be easier after #115 is done.
How Has This Been Tested?
Does this close/impact existing issues?
Fixes #94.
Types of changes
Checklist: