[WIP] Make our estimators compatible with scikit-learn #116

timokau · 2020-04-30T22:03:35Z

Description

This is a work in progress on fixing #94. See that issue for motivation and context.

Currently the one added test is failing since FETADiscreteChoice does not yet conform to the interface. The commit is mostly just to get the PR started. I will incrementally fix the estimators and add them to the list of tested estimators.

Collecting and testing all of our estimators will be easier after #115 is done.

How Has This Been Tested?

Does this close/impact existing issues?

Fixes #94.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.

timokau · 2020-04-30T22:10:23Z

Currently our FETANetwork base class requires the mandatory n_objects and n_object_features parameters. According to scikit-learn's guidelines, these should instead be determined from the data passed to fit. That effectively means we should also delay constructing the network until fit, which will probably require quite some refactoring.
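A minimal sketch of what "dimensions on fit" could look like, assuming numpy. `HypotheticalRanker`, `n_hidden`, `n_objects_fit_`, `n_object_features_fit_` and the tanh scoring layer are illustrative stand-ins, not the actual FETANetwork code: `__init__` only stores hyperparameters, and the weights are created in `fit` once the shape of X is known.

```python
import numpy as np


class HypotheticalRanker:
    """Illustrative ranker whose input dimensions are only known after fit."""

    def __init__(self, n_hidden=8, random_state=None):
        # Only store hyperparameters; no data-dependent state here.
        self.n_hidden = n_hidden
        self.random_state = random_state

    def fit(self, X, Y=None):
        X = np.asarray(X, dtype=float)
        # X has shape (n_instances, n_objects, n_object_features).
        _, self.n_objects_fit_, self.n_object_features_fit_ = X.shape
        rng = np.random.RandomState(self.random_state)
        # The "network" is built here, now that its input size is known.
        # (A real learner would also train these weights using Y.)
        self.weights_ = rng.normal(size=(self.n_object_features_fit_, self.n_hidden))
        self.output_weights_ = rng.normal(size=self.n_hidden)
        return self

    def predict_scores(self, X):
        X = np.asarray(X, dtype=float)
        hidden = np.tanh(X @ self.weights_)  # (n_instances, n_objects, n_hidden)
        return hidden @ self.output_weights_  # (n_instances, n_objects)
```

Following scikit-learn's convention, attributes learned during fit carry a trailing underscore, so an unfitted instance has none of them.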

codecov · 2020-04-30T22:19:45Z

Codecov Report

Merging #116 into master will increase coverage by 0.30%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   57.04%   57.35%   +0.30%     
==========================================
  Files         113      114       +1     
  Lines        6560     7027     +467     
==========================================
+ Hits         3742     4030     +288     
- Misses       2818     2997     +179

Impacted Files	Coverage Δ
csrank/tests/test_estimators.py	`0.00% <0.00%> (ø)`
csrank/discretechoice/ranknet_discrete_choice.py	`83.78% <0.00%> (-16.22%)`	⬇️
csrank/objectranking/baseline.py	`55.00% <0.00%> (-14.24%)`	⬇️
csrank/discretechoice/baseline.py	`55.00% <0.00%> (-14.24%)`	⬇️
csrank/discretechoice/cmpnet_discrete_choice.py	`86.11% <0.00%> (-13.89%)`	⬇️
csrank/objectranking/cmp_net.py	`86.48% <0.00%> (-13.52%)`	⬇️
csrank/objectranking/rank_net.py	`86.48% <0.00%> (-13.52%)`	⬇️
csrank/choicefunction/ranknet_choice.py	`80.85% <0.00%> (-12.01%)`	⬇️
csrank/core/cmpnet_core.py	`87.50% <0.00%> (-10.21%)`	⬇️
csrank/core/ranknet_core.py	`87.50% <0.00%> (-10.21%)`	⬇️
... and 62 more

timokau · 2020-05-02T15:24:33Z

I've adapted FETALinear's init. I'll have to do the same for our other cores and estimators and then take care of the other parts of the scikit-learn estimator interface.

timokau · 2020-05-12T14:28:20Z

This is turning out to be a lot more effort than expected. To make it somewhat manageable and reviewable I'll proceed as follows:

1. Gradually make all estimators pass the "default constructible" test. There is some highly repetitive work to be done here. The first part is moving all random state validation out of __init__. I'll split that repetitive work into separate PRs for easier review and faster merging.
2. Create a new test that calls check_estimator but blacklists all currently failing sub-checks.
3. Gradually fix the failing sub-checks, save for some that are inherently incompatible with our library.
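The random state part of the first step can be sketched as follows, assuming numpy. scikit-learn ships `sklearn.utils.check_random_state` for this purpose; the `validate_random_state` helper and `HypotheticalEstimator` below are illustrative re-implementations, not csrank code. The point is that `__init__` keeps `random_state` exactly as passed, and the conversion into a generator happens in `fit`.

```python
import numbers

import numpy as np


def validate_random_state(seed):
    """Return a numpy RandomState for None, an int seed or a RandomState."""
    if seed is None:
        # Unlike sklearn.utils.check_random_state, this returns a fresh
        # unseeded generator instead of numpy's global one.
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


class HypotheticalEstimator:
    def __init__(self, random_state=None):
        # Keep the parameter exactly as given so get_params()/clone()
        # can reproduce the estimator.
        self.random_state = random_state

    def fit(self, X, y=None):
        # Validation happens on every call to fit, not in __init__.
        self.random_state_ = validate_random_state(self.random_state)
        self.noise_ = self.random_state_.uniform(size=len(X))
        return self
```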

timokau · 2020-05-12T15:05:09Z

The first part is ready for review: #117

timokau · 2020-05-22T19:07:34Z

I have rebased this on top of #118. After #118, the biggest remaining blocker for the default-constructible test is the optimizer parameter of our estimators. See #119 for that.
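One way to make the optimizer parameter default constructible, sketched under assumptions: the estimator stores an optimizer class plus its keyword arguments and only instantiates it in `fit`. `SGD` here is a hypothetical stand-in class and `optimizer_params` is an illustrative name; the interface actually adopted in #119 may differ. Using an optimizer instance as the default would share one mutable object between all estimators and carry its state from one fit to the next.

```python
class SGD:
    """Hypothetical stand-in for an optimizer class from a DL framework."""

    def __init__(self, learning_rate=0.01, momentum=0.0):
        self.learning_rate = learning_rate
        self.momentum = momentum


class HypotheticalLearner:
    def __init__(self, optimizer=SGD, optimizer_params=None):
        # Store the optimizer *class* and its arguments, never an instance.
        self.optimizer = optimizer
        self.optimizer_params = optimizer_params

    def fit(self, X, y=None):
        # None means "use the optimizer's own defaults".
        params = {} if self.optimizer_params is None else dict(self.optimizer_params)
        # A fresh optimizer per fit, so no state leaks between fits or estimators.
        self.optimizer_ = self.optimizer(**params)
        return self
```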

timokau · 2020-05-26T13:37:50Z

Rebased, now that #118 is merged.

timokau · 2020-06-27T17:29:24Z

I have continued work on top of #119 to avoid merge conflicts. We're getting very close to passing the default_constructible test! Besides scikit-learn/scikit-learn#17756, with the fixes in this PR and #119 the only remaining failure is due to the kernel_regularizer parameter. We'll need to give it the same treatment as optimizer.

I already started by removing all default values for the regularizer parameters (82a2869). @kiudee do you think that is the right call for the regularizers as well?

I had to patch scikit-learn to get more useful/actionable error messages in the test. I'll try to upstream those improvements once scikit-learn/scikit-learn#17756 is resolved.

timokau · 2020-07-16T17:28:54Z

All our estimators are default constructible now and the test passes with scikit-learn/scikit-learn#17936 🎉

That means we cannot add the test until that PR is merged and a new scikit-learn release is out, though.
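"Default constructible" means that `__init__` has no required parameters. A standard-library sketch of how that property could be checked with `inspect.signature`; `required_init_params`, `OldStyleRanker` and `NewStyleRanker` are illustrative names, with the old style mirroring the earlier mandatory n_objects / n_object_features arguments.

```python
import inspect


def required_init_params(cls):
    """Names of __init__ parameters that have no default value."""
    signature = inspect.signature(cls.__init__)
    return [
        name
        for name, param in signature.parameters.items()
        if name != "self"
        and param.default is inspect.Parameter.empty
        and param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
    ]


def is_default_constructible(cls):
    return not required_init_params(cls)


class OldStyleRanker:
    def __init__(self, n_objects, n_object_features, n_hidden=8):
        self.n_objects = n_objects
        self.n_object_features = n_object_features
        self.n_hidden = n_hidden


class NewStyleRanker:
    def __init__(self, n_hidden=8):
        # Data dimensions are determined later, in fit.
        self.n_hidden = n_hidden
```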

timokau · 2020-08-07T18:03:04Z

The PR was merged upstream. I'm not sure if it'll make it into 0.23.3 or if we have to wait for 0.24. Either way, there should be a stable version with the necessary patch at some point.

timokau · 2020-09-17T12:23:14Z

Good news and bad news.

Good news: After #159, we are now compliant with "no attributes in init" 🎉
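A rough approximation of what scikit-learn's `check_no_attributes_set_in_init` verifies: right after construction, the instance attributes are exactly the `__init__` parameters (the real check handles more cases, such as inherited parameters). The `_pre_fit` method is a hypothetical stand-in for the pre_fit function introduced in #159 and only illustrates where derived state moves to. Standard library only.

```python
import inspect


def init_sets_only_params(estimator):
    """Rough approximation of scikit-learn's check_no_attributes_set_in_init."""
    signature = inspect.signature(type(estimator).__init__)
    params = {
        name
        for name, param in signature.parameters.items()
        if name != "self"
        and param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
    }
    # Every parameter must be stored, and nothing else may be.
    return set(vars(estimator)) == params


class CompliantEstimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def _pre_fit(self):
        # Hypothetical pre-fit hook: derived state is created only here.
        self.scaled_alpha_ = 2 * self.alpha

    def fit(self, X, y=None):
        self._pre_fit()
        return self


class NonCompliantEstimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.scaled_alpha_ = 2 * alpha  # extra state set in __init__
```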

Bad news: Basically all other checks are unusable as long as our fit function doesn't match the data dimensionality that scikit-learn assumes. I have updated this PR and blacklisted all the checks that do not work. So we either have to think again about how we can make our fit function fit (pun intended) or ignore all those checks.
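The blacklist model can be sketched as a filter over (estimator, check) pairs, such as those that `check_estimator(estimator, generate_only=True)` yields in some scikit-learn releases, where each check is a `functools.partial` that still expects the estimator. `run_checks`, `check_name` and the contents of `KNOWN_FAILING` are illustrative only, not the list from this PR.

```python
from functools import partial

# Illustrative blacklist; the real list depends on which checks fail.
KNOWN_FAILING = {
    "check_fit2d_predict1d",
}


def check_name(check):
    """Name of a check function, unwrapping (nested) functools.partial objects."""
    while isinstance(check, partial):
        check = check.func
    return check.__name__


def run_checks(pairs, blacklist=KNOWN_FAILING):
    """Run every (estimator, check) pair whose check is not blacklisted.

    Returns the names of the checks that were skipped.
    """
    skipped = []
    for estimator, check in pairs:
        name = check_name(check)
        if name in blacklist:
            skipped.append(name)
            continue
        check(estimator)
    return skipped
```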

timokau · 2020-09-17T12:29:39Z

Just to re-state the problem (we talked about this in some other PR/issue, but I can't remember which):

According to scikit-learn, fit should expect an X of shape (n_samples, n_features). For us, however, every instance is a ranking problem consisting of multiple objects. Therefore, X has the shape (n_instances, n_objects, n_features).

It should be possible to flatten each ranking instance into a single feature array to please scikit-learn. We could write functions to transform between the two formats. The question is if that is what we want.
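The flattening could look like this with numpy. `flatten_instances` and `unflatten_instances` are hypothetical helper names. The sketch assumes every instance has the same number of objects, and the inverse only works if `n_objects` is known separately, which hints at why the flat format is awkward for this setting.

```python
import numpy as np


def flatten_instances(X):
    """(n_instances, n_objects, n_features) -> (n_instances, n_objects * n_features)."""
    X = np.asarray(X)
    n_instances, n_objects, n_features = X.shape
    # Objects of one instance are laid out back to back in a single row.
    return X.reshape(n_instances, n_objects * n_features)


def unflatten_instances(X_flat, n_objects):
    """Inverse of flatten_instances; n_objects has to be supplied separately."""
    X_flat = np.asarray(X_flat)
    n_instances, n_columns = X_flat.shape
    if n_columns % n_objects != 0:
        raise ValueError("number of columns is not a multiple of n_objects")
    return X_flat.reshape(n_instances, n_objects, n_columns // n_objects)
```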

CC @kiudee @prithagupta

kiudee · 2020-09-17T14:01:28Z

> Good news: After #159, we are now compliant with "no attributes in init" 🎉

🥳

> Bad news: Basically all other checks are unusable as long as our fit function doesn't match the data dimensionality that scikit-learn assumes. I have updated this PR and blacklisted all the checks that do not work. So we either have to think again about how we can make our fit function fit (pun intended) or ignore all those checks.

I think that writing converters and making the flat format the default would compromise a bit too much. Our setting is a different one and it would not make sense to press it into that format. It would only buy us compatibility with a few preprocessors (let me know if I forgot an important use case), which likely do not make sense for our setting. Anything important we lose?

timokau · 2020-09-18T11:14:42Z

I tend to agree. I will look into writing a wrapper anyway, which will at least enable us to make use of the remaining checks.

timokau · 2020-09-18T15:18:25Z

Okay, I have done that, just for rankers for now; I'll fix the checks there first (of course they don't all pass). I already added some small fixes to this PR. The checks are very slow though, so we'll probably not run them with the regular test suite.
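A sketch of such a wrapper, assuming numpy: it accepts the flat (n_samples, n_features) input that the scikit-learn checks generate, reshapes each row into `n_objects` objects and delegates to a ranker. `FlatInputWrapper`, `FeatureSumRanker` and the default `n_objects=2` are hypothetical; `copy.deepcopy` stands in for `sklearn.base.clone`, and `predict` returns per-object ranks (0 = best).

```python
import copy

import numpy as np


class FeatureSumRanker:
    """Toy ranker standing in for a real csrank estimator."""

    def fit(self, X, Y=None):
        return self

    def predict_scores(self, X):
        # (n_instances, n_objects, n_features) -> (n_instances, n_objects)
        return np.asarray(X).sum(axis=-1)


class FlatInputWrapper:
    def __init__(self, estimator=None, n_objects=2):
        self.estimator = estimator
        self.n_objects = n_objects

    def _to_3d(self, X):
        X = np.asarray(X, dtype=float)
        if X.ndim != 2 or X.shape[1] % self.n_objects != 0:
            raise ValueError("expected 2D input with n_objects * n_features columns")
        return X.reshape(X.shape[0], self.n_objects, -1)

    def fit(self, X, y=None):
        base = FeatureSumRanker() if self.estimator is None else self.estimator
        # Fit a copy so the estimator passed by the user stays untouched.
        self.estimator_ = copy.deepcopy(base).fit(self._to_3d(X), y)
        return self

    def predict(self, X):
        scores = self.estimator_.predict_scores(self._to_3d(X))
        # Rank 0 for the highest-scoring object within each instance.
        return np.argsort(np.argsort(-scores, axis=1), axis=1)
```

Keeping `estimator` and `n_objects` as plain `__init__` parameters keeps the wrapper itself default constructible, so it can be fed to the same checks.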

timokau · 2020-10-01T14:12:36Z

Now that we use poetry & nix, I have included the environment with the patched scikit-learn that is necessary to run these tests. A simple nix-shell will load it (though it will require some build time). It includes a backport of scikit-learn/scikit-learn#17936.

timokau mentioned this pull request Apr 30, 2020

Adhere to scikit-learn estimator interface #94

Closed

timokau changed the title ~~[WIP] Test FETADiscreteChoice estimator interface~~ [WIP] Make our estimators compatible with scikit-learn May 2, 2020

timokau force-pushed the sklearn-compatibility branch from dc1ee9f to 0377ca7 May 2, 2020 15:23

timokau force-pushed the sklearn-compatibility branch from 0377ca7 to 9450337 May 12, 2020 14:24

timokau mentioned this pull request May 12, 2020

Delay random state validation #117

Merged

timokau mentioned this pull request May 14, 2020

Determine data dimensions lazily on fit instead on init #118

Merged

timokau force-pushed the sklearn-compatibility branch from 9450337 to 031dbc0 May 22, 2020 18:09

timokau marked this pull request as draft May 22, 2020 18:10

timokau mentioned this pull request May 22, 2020

Require uninitialized optimizers for our learners #119

Merged

timokau force-pushed the sklearn-compatibility branch from 031dbc0 to 21bf823 May 26, 2020 13:37

kiudee added this to the 1.3 milestone Jun 5, 2020

timokau mentioned this pull request Jun 10, 2020

Do not override the optimizer's default parameters #142

Merged

timokau force-pushed the sklearn-compatibility branch 2 times, most recently from 45267a0 to 352cba0 June 27, 2020 17:25

timokau mentioned this pull request Jul 1, 2020

Misc fixes for default-constructibility of our learners #144

Merged

timokau force-pushed the sklearn-compatibility branch from 352cba0 to e7e5b0c July 1, 2020 17:21

timokau force-pushed the sklearn-compatibility branch from e7e5b0c to ceeca54 July 16, 2020 17:26

timokau mentioned this pull request Aug 1, 2020

Remove threshold instances #149

Merged

timokau mentioned this pull request Aug 21, 2020

Remove clear memory #150

Merged

timokau force-pushed the sklearn-compatibility branch from d214673 to 20a757e August 23, 2020 15:20

timokau mentioned this pull request Sep 16, 2020

Move stateful initialization to a pre_fit function #159

Merged

timokau force-pushed the sklearn-compatibility branch from 4c637b5 to b43cb3e September 17, 2020 12:20

timokau force-pushed the sklearn-compatibility branch from 0f4ae0d to 9e0f3f4 September 23, 2020 13:27

This was referenced Sep 24, 2020

FETA subsampling not working #160

Open

Misc estimator fixes #161

Merged

timokau added 14 commits September 30, 2020 11:09

Add a test for basic sklearn API conformance

4d9d482

The "default constructible" (i.e. no required __init__ parameters) property is the most basic property of a scikit-learn estimator. It is a prerequisite for all other estimator checks.

Add check_no_attributes_set_in_init check

8d77e80

Move to a blacklist model for checks

8d941f9

wip! wrapper

eba1599

wip! wrapper

0cba39d

Fix random state handling in the baseline

ecb4f99

wip! wrapper

ef9be34

Mark failing checks

0a3794a

Fixup!

fd4aae0

Always return self from fit

2a4f282

Required by the scikit-learn estimator API for easier fit-predict chaining.

Fixup!

ccea422

Fixup! self from fit

2f80d64

Revert "Fixup! self from fit"

978800a

This reverts commit 2d35098.

Mark check_fit2d_1feature check as passing

bd5c46f

timokau force-pushed the sklearn-compatibility branch from 9e0f3f4 to 4901119 September 30, 2020 12:47

Include the necessary scikit-learn patch

0c7ca20

timokau force-pushed the sklearn-compatibility branch from 4901119 to 0c7ca20 October 1, 2020 14:10

timokau mentioned this pull request Oct 2, 2020

Migrate away from tf1 #125

Closed

timokau mentioned this pull request Oct 31, 2020

PyTorch migration: Remove tensorflow components, add FATE estimators #164

Merged
