ENH Register models trained outside of Civis Platform #242

stephen-hoover · 2018-03-09T23:53:46Z

If you train a scikit-learn compatible estimator outside of Civis Platform, you can use this to upload it to Civis Platform and prepare it for scoring with CivisML. There's a new Custom Script which will introspect metadata necessary for CivisML and make itself appear sufficiently like a CivisML training job that it can be used as input to a scoring job.

stephen-hoover · 2018-03-23T15:27:14Z

The template numbers for registration jobs should be changed when we release CivisML v2.2 and before the next civis-python release. Putting the placeholders in there seemed like the best way to make sure we remember to do it.

elsander · 2018-03-23T18:45:41Z

civis/ml/_model.py

                   9112: 9113,    # v1.1
                   8387: 9113,    # v1.0
                   7020: 7021,    # v0.5
+                   11028: 10616,  # v2.2 registration CHANGE ME


Can you make a ticket as an additional reminder, if you haven't already?

elsander · 2018-03-23T18:49:02Z

docs/source/ml.rst

 cross-validation happens over all steps together.
- 
+
 You can define your model in two ways, either by selecting a pre-defined algorithm


I think it would be worth mentioning the option of passing a pre-fit model here, and linking to the relevant section.

What were you picturing here? This section is talking about training models; registering a model trained outside of Platform bypasses this step.

I was just imagining a sentence like "If you have already trained a model and wish to score it using CivisML, see [section link]." My goal is to make it so that the documentation will lead you to the right place; the section title "defining a model" could be interpreted as providing the definition of a model that's already trained, and since it's an early section, users are likely to start there. You don't have to include it if you don't want to, but users sometimes struggle to find things in our documentation, so I'm trying to find ways to make that easier.

I see. That makes sense; I added a couple of sentences.

elsander · 2018-03-23T22:03:54Z

civis/ml/_model.py

+        model : sklearn.base.BaseEstimator or int
+            The model object. This must be a fitted scikit-learn compatible
+            Estimator object, or else the integer Civis File ID of a
+            pickle which stores such an object.


Does this have to be joblib-serialized? If so, we should mention that.

The joblib.load will work on regular pickles. I added a clause about joblib; if you think it's confusing I can change it or take it out.

elsander · 2018-03-23T22:04:11Z

civis/ml/_model.py

+            These will be used to ensure that tables input for prediction
+            have the correct features in the correct order.
+        primary_key : string, optional
+            The unique ID (primary key) of the scoring dataset..


Stray period here.

Thanks. I think both of these are stray since it's not a complete sentence.

elsander · 2018-03-23T22:09:53Z

civis/ml/tests/test_model.py

 from civis.ml import _model


+LATEST_TRAIN_TEMPLATE = 10582


I'm not sure I understand what this change is for. It seems like this is another constant that we'll have to remember to update.

Tests should fail if we don't update it, which makes it much easier to remember. The problem I had was that we'd used max(_PRED_TEMPLATES) to infer what I've set here as LATEST_TRAIN_TEMPLATE. With the way I've implemented model registration, we're no longer guaranteed that this max operation gives us the most recent template for training jobs. I could build in logic which find the maximum of the keys in _PRED_TEMPLATES which aren't also in REGISTRATION_TEMPLATES. I started to be concerned that adding extra logic like that would make the tests more reliable. Do you think it's worth writing a get_latest_train_template function for the tests?

Thanks for the clarification. I think this change is fine; an obvious constant we have to change is better than having to debug a less obvious helper function.

elsander

I left a comment about the documentation I was imagining. I'll approve after that change, or if you don't want to make the change I can approve as is.

…ned-model

elsander

LGTM!

stephen-hoover added enhancement Modeling labels Mar 9, 2018

stephen-hoover added this to the Next Version milestone Mar 9, 2018

stephen-hoover force-pushed the paro-191-byo-trained-model branch 3 times, most recently from 8586e97 to f81823c Compare March 14, 2018 15:57

stephen-hoover force-pushed the paro-191-byo-trained-model branch from f81823c to 8a233d1 Compare March 23, 2018 15:08

stephen-hoover changed the title ~~WIP ENH BYO trained model~~ ENH Register models trained outside of Civis Platform Mar 23, 2018

stephen-hoover requested a review from elsander March 23, 2018 15:25

stephen-hoover assigned elsander Mar 23, 2018

elsander suggested changes Mar 23, 2018

View reviewed changes

elsander assigned stephen-hoover and unassigned elsander Mar 23, 2018

Tweak docs for CR

3b8774c

stephen-hoover assigned elsander and unassigned stephen-hoover Mar 26, 2018

elsander reviewed Mar 26, 2018

View reviewed changes

elsander assigned stephen-hoover and unassigned elsander Mar 26, 2018

Doc note about registration

dcc37fc

stephen-hoover assigned elsander and unassigned stephen-hoover Mar 26, 2018

Merge remote-tracking branch 'upstream/master' into paro-191-byo-trai…

4e941e6

…ned-model

elsander approved these changes Mar 26, 2018

View reviewed changes

elsander assigned stephen-hoover and unassigned elsander Mar 26, 2018

stephen-hoover merged commit 242d2e2 into civisanalytics:master Mar 26, 2018

stephen-hoover deleted the paro-191-byo-trained-model branch March 26, 2018 16:24

		cross-validation happens over all steps together.


		You can define your model in two ways, either by selecting a pre-defined algorithm

ENH Register models trained outside of Civis Platform #242

ENH Register models trained outside of Civis Platform #242

Uh oh!

Conversation

stephen-hoover commented Mar 9, 2018

Uh oh!

stephen-hoover commented Mar 23, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elsander left a comment

Choose a reason for hiding this comment

Uh oh!

elsander left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants