Skip to content

Conversation

@stephen-hoover
Copy link

CivisML v2.2 will add the ability for users to subset model predictions, as a way to save time and space. Add this parameter to ModelPipeline.predict.

CivisML v2.2 will add the ability for users to subset model predictions, as a way to save time and space. Add this parameter to `ModelPipeline.predict`.
@stephen-hoover stephen-hoover force-pushed the paro-636-skip-output-cols branch from 482fc9e to b31f364 Compare March 7, 2018 18:38
@stephen-hoover stephen-hoover changed the title ENH Add "targets_to_predict" param to ModelPipeline.predict ENH Add "dvs_to_predict" param to ModelPipeline.predict Mar 7, 2018
@stephen-hoover stephen-hoover requested a review from elsander March 7, 2018 19:35
If this is a multi-output model, you may list a subset of
dependent variables for which you wish to generate predictions.
This list must be a subset of the original `dependent_variable`
input. Ignoring some of the model's outputs will let predictions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this sentence, what about something like "Scores for all outputs will always be calculated, but only the subset will be returned, allowing predictions to complete faster and use less disk space". I think it's good to make this explicit so that users know that asking for a subset won't affect their scores.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that we do currently let the model compute everything, then drop columns from the output array before writing to disk. But that seems like an implementation detail. If we were to somehow modify the models such that they only computed the requested labels, it would still look the same to the user. How about

The scores for the returned subset will be identical to the scores which those outputs would have had if all outputs were written, but ignoring some of the model's outputs will let predictions complete faster and use less disk space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that is more of an implementation detail. I like your revision.

if dvs_to_predict:
if isinstance(dvs_to_predict, six.string_types):
dvs_to_predict = [dvs_to_predict]
if self.predict_template_id > 10600:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nitpicky, but it would be clearer if this were 10583 instead of 10600, to match other parts of the code that reference the specific template id that is the cutoff for that argument.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Changed.

@elsander elsander assigned stephen-hoover and unassigned elsander Mar 7, 2018
Copy link
Contributor

@elsander elsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elsander elsander assigned stephen-hoover and unassigned elsander Mar 7, 2018
@stephen-hoover stephen-hoover merged commit 664f6b3 into civisanalytics:master Mar 7, 2018
@stephen-hoover stephen-hoover deleted the paro-636-skip-output-cols branch March 7, 2018 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants