-
Notifications
You must be signed in to change notification settings - Fork 24
ENH Add "dvs_to_predict" param to ModelPipeline.predict #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add "dvs_to_predict" param to ModelPipeline.predict #241
Conversation
CivisML v2.2 will add the ability for users to subset model predictions, as a way to save time and space. Add this parameter to `ModelPipeline.predict`.
482fc9e to
b31f364
Compare
civis/ml/_model.py
Outdated
| If this is a multi-output model, you may list a subset of | ||
| dependent variables for which you wish to generate predictions. | ||
| This list must be a subset of the original `dependent_variable` | ||
| input. Ignoring some of the model's outputs will let predictions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this sentence, what about something like "Scores for all outputs will always be calculated, but only the subset will be returned, allowing predictions to complete faster and use less disk space". I think it's good to make this explicit so that users know that asking for a subset won't affect their scores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that we do currently let the model compute everything, then drop columns from the output array before writing to disk. But that seems like an implementation detail. If we were to somehow modify the models such that they only computed the requested labels, it would still look the same to the user. How about
The scores for the returned subset will be identical to the scores which those outputs would have had if all outputs were written, but ignoring some of the model's outputs will let predictions complete faster and use less disk space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that is more of an implementation detail. I like your revision.
civis/ml/_model.py
Outdated
| if dvs_to_predict: | ||
| if isinstance(dvs_to_predict, six.string_types): | ||
| dvs_to_predict = [dvs_to_predict] | ||
| if self.predict_template_id > 10600: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nitpicky, but it would be clearer if this were 10583 instead of 10600, to match other parts of the code that reference the specific template id that is the cutoff for that argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Changed.
elsander
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
CivisML v2.2 will add the ability for users to subset model predictions, as a way to save time and space. Add this parameter to
ModelPipeline.predict.