Closed
Description
The two current tutorials in the docs use different columns to get a predicted value out of the pipeline into an instance of the user-defined prediction type:
- Regression taxi fare tutorial uses the Score column
- Binary classification sentiment analysis tutorial uses the PredictedLabel column
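For reference, here is a minimal sketch of how the two prediction types differ. The class and field names are illustrative and may not match the tutorials exactly, and the `using` line is an assumption about the ML.NET version in use (newer versions expose the attribute from `Microsoft.ML.Data` instead):

```csharp
// Assumption: ColumnName lives in this namespace in the ML.NET version
// the tutorials target; newer versions use Microsoft.ML.Data.
using Microsoft.ML.Runtime.Api;

// Regression (taxi fare): the predicted value is read from the Score column.
public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}

// Binary classification (sentiment analysis): the predicted value is read
// from the PredictedLabel column.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}
```

The only thing that ties a field to the pipeline output here is the column name passed to the attribute, which is why the choice between Score and PredictedLabel matters.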
How does one know which column to use to populate instances of the prediction type? This is especially unclear given that, in the case of the (binary) classification solution, the Score column is also available (I guess it then contains the probability of belonging to a certain class).
As for the trainer inputs, the rules are more or less clear:
- Use the Label column for labels (or specify another column name through the `LabelColumn` property)
- Use the Features column for features (or specify another column name through the `FeatureColumn` property)
Can the setup of the predictor output be done in a similar way?
- Use a column with the same name across all the predictors for the predictor output. I guess that might require extending the regression `IDataView` with a PredictedLabel column that would be a copy of the Score column.
- Be able to set the name of the output column. (It seems that PredictedLabelColumnOriginalValueConverter can be used for that; or am I wrong, and that class is intended for use in tandem with the Dictionarizer?)
By the way, even a simple explanation of the Score and PredictedLabel columns here would be appreciated. Then, at the very least, I can update the docs to make the story clearer.