Description
ITrainerEx (or its functional successor TrainerInfo from #522) contains the following properties.
machinelearning/src/Microsoft.ML.Core/Prediction/ITrainer.cs, lines 35 to 58 in ef169b2
As the comment suggests, we ought to be consistent in naming. There are several things we might consider doing here.
The first thing we might consider is getting rid of NeedCalibration specifically. I believe it is always true when the predictor returned from the trainer is a binary predictor that does not itself return probabilities. I believe this trait can be derived directly from the predictor object, so we might be able to simplify the code here. (Admittedly, the code to detect whether a predictor produces probabilities might be somewhat involved, and the situation here may be less simple than I suspect.)
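To make the idea concrete, here is a rough sketch of deriving the flag from the predictor rather than storing it on the trainer. Everything in it is a hypothetical stand-in invented for illustration (SketchPredictionKind, ISketchPredictor, ISketchProbabilityProducer, CalibrationSketch); none of these are the actual ML.NET types, and the real check may well need to be more involved.

```csharp
using System;

// Hypothetical stand-ins for illustration only; not the actual ML.NET types.
public enum SketchPredictionKind { BinaryClassification, MultiClassClassification, Regression }

public interface ISketchPredictor
{
    SketchPredictionKind PredictionKind { get; }
}

// Assumed marker interface for predictors whose output is already a probability.
public interface ISketchProbabilityProducer { }

public static class CalibrationSketch
{
    // True when the predictor is a binary predictor that does not itself produce probabilities.
    public static bool WouldBenefitFromCalibration(ISketchPredictor predictor)
    {
        if (predictor == null)
            throw new ArgumentNullException(nameof(predictor));
        return predictor.PredictionKind == SketchPredictionKind.BinaryClassification
            && !(predictor is ISketchProbabilityProducer);
    }
}

// Binary predictor that emits only a raw score.
public sealed class SketchMarginPredictor : ISketchPredictor
{
    public SketchPredictionKind PredictionKind => SketchPredictionKind.BinaryClassification;
}

// Binary predictor that already emits probabilities.
public sealed class SketchProbabilisticPredictor : ISketchPredictor, ISketchProbabilityProducer
{
    public SketchPredictionKind PredictionKind => SketchPredictionKind.BinaryClassification;
}

// Regression predictor, for which calibration does not apply.
public sealed class SketchRegressionPredictor : ISketchPredictor
{
    public SketchPredictionKind PredictionKind => SketchPredictionKind.Regression;
}
```

With something of this shape, a caller would ask the trained predictor directly, for example CalibrationSketch.WouldBenefitFromCalibration(new SketchMarginPredictor()), instead of consulting a flag on the trainer.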
The other thing is reconciling Need and Want. The prefix Need is a bit odd, since you certainly don't need to do any of those things for training to work; it is just a suggestion that things might work better if you do.
However, rather than just reconciling the prefix, we might consider renaming them altogether. They are oddly named in the sense that they don't describe a property of the trainer they are attached to; they describe an action we suggest a user of the trainer should take (or, in the case of NeedCalibration, an action to take on the result of training). That is, they are prescriptive as opposed to descriptive, which seems undesirable.
So take NeedNormalization... trainers don't need normalization randomly for no reason; they need it because they make parametric assumptions about feature data. Maybe a property could be devised to express that, with the understanding that if it is true, a user may benefit from normalizing features. Similarly, caching tends to be useful if an algorithm may perform many passes over the data (which makes it better to keep the data in memory).
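As a rough sketch of what the descriptive alternative could look like: the trainer states facts about itself, and the advice is derived from those facts. The names here (SketchTrainerTraits, HasParametricFeatureAssumptions, MayMakeMultiplePasses, SketchTrainerAdvice) are all hypothetical and made up for this illustration; they are not proposed final names and not part of the existing API.

```csharp
using System;

// Hypothetical descriptive traits of a trainer; names invented for illustration.
public sealed class SketchTrainerTraits
{
    // The algorithm makes parametric assumptions about the feature data.
    public bool HasParametricFeatureAssumptions { get; }

    // The algorithm may make more than one pass over the training data.
    public bool MayMakeMultiplePasses { get; }

    public SketchTrainerTraits(bool hasParametricFeatureAssumptions, bool mayMakeMultiplePasses)
    {
        HasParametricFeatureAssumptions = hasParametricFeatureAssumptions;
        MayMakeMultiplePasses = mayMakeMultiplePasses;
    }
}

// Prescriptive advice derived from the descriptive traits above.
public static class SketchTrainerAdvice
{
    // Normalizing features may help when the trainer has parametric assumptions about them.
    public static bool SuggestsNormalization(this SketchTrainerTraits traits)
    {
        if (traits == null)
            throw new ArgumentNullException(nameof(traits));
        return traits.HasParametricFeatureAssumptions;
    }

    // Caching may help when the trainer could read the data many times.
    public static bool SuggestsCaching(this SketchTrainerTraits traits)
    {
        if (traits == null)
            throw new ArgumentNullException(nameof(traits));
        return traits.MayMakeMultiplePasses;
    }
}
```

One possible upside of this split is that the trainer only reports what it is, while a caller who just wants the recommended action can still ask traits.SuggestsNormalization() without reasoning through the underlying property.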
However, maybe this is a bit too goofy... NeedNormalization makes it easy for more people to reach the desired action, compared with some complicated multi-step process of reasoning ("it has parametric assumptions about features, I think my data does not satisfy them, the way people tend to fix this is to apply normalizers, so I will apply a normalizer" vs. just "ah, needs normalization, I will normalize"). Even though the name is prescriptive and not descriptive, maybe that does not in itself make it inferior to the alternative?
Not sure about this.