
Rename properties of ITrainerEx (TrainerInfo) #543

Closed
@TomFinley


ITrainerEx (or its functional successor TrainerInfo from #522) contains the following properties.

```csharp
// REVIEW: Ideally trainers should be able to communicate
// something about the type of data they are capable of being trained
// on, e.g., what ColumnKinds they want, how many of each, of what type,
// etc. This interface seems like the most natural conduit for that sort
// of extra information.
// REVIEW: Can we please have consistent naming here?
// 'Need' vs. 'Want' looks arbitrary to me, and it's grammatically more correct to
// be 'Needs' / 'Wants' anyway.
/// <summary>
/// Whether the trainer needs to see data in normalized form.
/// </summary>
bool NeedNormalization { get; }
/// <summary>
/// Whether the trainer needs calibration to produce probabilities.
/// </summary>
bool NeedCalibration { get; }
/// <summary>
/// Whether this trainer could benefit from a cached view of the data.
/// </summary>
bool WantCaching { get; }
```

As the comment suggests, we ought to be consistent in naming. There are several things we might consider doing here.

The first thing we might consider is getting rid of NeedCalibration specifically. I believe it is always true when the predictor returned by the trainer is a binary predictor that does not itself return probabilities. That trait can likely be derived directly from the predictor object itself, so we might be able to simplify the code here. (Admittedly, the code to detect whether a predictor produces probabilities might be somewhat involved, and the situation may be less simple than I suspect.)
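To make the "derive it from the predictor" idea concrete, here is a minimal sketch assuming that both "is binary" and "already produces probabilities" can be read off the predictor's type. The interfaces `IPredictor`, `IBinaryPredictor`, and `IProbabilisticPredictor`, the two predictor classes, and `CalibrationHelper.NeedsCalibration` are all hypothetical stand-ins, not the actual ML.NET API.

```csharp
using System;

// Hypothetical stand-ins for predictor types; NOT the real ML.NET API.
public interface IPredictor { }

// Marks a predictor that performs binary classification.
public interface IBinaryPredictor : IPredictor { }

// Marks a predictor whose output is already a probability.
public interface IProbabilisticPredictor : IPredictor { }

// Hypothetical example predictors.
public sealed class LinearBinaryPredictor : IBinaryPredictor { }
public sealed class LogisticBinaryPredictor : IBinaryPredictor, IProbabilisticPredictor { }

public static class CalibrationHelper
{
    // Calibration is suggested only for binary predictors that do not
    // already produce probabilities; for any other predictor it is moot.
    public static bool NeedsCalibration(IPredictor predictor)
    {
        if (predictor is null)
            throw new ArgumentNullException(nameof(predictor));
        return predictor is IBinaryPredictor && !(predictor is IProbabilisticPredictor);
    }
}
```

This type-based check is exactly where the caveat above bites: if whether a predictor emits probabilities depends on runtime configuration rather than on its type, a simple `is` test would not be enough.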

The other thing is reconciling Need and Want. The prefix Need is a bit odd: you certainly don't need to do any of these things for training to work. Each is just a suggestion that it might work better if you do.

However, rather than just reconciling the prefix, we might consider renaming them altogether. They're oddly named in the sense that they don't describe a property of the trainer they are attached to; they describe an action we suggest a user of the trainer take (or, in the case of NeedCalibration, an action to take on the result of training). That is, they are prescriptive rather than descriptive, which seems undesirable.

Take NeedNormalization. Trainers don't need normalization for no reason; they need it because they make parametric assumptions about the feature data. Perhaps a property could be devised to express that, with the understanding that if it is true, a user may benefit from normalizing features. Similarly, caching tends to be useful when an algorithm may perform many passes over the data, which makes it better to keep the data in memory.
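A minimal sketch of the descriptive alternative, assuming the two traits above can be captured as plain booleans. The class `DescriptiveTrainerInfo`, its properties `HasParametricFeatureAssumptions` and `MakesMultiplePasses`, and the `TrainerAdvice.SuggestedActions` helper are hypothetical names chosen for illustration only; none of them exist in ML.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical descriptive replacement for NeedNormalization / WantCaching.
// Names are illustrative only and are not part of ML.NET.
public sealed class DescriptiveTrainerInfo
{
    // True if the algorithm makes parametric assumptions about feature
    // values, so normalizing features may help it.
    public bool HasParametricFeatureAssumptions { get; }

    // True if the algorithm may make many passes over the training data,
    // so keeping the data in memory may help it.
    public bool MakesMultiplePasses { get; }

    public DescriptiveTrainerInfo(bool hasParametricFeatureAssumptions, bool makesMultiplePasses)
    {
        HasParametricFeatureAssumptions = hasParametricFeatureAssumptions;
        MakesMultiplePasses = makesMultiplePasses;
    }
}

public static class TrainerAdvice
{
    // Translates the descriptive traits into the actions a user might take.
    public static IReadOnlyList<string> SuggestedActions(DescriptiveTrainerInfo info)
    {
        if (info is null)
            throw new ArgumentNullException(nameof(info));
        var actions = new List<string>();
        if (info.HasParametricFeatureAssumptions)
            actions.Add("normalize features");
        if (info.MakesMultiplePasses)
            actions.Add("cache data");
        return actions;
    }
}
```

The `SuggestedActions` helper is the extra translation step a user would otherwise have to perform in their head, which is the cost weighed in the next paragraph.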

However, maybe this is a bit too goofy. NeedNormalization lets more people reach the desired action directly, instead of through a complicated multi-step chain of reasoning ("it has parametric assumptions about features, I think my data doesn't satisfy them, the way people tend to fix this is to apply normalizers, so I will apply a normalizer" vs. just "ah, needs normalization, I will normalize"). The name is prescriptive rather than descriptive, but maybe that does not in itself make it inferior to the alternative?

Not sure about this.

/cc @eerhardt, @ericstj, @Zruty0

Labels: API (Issues pertaining to the friendly API)
