Rename properties of `ITrainerEx` (`TrainerInfo`)

`ITrainerEx` (or its functional successor `TrainerInfo` from #522) contains the following properties.

https://github.com/dotnet/machinelearning/blob/ef169b2c67ef394b65d5bedbebd378913789fd9c/src/Microsoft.ML.Core/Prediction/ITrainer.cs#L35-L58

As the comment suggests, we ought to be consistent in naming. There are several things we might consider doing here.

The first thing we might consider is getting rid of `NeedCalibration` specifically. I believe it is always true when the predictor returned by the trainer is a binary predictor that does not itself return probabilities. I believe this trait can be derived directly from the predictor object itself, so we might be able to simplify the code here. (Admittedly, the code to detect whether a predictor produces probabilities might be somewhat involved, and the situation here may be less simple than I suspect.)
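To make the idea concrete, here is a minimal sketch of deriving the calibration question from the predictor object instead of from a trainer flag. Every type and member name in it (`IPredictorSketch`, `IBinaryPredictorSketch`, `IProducesProbabilitySketch`, `CalibrationCheck.NeedsCalibration`) is hypothetical and invented for illustration; none of them is an existing ML.NET type, and the real detection logic may well be more involved.

```csharp
// Hypothetical marker interfaces, invented for this sketch only.
public interface IPredictorSketch { }

// A predictor for a binary task (hypothetical).
public interface IBinaryPredictorSketch : IPredictorSketch { }

// A binary predictor that already outputs probabilities (hypothetical).
public interface IProducesProbabilitySketch : IBinaryPredictorSketch { }

// Example predictors used only to exercise the check.
public sealed class RawScorePredictor : IBinaryPredictorSketch { }
public sealed class ProbabilityPredictor : IProducesProbabilitySketch { }
public sealed class RegressionPredictor : IPredictorSketch { }

public static class CalibrationCheck
{
    // True when the predictor is binary but does not itself produce
    // probabilities, which is the condition described above.
    public static bool NeedsCalibration(IPredictorSketch predictor)
    {
        return predictor is IBinaryPredictorSketch
            && !(predictor is IProducesProbabilitySketch);
    }
}
```

Under these assumptions, a caller would ask `CalibrationCheck.NeedsCalibration(predictor)` after training, and the trainer would no longer need to advertise anything about calibration up front.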

The other thing is reconciling `Need` and `Want`. The prefix `Need` is a bit odd, since you certainly don't *need* to do any of those things for training to work; it's just a suggestion that things might work better *if* you do them.

However, rather than just reconciling the prefix, we might consider renaming them altogether. They're oddly named in the sense that they don't describe a property of the trainer they are attached to; they describe an action we suggest a user of the trainer take in order to use it (or, in the case of `NeedCalibration`, an action to take on the result of training). That is, they are *prescriptive* as opposed to *descriptive*, which seems undesirable.

So take `NeedNormalization`... trainers don't need normalization randomly for no reason; they need it because they make parametric assumptions about feature data. Maybe a property could be devised to express that, with the understanding that if it's true a user may benefit from normalizing features. Similarly, caching tends to be useful when an algorithm may perform many passes over the data (which makes it better to keep the data in memory).
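As a rough sketch of what descriptive properties could look like, with the suggested action derived from them: the names `TrainerTraitsSketch`, `HasParametricFeatureAssumptions`, `MakesMultiplePasses`, `SuggestsNormalization`, and `SuggestsCaching` are all hypothetical, made up for this illustration, and are not proposed final names or existing API.

```csharp
// Hypothetical descriptive counterpart to the current flags.
public sealed class TrainerTraitsSketch
{
    public TrainerTraitsSketch(bool hasParametricFeatureAssumptions, bool makesMultiplePasses)
    {
        HasParametricFeatureAssumptions = hasParametricFeatureAssumptions;
        MakesMultiplePasses = makesMultiplePasses;
    }

    // Describes the trainer: it makes parametric assumptions about feature data.
    public bool HasParametricFeatureAssumptions { get; }

    // Describes the trainer: it may perform many passes over the data.
    public bool MakesMultiplePasses { get; }
}

public static class TrainerTraitsAdvice
{
    // The action a user might take is derived from the description,
    // rather than being the property itself.
    public static bool SuggestsNormalization(this TrainerTraitsSketch traits)
    {
        return traits.HasParametricFeatureAssumptions;
    }

    public static bool SuggestsCaching(this TrainerTraitsSketch traits)
    {
        return traits.MakesMultiplePasses;
    }
}
```

In this sketch the trainer states facts about itself, and the step from "makes parametric assumptions" to "consider normalizing" lives in one helper instead of in each caller's head.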

However, maybe this is a bit too goofy... With `NeedNormalization` it is easier for more people to reach the desired action, compared with some complicated multi-step process of reasoning ("it has parametric assumptions about features, I think my data does not satisfy them, the way people tend to fix this is to apply normalizers, so I will apply a normalizer" vs. just "ah, needs normalization, I will normalize."). Although the name is prescriptive and not descriptive, maybe that does not in itself make it inferior to the alternative?

Not sure about this.

/cc @eerhardt, @ericstj, @Zruty0

	// REVIEW: Ideally trainers should be able to communicate
	// something about the type of data they are capable of being trained
	// on, e.g., what ColumnKinds they want, how many of each, of what type,
	// etc. This interface seems like the most natural conduit for that sort
	// of extra information.

	// REVIEW: Can we please have consistent naming here?
	// 'Need' vs. 'Want' looks arbitrary to me, and it's grammatically more correct to
	// be 'Needs' / 'Wants' anyway.

	/// <summary>
	/// Whether the trainer needs to see data in normalized form.
	/// </summary>
	bool NeedNormalization { get; }

	/// <summary>
	/// Whether the trainer needs calibration to produce probabilities.
	/// </summary>
	bool NeedCalibration { get; }

	/// <summary>
	/// Whether this trainer could benefit from a cached view of the data.
	/// </summary>
	bool WantCaching { get; }
