API: Binary Classification Training Context #949

Closed
@TomFinley

Description

@TomFinley

There seems to be something appealing about a convenience object whose purpose is to help "guide" people on the path to a successful experiment. So for example, someone might have a pipeline where they featurize, then learn, then evaluate on a test set. Each of these is of course naturally implemented in separate classes, which is good. But it also means that the ingredients necessary to compose a successful experiment are naturally spread hither and yon.

You might imagine that, in addition to the components, there might be some sort of "task context" object, for example a BinaryClassificationContext. It might offer common facilities: a single place to "browse" binary classifier trainers, and a way to evaluate binary classification outputs.

There is something appealing about doing this:

var data = ...
var ctx = new BinaryClassificationContext();
var prediction = ctx.Trainers.FastTree(data, ...);
var metrics = ctx.Evaluate(prediction, ...);

vs. this

var data = ...
var prediction = new FastTreeBinaryClassifierEstimator(data, ...);
var eval = new BinaryClassifierEvaluator(...);
var metrics = eval.Evaluate(prediction, ...);

The latter case is certainly no less powerful, but if I imagine someone tooling around in IntelliSense, the sheer number of things you get by including the key namespaces and typing new is absolutely dizzying, whereas this context can be very, very focused.

In the case of static pipelines the story is a little bit better: "we provide extension methods on Scalar<bool>". That is OK if you know it, but if you don't happen to know it, I see no reasonable way you could discover it without reading documentation and samples. (Of course for that matter I see ). But requiring knowledge at the level of "if you want to do something related to binary classifiers, please say new BinaryClassificationContext" seems kind of reasonable to me.

This hypothetical context object would contain at least two things. The first is a Trainers property. (It must be an actual instance, because the only way external assemblies could "add" their learners to it is via extension methods.) The second is one or more Evaluate methods to produce metrics.
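To make the shape concrete, here is a minimal sketch of such a context, assuming nothing about the eventual API. Every type and member below (TrainerCatalog, Threshold, the accuracy-only Evaluate, ContextDemo) is a hypothetical stand-in invented for illustration, and the "trainer" is a toy that just thresholds one feature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; none of these types are the real ML.NET API.
public sealed class BinaryClassificationContext
{
    // An actual instance, so other assemblies can hang extension methods on it.
    public TrainerCatalog Trainers { get; }

    public BinaryClassificationContext() => Trainers = new TrainerCatalog(this);

    public sealed class TrainerCatalog
    {
        public BinaryClassificationContext Owner { get; }
        internal TrainerCatalog(BinaryClassificationContext owner) => Owner = owner;
    }

    // Reports accuracy only, to keep the sketch small.
    public double Evaluate(IReadOnlyList<bool> predicted, IReadOnlyList<bool> actual)
    {
        if (predicted.Count != actual.Count)
            throw new ArgumentException("Prediction and label counts differ.");
        if (predicted.Count == 0)
            throw new ArgumentException("Nothing to evaluate.");
        int correct = predicted.Zip(actual, (p, a) => p == a).Count(same => same);
        return (double)correct / predicted.Count;
    }
}

// What an external assembly might ship to "add" its learner to the catalog.
public static class ThresholdTrainerExtensions
{
    // Toy "trainer": returns a predictor that thresholds a single feature.
    public static Func<double, bool> Threshold(
        this BinaryClassificationContext.TrainerCatalog trainers, double cutoff)
        => x => x >= cutoff;
}

public static class ContextDemo
{
    public static double Run()
    {
        var ctx = new BinaryClassificationContext();
        Func<double, bool> predict = ctx.Trainers.Threshold(0.5);

        var features = new[] { 0.1, 0.7, 0.9, 0.4 };
        var labels = new[] { false, true, true, true };
        bool[] predictions = features.Select(predict).ToArray();

        return ctx.Evaluate(predictions, labels); // 3 of 4 match: 0.75
    }
}
```

The point of the sketch is only discoverability: typing ctx. surfaces Trainers and Evaluate, and typing ctx.Trainers. surfaces whatever learners the referenced assemblies have added.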

These "objects" do have state in the sense that they must have an IHostEnvironment, but aside from this they are more or less like "namespaces," with the important difference that you can't have a top-level function in a namespace. (Though perhaps we don't care about having functions.) There was some thought that if we also defined pipelines through them, we could avoid having environments in the dynamic pipelines altogether (as we already do for static pipelines), but how this would be accomplished is not clear to me.
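One way to picture the "only state is the environment" point is the sketch below. It is an assumption-laden illustration, not a proposal for how pipelines would drop their environments: IEnvSketch is a deliberately tiny, hypothetical stand-in for IHostEnvironment (the real interface looks different), and RandomGuesser is a toy learner that exists only to show the environment being read from the context rather than passed by the caller.

```csharp
using System;

// Hypothetical stand-in for IHostEnvironment; the real interface is larger and different.
public interface IEnvSketch
{
    int Seed { get; }
}

public sealed class FixedSeedEnv : IEnvSketch
{
    public int Seed { get; }
    public FixedSeedEnv(int seed) => Seed = seed;
}

public sealed class BinaryContextSketch
{
    // The only state the context holds.
    public IEnvSketch Env { get; }

    public BinaryContextSketch(IEnvSketch env)
        => Env = env ?? throw new ArgumentNullException(nameof(env));
}

public static class RandomTrainerExtensions
{
    // The caller never passes an environment; the extension reads it from the context.
    public static Func<bool> RandomGuesser(this BinaryContextSketch ctx)
    {
        var rng = new Random(ctx.Env.Seed);
        return () => rng.Next(2) == 1;
    }
}
```

With this shape, user code would say new BinaryContextSketch(env).RandomGuesser() and never mention the environment again after constructing the context.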

Also, because the only reasonable way things can add themselves is via an extension method, this Trainers object would have to be an actual instance. It needn't actually be instantiable: one can call extension methods on a null reference just as well as on anything else, so long as we don't want to get any information out of it. But that is a little awkward. If we could just put extension methods on, say, a static class, that would be nice, but we can't.
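The null-receiver trick mentioned above does work in C#, because an extension method call compiles to an ordinary static method call. A minimal sketch, with hypothetical names (BinaryTrainers, ExternalLearnerExtensions, NullReceiverDemo) made up for the illustration:

```csharp
using System;

// Hypothetical marker type: it carries no data and only gives extension methods a receiver.
public sealed class BinaryTrainers
{
    private BinaryTrainers() { } // cannot be instantiated from outside
}

public static class ExternalLearnerExtensions
{
    // Never dereferences `trainers`, so a null receiver is harmless here.
    public static string FastTree(this BinaryTrainers trainers) => "FastTree";
}

public static class NullReceiverDemo
{
    public static string Run()
    {
        BinaryTrainers trainers = null; // no instance exists at all
        return trainers.FastTree();     // a static call under the hood: no NullReferenceException
    }
}
```

The awkwardness is visible in the sketch: the receiver exists purely to satisfy the compiler, and any extension that tried to read something from it would fail at run time.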

Work Item

The first thing I will do is create a binary classification training context object, as an exploration of the idea. If we like the idea, we can extend it to the other tasks as well.
