API: Binary Classification Training Context

There seems to be something appealing about a convenience object whose purpose is to help "guide" people on the path to a successful experiment. So for example, someone might have a pipeline where they featurize, then learn, then evaluate on a test set. Each of these is of course naturally implemented in separate classes, which is good. But it also means that the ingredients necessary to compose a successful experiment are naturally spread hither and yon.

You might imagine that in addition to the components, there might be some sort of "task context" object, for example a `BinaryClassificationContext`. This might offer common facilities: for example, a common way to "browse" binary classifier trainers, and a way to evaluate binary classification outputs.

There is something appealing about doing this:

```csharp
var data = ...
var ctx = new BinaryClassificationContext();
var prediction = ctx.Trainers.FastTree(data, ...);
var metrics = ctx.Evaluate(prediction, ...);
```

vs. this:

```csharp
var data = ...
var prediction = new FastTreeBinaryClassifierEstimator(data, ...);
var eval = new BinaryClassifierEvaluator(...);
var metrics = eval.Evaluate(prediction, ...);
```

The latter case is certainly no less powerful, but if I imagine someone tooling around in IntelliSense, the sheer number of things you'll get by including the key namespaces and saying `new` is absolutely dizzying, vs. this context, which can be very, very focused.

In the case of static pipelines the story is a little better: "we provide extension methods on `Scalar<bool>`". That is OK *if you know that*, but if you don't, I see no reasonable way you could discover it without reading documentation and samples. (Of course for that matter I see ). But requiring knowledge at the level of "if you want to do something related to binary classifiers, please say `new BinaryClassificationContext`" seems kind of reasonable to me.

This hypothetical `Context` object would contain at least two things. The first is a `Trainers` property. (It must be an actual instance, because the only way external assemblies could "add" their learners to it would be via extension methods.) The second is one or more `Evaluate` methods to produce metrics.
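To make the shape concrete, here is a minimal sketch of such a context. It is not the actual ML.NET API: every type below is a hypothetical stand-in, the `FastTree` extension is a dummy that only returns a description string, and `Evaluate` is assumed to compute plain accuracy over boolean arrays. The only point is that `Trainers` is a real instance, so an external assembly can attach its learner through an extension method.

```csharp
using System;

// Hypothetical stand-in types, for illustration only; not the real ML.NET classes.
public sealed class BinaryClassificationContext
{
    // An actual instance, so that other assemblies can hang extension methods on it.
    public BinaryClassificationTrainers Trainers { get; }

    public BinaryClassificationContext()
    {
        Trainers = new BinaryClassificationTrainers(this);
    }

    // One of possibly several Evaluate overloads; this one just computes accuracy.
    public double Evaluate(bool[] predicted, bool[] actual)
    {
        if (predicted.Length != actual.Length)
            throw new ArgumentException("Prediction and label counts differ.");
        if (predicted.Length == 0)
            return 0.0;

        int correct = 0;
        for (int i = 0; i < predicted.Length; i++)
        {
            if (predicted[i] == actual[i])
                correct++;
        }
        return (double)correct / predicted.Length;
    }
}

// The "browsable" catalog; it has no trainers of its own.
public sealed class BinaryClassificationTrainers
{
    internal BinaryClassificationTrainers(BinaryClassificationContext owner)
    {
        Owner = owner;
    }

    public BinaryClassificationContext Owner { get; }
}

// This class would live in the external learner's assembly.
public static class FastTreeExtensions
{
    // Dummy "trainer": returns a description instead of training anything.
    public static string FastTree(this BinaryClassificationTrainers trainers, int numTrees = 100)
        => $"FastTree(numTrees={numTrees})";
}

public static class Program
{
    public static void Main()
    {
        var ctx = new BinaryClassificationContext();
        Console.WriteLine(ctx.Trainers.FastTree(numTrees: 20));
        Console.WriteLine(ctx.Evaluate(
            new[] { true, false, true, true },
            new[] { true, false, false, true }));
    }
}
```

With this layout, typing `ctx.Trainers.` in an editor would list only the binary classification trainers whose assemblies are referenced, which is the focused discovery experience the proposal is after.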

These "objects" do have state in the sense that they must have an `IHostEnvironment`, but aside from this they are more or less like "namespaces," with the possibly important difference that you can't have a top-level function in a namespace. (Though perhaps we don't care about having functions.) There was some thought that if we also defined pipelines through them, we could avoid having environments in the dynamic pipelines altogether (as we already do for static pipelines), but how this would be accomplished is not clear to me.

Also, because the only reasonable way things can add themselves is via an extension method, this `Trainers` object would have to be an actual instance. Now, its type needn't actually be instantiable: one can call extension methods on a `null` reference just as well as on a real object, so long as we don't want to get any information out of it. But that is a little awkward. If we could just put extension methods on, say, a static class, that would be nice, but we can't.
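The `null` trick mentioned above can be sketched as follows. The `TrainerCatalog` type and its `FastTree` extension are hypothetical names used only for this illustration. An extension method call compiles to an ordinary static call with the receiver passed as the first argument, so a `null` receiver is harmless as long as the method never dereferences it.

```csharp
using System;

// Hypothetical marker type: it carries no state and cannot be instantiated from outside.
public sealed class TrainerCatalog
{
    private TrainerCatalog() { }
}

public static class TrainerCatalogExtensions
{
    // The receiver is never dereferenced, so it may be null.
    public static string FastTree(this TrainerCatalog catalog) => "FastTree";
}

public static class Program
{
    public static void Main()
    {
        TrainerCatalog trainers = null; // no instance ever exists

        // Equivalent to TrainerCatalogExtensions.FastTree(null): no NullReferenceException.
        Console.WriteLine(trainers.FastTree());
    }
}
```

This works, but it shows the awkwardness too: any extension method that did need state (an environment, say) would have nowhere to get it from, and a `null`-valued property looks like a bug to most readers.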

# Work Item

The first thing I will do is create a binary classification training context object, as an exploration of the idea. If we like the idea, we can extend it to the other tasks as well.
